* [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking
@ 2022-10-11  6:14 ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

This series enables ring-based dirty memory tracking for ARM64. The
feature has been available and enabled on x86 for a while. It is
beneficial when the number of dirty pages is small, as in a
checkpointing system or live migration scenario. More details can be
found in commit fb04a1eddb1a ("KVM: X86: Implement ring-based dirty
memory tracking").
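
For readers new to the interface, the overall flow a VMM follows is
sketched below. This is only an illustrative sketch of the existing
dirty-ring UAPI, not part of the series: error handling is omitted, and
dirty_gfn_is_dirtied(), dirty_gfn_set_collected() and
collect_dirty_page() are placeholder helpers (the first two stand in
for the KVM_DIRTY_GFN_F_DIRTY/KVM_DIRTY_GFN_F_RESET flag handling).

  /* Enable the ring on the VM, sized in bytes, before creating vCPUs. */
  struct kvm_enable_cap cap = {
          .cap = KVM_CAP_DIRTY_LOG_RING,
          .args[0] = ring_bytes,
  };
  ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

  /* Each vCPU exposes its ring via mmap() at a fixed page offset. */
  struct kvm_dirty_gfn *ring =
          mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
               vcpu_fd, getpagesize() * KVM_DIRTY_LOG_PAGE_OFFSET);

  /* Harvest dirty pages; each entry names a (slot, offset) pair. */
  uint32_t nents = ring_bytes / sizeof(struct kvm_dirty_gfn);
  uint32_t fetch = 0;  /* a real VMM keeps this cursor across harvests */
  while (dirty_gfn_is_dirtied(&ring[fetch % nents])) {
          collect_dirty_page(ring[fetch % nents].slot,
                             ring[fetch % nents].offset);
          dirty_gfn_set_collected(&ring[fetch % nents]);
          fetch++;
  }

  /* Ask KVM to recycle the collected entries. */
  ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);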

This series applies on top of Marc's v2 series [0], which fixes a
dirty-ring ordering issue and is expected to land in v6.1-rc0 pretty
soon.

[0] https://lore.kernel.org/kvmarm/20220926145120.27974-1-maz@kernel.org

v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com

Testing
=======
(1) kvm/selftests/dirty_log_test
(2) Live migration by QEMU

Changelog
=========
v6:
  * Add CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP, for arm64
    to advertise KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP in
    PATCH[v6 3/8]                                              (Oliver/Peter)
  * Add helper kvm_dirty_ring_exclusive() to check if
    traditional bitmap-based dirty log tracking is
    exclusive to dirty-ring in PATCH[v6 3/8]                   (Peter)
  * Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP in PATCH[v6 5/8] (Gavin)
v5:
  * Drop empty stub kvm_dirty_ring_check_request()             (Marc/Peter)
  * Add PATCH[v5 3/7] to allow using bitmap, indicated by
    KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP                        (Marc/Peter)
v4:
  * Commit log improvement                                     (Marc)
  * Add helper kvm_dirty_ring_check_request()                  (Marc)
  * Drop ifdef for kvm_cpu_dirty_log_size()                    (Marc)
v3:
  * Check KVM_REQ_RING_SOFT_FULL inside kvm_request_pending()  (Peter)
  * Move declaration of kvm_cpu_dirty_log_size()               (test-robot)
v2:
  * Introduce KVM_REQ_RING_SOFT_FULL                           (Marc)
  * Changelog improvement                                      (Marc)
  * Fix dirty_log_test without knowing host page size          (Drew)

Gavin Shan (8):
  KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to
    kvm_dirty_ring.h
  KVM: Add support for using dirty ring in conjunction with bitmap
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: selftests: Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if possible
  KVM: selftests: Use host page size to map ring buffer in
    dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in
    dirty_log_test
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test

 Documentation/virt/kvm/api.rst               | 19 ++++---
 arch/arm64/include/uapi/asm/kvm.h            |  1 +
 arch/arm64/kvm/Kconfig                       |  2 +
 arch/arm64/kvm/arm.c                         |  3 ++
 arch/x86/include/asm/kvm_host.h              |  2 -
 arch/x86/kvm/x86.c                           | 15 +++---
 include/linux/kvm_dirty_ring.h               | 15 +++---
 include/linux/kvm_host.h                     |  2 +
 include/uapi/linux/kvm.h                     |  1 +
 tools/testing/selftests/kvm/dirty_log_test.c | 53 ++++++++++++++------
 tools/testing/selftests/kvm/lib/kvm_util.c   |  5 +-
 virt/kvm/Kconfig                             |  8 +++
 virt/kvm/dirty_ring.c                        | 24 ++++++++-
 virt/kvm/kvm_main.c                          | 34 +++++++++----
 14 files changed, 132 insertions(+), 52 deletions(-)

-- 
2.23.0


* [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Add KVM_REQ_RING_SOFT_FULL, which is raised in kvm_dirty_ring_push()
when the dirty ring of the specific vCPU becomes soft-full. The vCPU is
forced to exit to userspace when the request is pending and its dirty
ring is still soft-full at VM entry.

The event is checked and handled in the newly introduced helper
kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
becomes a private function.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c             | 15 ++++++---------
 include/linux/kvm_dirty_ring.h |  8 ++------
 include/linux/kvm_host.h       |  1 +
 virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0c47b41c264..0dd0d32073e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	/* Forbid vmenter if vcpu dirty ring is soft-full */
-	if (unlikely(vcpu->kvm->dirty_ring_size &&
-		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
-		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
-		trace_kvm_dirty_ring_exit(vcpu);
-		r = 0;
-		goto out;
-	}
-
 	if (kvm_request_pending(vcpu)) {
+		/* Forbid vmenter if vcpu dirty ring is soft-full */
+		if (kvm_dirty_ring_check_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
+
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 906f899813dc..66508afa0b40 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -64,11 +64,6 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
-static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
-{
-	return true;
-}
-
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
@@ -86,11 +81,12 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
  */
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
 
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
+
 /* for use in vm_operations_struct */
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
 
 void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
 
 #endif /* CONFIG_HAVE_KVM_DIRTY_RING */
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..53fa3134fee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQ_RING_SOFT_FULL    4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d6fabf238032..f68d75026bc0 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,7 +26,7 @@ static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
 }
 
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
+static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 {
 	return kvm_dirty_ring_used(ring) >= ring->soft_limit;
 }
@@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 {
+	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
@@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 	kvm_dirty_gfn_set_dirtied(entry);
 	ring->dirty_index++;
 	trace_kvm_dirty_ring_push(ring, slot, offset);
+
+	if (kvm_dirty_ring_soft_full(ring))
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+}
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
+		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+		trace_kvm_dirty_ring_exit(vcpu);
+		return true;
+	}
+
+	return false;
 }
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
-- 
2.23.0


* [PATCH v6 2/8] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Not all architectures need to override the function; ARM64, for
example, relies on the weak default. Move its declaration to
kvm_dirty_ring.h to avoid the following compile warning on ARM64 when
the feature is enabled.

  arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12:        \
  warning: no previous prototype for 'kvm_cpu_dirty_log_size' \
  [-Wmissing-prototypes]                                      \
  int __weak kvm_cpu_dirty_log_size(void)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 --
 include/linux/kvm_dirty_ring.h  | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa381ab69a19..f11b6a9388b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2083,8 +2083,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #define GET_SMSTATE(type, buf, offset)		\
 	(*(type *)((buf) + (offset) - 0x7e00))
 
-int kvm_cpu_dirty_log_size(void);
-
 int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 
 #define KVM_CLOCK_VALID_FLAGS						\
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 66508afa0b40..fe5982b46424 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -66,6 +66,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
+int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
-- 
2.23.0


* [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Some architectures (such as arm64) need to dirty memory outside of the
context of a vCPU. Of course, this simply doesn't fit with the UAPI of
KVM's per-vCPU dirty ring.

Introduce a new flavor of dirty ring that requires the use of both vCPU
dirty rings and a dirty bitmap. The expectation is that for non-vCPU
sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before
migrating the VM to the target.

Use an additional capability to advertise this behavior and require
explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
not allow userspace to enable dirty ring if it hasn't also enabled the
ring && bitmap capability, as a VM is likely DOA without the pages
marked in the bitmap.
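
With this in place, the expected userspace opt-in ordering is roughly
as below. This is an illustrative sketch rather than part of the patch;
it simply mirrors the checks added to
kvm_vm_ioctl_enable_dirty_log_ring() and
kvm_vm_ioctl_enable_cap_generic():

  struct kvm_enable_cap bitmap_cap = {
          .cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
  };
  struct kvm_enable_cap ring_cap = {
          .cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL,
          .args[0] = ring_bytes,
  };

  /* The bitmap capability must be enabled first ... */
  ioctl(vm_fd, KVM_ENABLE_CAP, &bitmap_cap);
  /* ... otherwise this fails with -EINVAL where the bitmap is required. */
  ioctl(vm_fd, KVM_ENABLE_CAP, &ring_cap);

  /*
   * Non-vCPU dirty pages (e.g. the vgic/ITS tables) still land in the
   * bitmap and are retrieved with KVM_GET_DIRTY_LOG before migration.
   */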

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 17 ++++++++---------
 include/linux/kvm_dirty_ring.h |  6 ++++++
 include/linux/kvm_host.h       |  1 +
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/Kconfig               |  8 ++++++++
 virt/kvm/dirty_ring.c          |  5 +++++
 virt/kvm/kvm_main.c            | 34 +++++++++++++++++++++++++---------
 7 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 32427ea160df..09fa6c491c1b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8019,8 +8019,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x40000001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
 
-8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
-----------------------------------------------------------
+8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_WITH_BITMAP}
+-------------------------------------------------------------
 
 :Architectures: x86
 :Parameters: args[0] - size of the dirty log ring
@@ -8104,13 +8104,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8119,6 +8112,12 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+NOTE: In some cases there is no running vcpu, and hence no vcpu dirty
+ring, when pages become dirty. One example is saving arm64's vgic/its
+tables during migration. Those dirty pages are still tracked by the
+dirty bitmap, indicated by KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The dirty
+bitmap is accessed via the KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index fe5982b46424..23b2b466aa0f 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -28,6 +28,11 @@ struct kvm_dirty_ring {
 };
 
 #ifndef CONFIG_HAVE_KVM_DIRTY_RING
+static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
+{
+	return false;
+}
+
 /*
  * If CONFIG_HAVE_HVM_DIRTY_RING not defined, kvm_dirty_ring.o should
  * not be included as well, so define these nop functions for the arch.
@@ -66,6 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
+bool kvm_dirty_ring_exclusive(struct kvm *kvm);
 int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 53fa3134fee0..a3fae111f25c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -780,6 +780,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..c87b5882d7ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..228be1145cf3 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
        bool
        select HAVE_KVM_DIRTY_RING
 
+# Only architectures that need to dirty memory outside of a vCPU
+# context should select this, advertising to userspace the
+# requirement to use a dirty bitmap in addition to the vCPU dirty
+# ring.
+config HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	bool
+	depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
        bool
        select EVENTFD
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index f68d75026bc0..9cc60af291ef 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -11,6 +11,11 @@
 #include <trace/events/kvm.h>
 #include "kvm_mm.h"
 
+bool kvm_dirty_ring_exclusive(struct kvm *kvm)
+{
+	return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
+}
+
 int __weak kvm_cpu_dirty_log_size(void)
 {
 	return 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..8915dcefcefd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else if (!kvm_dirty_ring_exclusive(kvm)) {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (kvm_dirty_ring_exclusive(kvm))
 		return -ENXIO;
 
 	*memslot = NULL;
@@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (kvm_dirty_ring_exclusive(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (kvm_dirty_ring_exclusive(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -3305,15 +3305,20 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
 		return;
+
+#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	if (WARN_ON_ONCE(!vcpu))
+		return;
+#endif
 #endif
 
 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (vcpu && kvm->dirty_ring_size)
 			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
@@ -4485,6 +4490,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
@@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
 {
 	int r;
 
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	if (!kvm->dirty_ring_with_bitmap)
+		return -EINVAL;
+#endif
+
 	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
 		return -EINVAL;
 
@@ -4588,6 +4601,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	case KVM_CAP_DIRTY_LOG_RING:
 	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+		kvm->dirty_ring_with_bitmap = true;
+		return 0;
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.23.0


* [PATCH v6 4/8] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst    | 2 +-
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/Kconfig            | 2 ++
 arch/arm64/kvm/arm.c              | 3 +++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 09fa6c491c1b..4e82ce9e6f2d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8022,7 +8022,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_WITH_BITMAP}
 -------------------------------------------------------------
 
-:Architectures: x86
+:Architectures: x86, arm64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 316917b98707..a7a857f1784d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 815cc118c675..066b053e9eb9 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -32,6 +32,8 @@ menuconfig KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
+	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 917086be5c6b..53c963a159bc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -747,6 +747,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_dirty_ring_check_request(vcpu))
+			return 0;
 	}
 
 	return 1;
-- 
2.23.0


* [PATCH v6 5/8] KVM: selftests: Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if possible
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if it's supported. Otherwise,
enabling KVM_CAP_DIRTY_LOG_RING_ACQ_REL fails on aarch64.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 411a4c0bc81c..54740caea155 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -82,6 +82,9 @@ unsigned int kvm_check_cap(long cap)
 
 void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size)
 {
+	if (vm_check_cap(vm, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP))
+		vm_enable_cap(vm, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0);
+
 	if (vm_check_cap(vm, KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
 		vm_enable_cap(vm, KVM_CAP_DIRTY_LOG_RING_ACQ_REL, ring_size);
 	else
-- 
2.23.0


* [PATCH v6 6/8] KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. This works fine when the host and guest
page sizes are the same, but it fails on arm64 when they differ, as the
error messages below indicate.

  # ./dirty_log_test -M dirty-ring -m 7
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  ==== Test Assertion Failure ====
  lib/kvm_util.c:1477: addr == MAP_FAILED
  pid=9000 tid=9000 errno=0 - Success
  1  0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
  2  0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
  3  0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
  4  (inlined by) run_test at dirty_log_test.c:778
  5  (inlined by) run_test at dirty_log_test.c:691
  6  0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
  7  0x0000000000401ccf: main at dirty_log_test.c:921
  8  0x0000ffffb06ec79b: ?? ??:0
  9  0x0000ffffb06ec86b: ?? ??:0
  10 0x0000000000401def: _start at ??:?
  Dirty ring mapped private

Fix the issue by using the host's page size to map the ring buffer.
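
For context, the ring is mapped from the vCPU fd at a fixed page
offset, so the mmap() offset has to be computed in host pages no matter
which page size the guest uses. Roughly, as an illustrative sketch of
what vcpu_map_dirty_ring() ends up doing after this change:

  uint32_t page_size = getpagesize();     /* host page size */
  uint32_t size = vcpu->vm->dirty_ring_size;
  void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    vcpu->fd, page_size * KVM_DIRTY_LOG_PAGE_OFFSET);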

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 54740caea155..4dc52b43b18d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1470,7 +1470,7 @@ struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vcpu *vcpu)
 
 void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
 {
-	uint32_t page_size = vcpu->vm->page_size;
+	uint32_t page_size = getpagesize();
 	uint32_t size = vcpu->vm->dirty_ring_size;
 
 	TEST_ASSERT(size > 0, "Should enable dirty ring first");
-- 
2.23.0


* [PATCH v6 7/8] KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

There are two pieces of state which need to be cleared before the next
mode is executed. Otherwise, the test fails as the following messages
indicate.

- The variable 'dirty_ring_vcpu_ring_full', shared by the main and vcpu
  threads. It indicates whether the vcpu exited due to a full ring
  buffer. Its value can be carried over from the previous mode
  (VM_MODE_P40V48_4K) to the current one (VM_MODE_P40V48_64K) when
  VM_MODE_P40V48_16K isn't supported.

- The current ring buffer index, which needs to be reset before the next
  mode (VM_MODE_P40V48_64K) is executed. Otherwise, a stale value is
  carried over from the previous mode (VM_MODE_P40V48_4K).

  # ./dirty_log_test -M dirty-ring
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbfffc000
    :
  Dirtied 995328 pages
  Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  vcpu continues now.
  Notifying vcpu to continue
  Iteration 1 collected 0 pages
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  ==== Test Assertion Failure ====
  dirty_log_test.c:369: cleared == count
  pid=10541 tid=10541 errno=22 - Invalid argument
     1	0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
     2	0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
     3	 (inlined by) run_test at dirty_log_test.c:795
     4	 (inlined by) run_test at dirty_log_test.c:705
     5	0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
     6	0x0000000000401ccf: main at dirty_log_test.c:938
     7	0x0000ffff9ecd279b: ?? ??:0
     8	0x0000ffff9ecd286b: ?? ??:0
     9	0x0000000000401def: _start at ??:?
  Reset dirty pages (0) mismatch with collected (35566)

Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before the next mode is executed.
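
The underlying pitfall with the removed static variable is the usual C
one: a static local survives across calls, so the second mode inherits
the first mode's fetch index. A minimal standalone sketch (for
illustration only, not the selftest itself):

  #include <stdio.h>

  static int collect(void)
  {
          static int fetch_index;         /* survives across calls */
          return fetch_index++;
  }

  int main(void)
  {
          printf("%d\n", collect());      /* 0: first "mode" */
          printf("%d\n", collect());      /* 1: stale state leaks in */
          return 0;
  }

Threading the index in from run_test(), as the hunks below do, gives
each mode a fresh counter.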

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 27 ++++++++++++--------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index b5234d6efbe1..8758c10ec850 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -226,13 +226,15 @@ static void clear_log_create_vm_done(struct kvm_vm *vm)
 }
 
 static void dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 }
 
 static void clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 	kvm_vm_clear_dirty_log(vcpu->vm, slot, bitmap, 0, num_pages);
@@ -329,10 +331,9 @@ static void dirty_ring_continue_vcpu(void)
 }
 
 static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					   void *bitmap, uint32_t num_pages)
+					   void *bitmap, uint32_t num_pages,
+					   uint32_t *ring_buf_idx)
 {
-	/* We only have one vcpu */
-	static uint32_t fetch_index = 0;
 	uint32_t count = 0, cleared;
 	bool continued_vcpu = false;
 
@@ -349,7 +350,8 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
 	/* Only have one vcpu */
 	count = dirty_ring_collect_one(vcpu_map_dirty_ring(vcpu),
-				       slot, bitmap, num_pages, &fetch_index);
+				       slot, bitmap, num_pages,
+				       ring_buf_idx);
 
 	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
@@ -406,7 +408,8 @@ struct log_mode {
 	void (*create_vm_done)(struct kvm_vm *vm);
 	/* Hook to collect the dirty pages into the bitmap provided */
 	void (*collect_dirty_pages) (struct kvm_vcpu *vcpu, int slot,
-				     void *bitmap, uint32_t num_pages);
+				     void *bitmap, uint32_t num_pages,
+				     uint32_t *ring_buf_idx);
 	/* Hook to call when after each vcpu run */
 	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
 	void (*before_vcpu_join) (void);
@@ -471,13 +474,14 @@ static void log_mode_create_vm_done(struct kvm_vm *vm)
 }
 
 static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					 void *bitmap, uint32_t num_pages)
+					 void *bitmap, uint32_t num_pages,
+					 uint32_t *ring_buf_idx)
 {
 	struct log_mode *mode = &log_modes[host_log_mode];
 
 	TEST_ASSERT(mode->collect_dirty_pages != NULL,
 		    "collect_dirty_pages() is required for any log mode!");
-	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages);
+	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx);
 }
 
 static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
@@ -696,6 +700,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	unsigned long *bmap;
+	uint32_t ring_buf_idx = 0;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -771,6 +776,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	host_dirty_count = 0;
 	host_clear_count = 0;
 	host_track_next_count = 0;
+	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
 	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
@@ -778,7 +784,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		/* Give the vcpu thread some time to dirty some pages */
 		usleep(p->interval * 1000);
 		log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX,
-					     bmap, host_num_pages);
+					     bmap, host_num_pages,
+					     &ring_buf_idx);
 
 		/*
 		 * See vcpu_sync_stop_requested definition for details on why
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v6 8/8] KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:14   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:14 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

In the dirty ring case, we rely on the vcpu exiting due to a full dirty
ring. On an ARM64 system there are only 4096 host pages when the host
page size is 64KB, so the vcpu never exits due to a full dirty ring. A
similar case is a 4KB page size on the host with a 64KB page size on
the guest: the vcpu keeps dirtying the same set of host pages, but the
dirty page information isn't collected in the main thread. This leads
to an infinite loop, as the following log shows.

  # ./dirty_log_test -M dirty-ring -c 65536 -m 5
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbffe0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  Iteration 1 collected 576 pages
  <No more output afterwards>

Fix the issue by automatically choosing the best dirty ring size, to
ensure the vcpu exits due to the full dirty ring state. The option '-c'
becomes a hint for the dirty ring count, instead of its exact value.
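
The adjustment in the hunk below rounds sizes down to a power of two
with __builtin_clz(). A standalone sketch of that computation (assuming
a 32-bit unsigned int and x > 0, since __builtin_clz(0) is undefined):

  #include <stdio.h>

  static unsigned int pow2_floor(unsigned int x)
  {
          return 1u << (31 - __builtin_clz(x));
  }

  int main(void)
  {
          printf("%u\n", pow2_floor(65536));  /* 65536 */
          printf("%u\n", pow2_floor(20000));  /* 16384 */
          printf("%u\n", pow2_floor(4096));   /* 4096  */
          return 0;
  }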

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 26 +++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 8758c10ec850..a87e5f78ebf1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -24,6 +24,9 @@
 #include "guest_modes.h"
 #include "processor.h"
 
+#define DIRTY_MEM_BITS 30 /* 1G */
+#define PAGE_SHIFT_4K  12
+
 /* The memory slot index to track dirty pages */
 #define TEST_MEM_SLOT_INDEX		1
 
@@ -273,6 +276,24 @@ static bool dirty_ring_supported(void)
 
 static void dirty_ring_create_vm_done(struct kvm_vm *vm)
 {
+	uint64_t pages;
+	uint32_t limit;
+
+	/*
+	 * We rely on vcpu exit due to full dirty ring state. Adjust
+	 * the ring buffer size to ensure we're able to reach the
+	 * full dirty ring state.
+	 */
+	pages = (1ul << (DIRTY_MEM_BITS - vm->page_shift)) + 3;
+	pages = vm_adjust_num_guest_pages(vm->mode, pages);
+	if (vm->page_size < getpagesize())
+		pages = vm_num_host_pages(vm->mode, pages);
+
+	limit = 1 << (31 - __builtin_clz(pages));
+	test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
+	test_dirty_ring_count = min(limit, test_dirty_ring_count);
+	pr_info("dirty ring count: 0x%x\n", test_dirty_ring_count);
+
 	/*
 	 * Switch to dirty ring mode after VM creation but before any
 	 * of the vcpu creation.
@@ -685,9 +706,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, struct kvm_vcpu **vcpu,
 	return vm;
 }
 
-#define DIRTY_MEM_BITS 30 /* 1G */
-#define PAGE_SHIFT_4K  12
-
 struct test_params {
 	unsigned long iterations;
 	unsigned long interval;
@@ -830,7 +848,7 @@ static void help(char *name)
 	printf("usage: %s [-h] [-i iterations] [-I interval] "
 	       "[-p offset] [-m mode]\n", name);
 	puts("");
-	printf(" -c: specify dirty ring size, in number of entries\n");
+	printf(" -c: hint to dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-11  6:14 ` Gavin Shan
@ 2022-10-11  6:23   ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-11  6:23 UTC (permalink / raw)
  To: kvmarm
  Cc: Catalin Marinas, kvm, maz, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm

On 10/11/22 2:14 PM, Gavin Shan wrote:
> This series enables the ring-based dirty memory tracking for ARM64.
> The feature has been available and enabled on x86 for a while. It
> is beneficial when the number of dirty pages is small in a checkpointing
> system or live migration scenario. More details can be found from
> fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking").
> 
> This series is applied on top of Marc's v2 series [0], fixing dirty-ring
> ordering issue. This series is going to land on v6.1.rc0 pretty soon.
> 
> [0] https://lore.kernel.org/kvmarm/20220926145120.27974-1-maz@kernel.org
> 
> v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
> v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
> v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
> v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
> v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com
> 
> Testing
> =======
> (1) kvm/selftests/dirty_log_test
> (2) Live migration by QEMU
> 
> Changelog
> =========
> v6:
>    * Add CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP, for arm64
>      to advertise KVM_CAP_DIRTY_RING_WITH_BITMAP in
>      PATCH[v6 3/8]                                              (Oliver/Peter)
>    * Add helper kvm_dirty_ring_exclusive() to check if
>      traditional bitmap-based dirty log tracking is
>      exclusive to dirty-ring in PATCH[v6 3/8]                   (Peter)
>    * Enable KVM_CAP_DIRTY_RING_WITH_BITMAP in PATCH[v6 5/8]     (Gavin)
> v5:
>    * Drop empty stub kvm_dirty_ring_check_request()             (Marc/Peter)
>    * Add PATCH[v5 3/7] to allow using bitmap, indicated by
>      KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP                        (Marc/Peter)
> v4:
>    * Commit log improvement                                     (Marc)
>    * Add helper kvm_dirty_ring_check_request()                  (Marc)
>    * Drop ifdef for kvm_cpu_dirty_log_size()                    (Marc)
> v3:
>    * Check KVM_REQ_RING_SOFT_RULL inside kvm_request_pending()  (Peter)
>    * Move declaration of kvm_cpu_dirty_log_size()               (test-robot)
> v2:
>    * Introduce KVM_REQ_RING_SOFT_FULL                           (Marc)
>    * Changelog improvement                                      (Marc)
>    * Fix dirty_log_test without knowing host page size          (Drew)
> 
> Gavin Shan (8):
>    KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
>    KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to
>      kvm_dirty_ring.h
>    KVM: Add support for using dirty ring in conjunction with bitmap
>    KVM: arm64: Enable ring-based dirty memory tracking
>    KVM: selftests: Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if possible
>    KVM: selftests: Use host page size to map ring buffer in
>      dirty_log_test
>    KVM: selftests: Clear dirty ring states between two modes in
>      dirty_log_test
>    KVM: selftests: Automate choosing dirty ring size in dirty_log_test
> 
>   Documentation/virt/kvm/api.rst               | 19 ++++---
>   arch/arm64/include/uapi/asm/kvm.h            |  1 +
>   arch/arm64/kvm/Kconfig                       |  2 +
>   arch/arm64/kvm/arm.c                         |  3 ++
>   arch/x86/include/asm/kvm_host.h              |  2 -
>   arch/x86/kvm/x86.c                           | 15 +++---
>   include/linux/kvm_dirty_ring.h               | 15 +++---
>   include/linux/kvm_host.h                     |  2 +
>   include/uapi/linux/kvm.h                     |  1 +
>   tools/testing/selftests/kvm/dirty_log_test.c | 53 ++++++++++++++------
>   tools/testing/selftests/kvm/lib/kvm_util.c   |  5 +-
>   virt/kvm/Kconfig                             |  8 +++
>   virt/kvm/dirty_ring.c                        | 24 ++++++++-
>   virt/kvm/kvm_main.c                          | 34 +++++++++----
>   14 files changed, 132 insertions(+), 52 deletions(-)
> 

It seems Oliver and Sean were missed from the loop, even though I
explicitly copied them with git-send-email. Amending to include them.

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-11  6:14   ` Gavin Shan
@ 2022-10-18 16:07     ` Peter Xu
  -1 siblings, 0 replies; 86+ messages in thread
From: Peter Xu @ 2022-10-18 16:07 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

On Tue, Oct 11, 2022 at 02:14:42PM +0800, Gavin Shan wrote:
> Some architectures (such as arm64) need to dirty memory outside of the
> context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> KVM's per-vCPU dirty ring.
> 
> Introduce a new flavor of dirty ring that requires the use of both vCPU
> dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> the dirty bitmap. Userspace should scan the dirty bitmap before
> migrating the VM to the target.
> 
> Use an additional capability to advertize this behavior and require
> explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> not allow userspace to enable dirty ring if it hasn't also enabled the
> ring && bitmap capability, as a VM is likely DOA without the pages
> marked in the bitmap.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 17 ++++++++---------
>  include/linux/kvm_dirty_ring.h |  6 ++++++
>  include/linux/kvm_host.h       |  1 +
>  include/uapi/linux/kvm.h       |  1 +
>  virt/kvm/Kconfig               |  8 ++++++++
>  virt/kvm/dirty_ring.c          |  5 +++++
>  virt/kvm/kvm_main.c            | 34 +++++++++++++++++++++++++---------
>  7 files changed, 54 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 32427ea160df..09fa6c491c1b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8019,8 +8019,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
>  (0x40000001). Otherwise, a guest may use the paravirtual features
>  regardless of what has actually been exposed through the CPUID leaf.
>  
> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> -----------------------------------------------------------
> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_WITH_BITMAP}

Shall we open a new section for RING_WITH_BITMAP?  Otherwise it still
looks like these are three options for the rings.

Perhaps RING_WITH_BITMAP isn't worth a section at all, so we can avoid
mentioning it here to prevent confusion.

> +-------------------------------------------------------------
>  
>  :Architectures: x86
>  :Parameters: args[0] - size of the dirty log ring
> @@ -8104,13 +8104,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
>  needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
>  vmexit ensures that all dirty GFNs are flushed to the dirty rings.
>  
> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> -machine will switch to ring-buffer dirty page tracking and further
> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> -
>  NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
>  should be exposed by weakly ordered architecture, in order to indicate
>  the additional memory ordering requirements imposed on userspace when
> @@ -8119,6 +8112,12 @@ Architecture with TSO-like ordering (such as x86) are allowed to
>  expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>  to userspace.
>  
> +NOTE: There is no running vcpu and available vcpu dirty ring when pages

IMHO it'll be great to start with something like below to describe the
userspace's responsibility to proactively detect the WITH_BITMAP cap:

  Before using the dirty rings, the userspace needs to detect the cap of
  KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
  need to be backed by per-slot bitmaps.

  When KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP returns 1, it means the arch can
  dirty guest pages without vcpu/ring context, so that some of the dirty
  information will still be maintained in the bitmap structure.

  Note that the bitmap here is only a backup of the ring structure, and it
  doesn't need to be collected until the final switch-over of the migration
  process.  Normally the bitmap should contain only a very small number of
  dirty pages, which need to be transferred during VM downtime.

  To collect dirty bits in the backup bitmap, the userspace can use the
  same KVM_GET_DIRTY_LOG ioctl.  Since it's always the last phase of
  migration that needs to fetch the dirty bitmap, the KVM_CLEAR_DIRTY_LOG
  ioctl should not be needed in this case, and its behavior is undefined.

That's how I understand this new cap, but let me know if you think any of
the above is improper.
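
For what it's worth, a rough userspace sketch of the ordering I have in
mind (error handling mostly elided; the cap constants are the ones this
series introduces, and vm_fd/ring_size are placeholders):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Enable the bitmap cap (when advertised) before sizing the ring. */
  static int enable_dirty_ring(int vm_fd, unsigned long ring_size)
  {
          struct kvm_enable_cap cap;

          if (ioctl(vm_fd, KVM_CHECK_EXTENSION,
                    KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) > 0) {
                  memset(&cap, 0, sizeof(cap));
                  cap.cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP;
                  if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
                          return -1;
          }

          memset(&cap, 0, sizeof(cap));
          cap.cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
          cap.args[0] = ring_size;
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }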

> +becomes dirty in some cases. One example is to save arm64's vgic/its
> +tables during migration.

Nit: it'll be great to mention the exact arm ioctl here just in case anyone
would like to further reference the code.

> The dirty bitmap is still used to track those
> +dirty pages, indicated by KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The dirty
> +bitmap is visited by KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls.
> +
>  8.30 KVM_CAP_XEN_HVM
>  --------------------
>  
> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
> index fe5982b46424..23b2b466aa0f 100644
> --- a/include/linux/kvm_dirty_ring.h
> +++ b/include/linux/kvm_dirty_ring.h
> @@ -28,6 +28,11 @@ struct kvm_dirty_ring {
>  };
>  
>  #ifndef CONFIG_HAVE_KVM_DIRTY_RING
> +static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
> +{
> +	return false;
> +}
> +
>  /*
>   * If CONFIG_HAVE_HVM_DIRTY_RING not defined, kvm_dirty_ring.o should
>   * not be included as well, so define these nop functions for the arch.
> @@ -66,6 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
>  
>  #else /* CONFIG_HAVE_KVM_DIRTY_RING */
>  
> +bool kvm_dirty_ring_exclusive(struct kvm *kvm);
>  int kvm_cpu_dirty_log_size(void);
>  u32 kvm_dirty_ring_get_rsvd_entries(void);
>  int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 53fa3134fee0..a3fae111f25c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -780,6 +780,7 @@ struct kvm {
>  	pid_t userspace_pid;
>  	unsigned int max_halt_poll_ns;
>  	u32 dirty_ring_size;
> +	bool dirty_ring_with_bitmap;
>  	bool vm_bugged;
>  	bool vm_dead;
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0d5d4419139a..c87b5882d7ae 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_ZPCI_OP 221
>  #define KVM_CAP_S390_CPU_TOPOLOGY 222
>  #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 800f9470e36b..228be1145cf3 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
>         bool
>         select HAVE_KVM_DIRTY_RING
>  
> +# Only architectures that need to dirty memory outside of a vCPU
> +# context should select this, advertising to userspace the
> +# requirement to use a dirty bitmap in addition to the vCPU dirty
> +# ring.
> +config HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	bool
> +	depends on HAVE_KVM_DIRTY_RING
> +
>  config HAVE_KVM_EVENTFD
>         bool
>         select EVENTFD
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index f68d75026bc0..9cc60af291ef 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -11,6 +11,11 @@
>  #include <trace/events/kvm.h>
>  #include "kvm_mm.h"
>  
> +bool kvm_dirty_ring_exclusive(struct kvm *kvm)
> +{
> +	return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
> +}
> +
>  int __weak kvm_cpu_dirty_log_size(void)
>  {
>  	return 0;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5b064dbadaf4..8915dcefcefd 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>  			new->dirty_bitmap = NULL;
>  		else if (old && old->dirty_bitmap)
>  			new->dirty_bitmap = old->dirty_bitmap;
> -		else if (!kvm->dirty_ring_size) {
> +		else if (!kvm_dirty_ring_exclusive(kvm)) {
>  			r = kvm_alloc_dirty_bitmap(new);
>  			if (r)
>  				return r;
> @@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>  	unsigned long n;
>  	unsigned long any = 0;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
> +	if (kvm_dirty_ring_exclusive(kvm))
>  		return -ENXIO;
>  
>  	*memslot = NULL;
> @@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>  	unsigned long *dirty_bitmap_buffer;
>  	bool flush;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
> +	if (kvm_dirty_ring_exclusive(kvm))
>  		return -ENXIO;
>  
>  	as_id = log->slot >> 16;
> @@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>  	unsigned long *dirty_bitmap_buffer;
>  	bool flush;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
> +	if (kvm_dirty_ring_exclusive(kvm))
>  		return -ENXIO;
>  
>  	as_id = log->slot >> 16;
> @@ -3305,15 +3305,20 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>  		return;
> +
> +#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	if (WARN_ON_ONCE(!vcpu))
> +		return;
> +#endif
>  #endif
>  
>  	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
>  		u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -		if (kvm->dirty_ring_size)
> +		if (vcpu && kvm->dirty_ring_size)
>  			kvm_dirty_ring_push(&vcpu->dirty_ring,
>  					    slot, rel_gfn);
>  		else
> @@ -4485,6 +4490,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>  #else
>  		return 0;
> +#endif
> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>  #endif
>  	case KVM_CAP_BINARY_STATS_FD:
>  	case KVM_CAP_SYSTEM_EVENT_DATA:
> @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
>  {
>  	int r;
>  
> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	if (!kvm->dirty_ring_with_bitmap)
> +		return -EINVAL;
> +#endif
> +
>  	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
>  		return -EINVAL;
>  
> @@ -4588,6 +4601,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  	case KVM_CAP_DIRTY_LOG_RING:
>  	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> +		kvm->dirty_ring_with_bitmap = true;

IIUC what Oliver wanted to suggest is that we can avoid enabling this cap
explicitly; then we don't need the dirty_ring_with_bitmap field and can
instead check against CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP when needed.

I think that makes sense, because without the bitmap the ring won't work
on arm64, so leaving it disabled isn't a valid configuration anyway.  But
good to double-check with Oliver too.
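
For concreteness, a sketch of that alternative, if the per-VM field goes
away (untested, just the shape):

  bool kvm_dirty_ring_exclusive(struct kvm *kvm)
  {
          return kvm->dirty_ring_size &&
                 !IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP);
  }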

The rest looks good to me, thanks,

> +		return 0;
>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}
> -- 
> 2.23.0
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-18 16:07     ` Peter Xu
@ 2022-10-18 22:20       ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-18 22:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvmarm, kvmarm, kvm, maz, will, catalin.marinas, bgardon, shuah,
	andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Hi Peter,

On 10/19/22 12:07 AM, Peter Xu wrote:
> On Tue, Oct 11, 2022 at 02:14:42PM +0800, Gavin Shan wrote:
>> Some architectures (such as arm64) need to dirty memory outside of the
>> context of a vCPU. Of course, this simply doesn't fit with the UAPI of
>> KVM's per-vCPU dirty ring.
>>
>> Introduce a new flavor of dirty ring that requires the use of both vCPU
>> dirty rings and a dirty bitmap. The expectation is that for non-vCPU
>> sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
>> the dirty bitmap. Userspace should scan the dirty bitmap before
>> migrating the VM to the target.
>>
>> Use an additional capability to advertize this behavior and require
>> explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
>> you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
>> not allow userspace to enable dirty ring if it hasn't also enabled the
>> ring && bitmap capability, as a VM is likely DOA without the pages
>> marked in the bitmap.
>>
>> Suggested-by: Marc Zyngier <maz@kernel.org>
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   Documentation/virt/kvm/api.rst | 17 ++++++++---------
>>   include/linux/kvm_dirty_ring.h |  6 ++++++
>>   include/linux/kvm_host.h       |  1 +
>>   include/uapi/linux/kvm.h       |  1 +
>>   virt/kvm/Kconfig               |  8 ++++++++
>>   virt/kvm/dirty_ring.c          |  5 +++++
>>   virt/kvm/kvm_main.c            | 34 +++++++++++++++++++++++++---------
>>   7 files changed, 54 insertions(+), 18 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 32427ea160df..09fa6c491c1b 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8019,8 +8019,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
>>   (0x40000001). Otherwise, a guest may use the paravirtual features
>>   regardless of what has actually been exposed through the CPUID leaf.
>>   
>> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>> -----------------------------------------------------------
>> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_WITH_BITMAP}
> 
> Shall we open a new section for RING_WITH_BITMAP?  Otherwise here it still
> looks like these are three options for the rings.
> 
> Perhaps RING_WITH_BITMAP isn't worth a section at all, so we can avoid
> mentioning it here to avoid confusion.
> 

Let's avoid mentioning it in the subject since it's an add-on functionality
that not all architectures need.

>> +-------------------------------------------------------------
>>   
>>   :Architectures: x86
>>   :Parameters: args[0] - size of the dirty log ring
>> @@ -8104,13 +8104,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
>>   needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
>>   vmexit ensures that all dirty GFNs are flushed to the dirty rings.
>>   
>> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
>> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
>> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
>> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
>> -machine will switch to ring-buffer dirty page tracking and further
>> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
>> -
>>   NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
>>   should be exposed by weakly ordered architecture, in order to indicate
>>   the additional memory ordering requirements imposed on userspace when
>> @@ -8119,6 +8112,12 @@ Architecture with TSO-like ordering (such as x86) are allowed to
>>   expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>   to userspace.
>>   
>> +NOTE: There is no running vcpu and available vcpu dirty ring when pages
> 
> IMHO it'll be great to start with something like below to describe the
> userspace's responsibility to proactively detect the WITH_BITMAP cap:
> 
>    Before using the dirty rings, the userspace needs to detect the cap of
>    KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>    need to be backed by per-slot bitmaps.
> 
>    When KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP returns 1, it means the arch can
>    dirty guest pages without vcpu/ring context, so that some of the dirty
>    information will still be maintained in the bitmap structure.
> 
>    Note that the bitmap here is only a backup of the ring structure, and it
>    doesn't need to be collected until the final switch-over of migration
>    process.  Normally the bitmap should only contain a very small amount of
>    dirty pages, which need to be transferred during VM downtime.
> 
>    To collect dirty bits in the backup bitmap, the userspace can use the
>    same KVM_GET_DIRTY_LOG ioctl.  Since it's always the last phase of
>    migration that needs the fetching of dirty bitmap, KVM_CLEAR_DIRTY_LOG
>    ioctl should not be needed in this case and its behavior is undefined.
> 
> That's how I understand this new cap, but let me know if you think any of
> the above is improper.
> 

Yes, it looks much better to describe how KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
is used. However, the missing part is that the capability still needs to be
enabled prior to KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. It means the capability
needs to be acknowledged (confirmed) by user space. Otherwise,
KVM_CAP_DIRTY_LOG_RING_ACQ_REL can't be enabled successfully. It seems Oliver, you
and I aren't on the same page for this part. Please refer to the reply below for
more discussion. After the discussion is finalized, I can amend the description
here accordingly.
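
For reference, the user space flow I have in mind looks like below. It is
only a rough sketch based on my understanding, with all error handling
dropped (and the enable-cap ordering is exactly the point under discussion):

	struct kvm_enable_cap cap = { 0 };

	/* Detect and enable the backup bitmap before the ring itself */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP)) {
		cap.cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP;
		ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}

	/* Now enable the dirty ring */
	cap.cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
	cap.args[0] = ring_size;
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* ... iterative migration phase, harvesting the per-vCPU rings ... */

	/* Final switch-over: collect the backup bitmap for each memslot */
	struct kvm_dirty_log log = { .slot = slot_id };
	log.dirty_bitmap = bitmap_buffer;
	ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);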

>> +become dirty in some cases. One example is to save arm64's vgic/its
>> +tables during migration.
> 
> Nit: it'll be great to mention the exact arm ioctl here just in case anyone
> would like to further reference the code.
> 

Yep, good point. I will mention the ioctl here.

>> The dirty bitmap is still used to track those
>> +dirty pages, indicated by KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The dirty
>> +bitmap is accessed by the KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls.
>> +
>>   8.30 KVM_CAP_XEN_HVM
>>   --------------------
>>   
>> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
>> index fe5982b46424..23b2b466aa0f 100644
>> --- a/include/linux/kvm_dirty_ring.h
>> +++ b/include/linux/kvm_dirty_ring.h
>> @@ -28,6 +28,11 @@ struct kvm_dirty_ring {
>>   };
>>   
>>   #ifndef CONFIG_HAVE_KVM_DIRTY_RING
>> +static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
>> +{
>> +	return false;
>> +}
>> +
>>   /*
>>    * If CONFIG_HAVE_HVM_DIRTY_RING not defined, kvm_dirty_ring.o should
>>    * not be included as well, so define these nop functions for the arch.
>> @@ -66,6 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
>>   
>>   #else /* CONFIG_HAVE_KVM_DIRTY_RING */
>>   
>> +bool kvm_dirty_ring_exclusive(struct kvm *kvm);
>>   int kvm_cpu_dirty_log_size(void);
>>   u32 kvm_dirty_ring_get_rsvd_entries(void);
>>   int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 53fa3134fee0..a3fae111f25c 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -780,6 +780,7 @@ struct kvm {
>>   	pid_t userspace_pid;
>>   	unsigned int max_halt_poll_ns;
>>   	u32 dirty_ring_size;
>> +	bool dirty_ring_with_bitmap;
>>   	bool vm_bugged;
>>   	bool vm_dead;
>>   
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 0d5d4419139a..c87b5882d7ae 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
>>   #define KVM_CAP_S390_ZPCI_OP 221
>>   #define KVM_CAP_S390_CPU_TOPOLOGY 222
>>   #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
>> +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
>>   
>>   #ifdef KVM_CAP_IRQ_ROUTING
>>   
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 800f9470e36b..228be1145cf3 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
>>          bool
>>          select HAVE_KVM_DIRTY_RING
>>   
>> +# Only architectures that need to dirty memory outside of a vCPU
>> +# context should select this, advertising to userspace the
>> +# requirement to use a dirty bitmap in addition to the vCPU dirty
>> +# ring.
>> +config HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	bool
>> +	depends on HAVE_KVM_DIRTY_RING
>> +
>>   config HAVE_KVM_EVENTFD
>>          bool
>>          select EVENTFD
>> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
>> index f68d75026bc0..9cc60af291ef 100644
>> --- a/virt/kvm/dirty_ring.c
>> +++ b/virt/kvm/dirty_ring.c
>> @@ -11,6 +11,11 @@
>>   #include <trace/events/kvm.h>
>>   #include "kvm_mm.h"
>>   
>> +bool kvm_dirty_ring_exclusive(struct kvm *kvm)
>> +{
>> +	return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
>> +}
>> +
>>   int __weak kvm_cpu_dirty_log_size(void)
>>   {
>>   	return 0;
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 5b064dbadaf4..8915dcefcefd 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>>   			new->dirty_bitmap = NULL;
>>   		else if (old && old->dirty_bitmap)
>>   			new->dirty_bitmap = old->dirty_bitmap;
>> -		else if (!kvm->dirty_ring_size) {
>> +		else if (!kvm_dirty_ring_exclusive(kvm)) {
>>   			r = kvm_alloc_dirty_bitmap(new);
>>   			if (r)
>>   				return r;
>> @@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>>   	unsigned long n;
>>   	unsigned long any = 0;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (kvm_dirty_ring_exclusive(kvm))
>>   		return -ENXIO;
>>   
>>   	*memslot = NULL;
>> @@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>>   	unsigned long *dirty_bitmap_buffer;
>>   	bool flush;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (kvm_dirty_ring_exclusive(kvm))
>>   		return -ENXIO;
>>   
>>   	as_id = log->slot >> 16;
>> @@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>>   	unsigned long *dirty_bitmap_buffer;
>>   	bool flush;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (kvm_dirty_ring_exclusive(kvm))
>>   		return -ENXIO;
>>   
>>   	as_id = log->slot >> 16;
>> @@ -3305,15 +3305,20 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>   
>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>   		return;
>> +
>> +#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	if (WARN_ON_ONCE(!vcpu))
>> +		return;
>> +#endif
>>   #endif
>>   
>>   	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>>   		u32 slot = (memslot->as_id << 16) | memslot->id;
>>   
>> -		if (kvm->dirty_ring_size)
>> +		if (vcpu && kvm->dirty_ring_size)
>>   			kvm_dirty_ring_push(&vcpu->dirty_ring,
>>   					    slot, rel_gfn);
>>   		else
>> @@ -4485,6 +4490,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>   		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>>   #else
>>   		return 0;
>> +#endif
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>>   #endif
>>   	case KVM_CAP_BINARY_STATS_FD:
>>   	case KVM_CAP_SYSTEM_EVENT_DATA:
>> @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
>>   {
>>   	int r;
>>   
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	if (!kvm->dirty_ring_with_bitmap)
>> +		return -EINVAL;
>> +#endif
>> +
>>   	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
>>   		return -EINVAL;
>>   
>> @@ -4588,6 +4601,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   	case KVM_CAP_DIRTY_LOG_RING:
>>   	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>> +		kvm->dirty_ring_with_bitmap = true;
> 
> IIUC what Oliver wanted to suggest is that we can avoid enabling this cap;
> then we don't need the dirty_ring_with_bitmap field and can instead check
> against CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP when needed.
> 
> I think that'll make sense, because without the bitmap the ring won't work
> on arm64, so it's never valid to leave it disabled.  But good to double check
> with Oliver too.
> 
> The rest looks good to me, thanks,
> 

It was Oliver who suggested exposing KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. User
space also needs to enable the capability prior to KVM_CAP_DIRTY_LOG_RING_ACQ_REL
on ARM64. I may be missing something since you and Oliver had lots of discussion
on this particular new capability.

I'm fine with dropping the bits that enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP.
It means the capability is exposed to user space on ARM64, but user space need
__not__ enable it prior to KVM_CAP_DIRTY_LOG_RING_ACQ_REL. I would like Oliver
to help confirm before I post v7.

>> +		return 0;
>>   	default:
>>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>   	}

Thanks,
Gavin



* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-18 22:20       ` Gavin Shan
@ 2022-10-20 18:58         ` Oliver Upton
  -1 siblings, 0 replies; 86+ messages in thread
From: Oliver Upton @ 2022-10-20 18:58 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Peter Xu, kvmarm, kvmarm, kvm, maz, will, catalin.marinas,
	bgardon, shuah, andrew.jones, dmatlack, pbonzini, zhenyzha,
	james.morse, suzuki.poulose, alexandru.elisei, seanjc,
	shan.gavin

On Wed, Oct 19, 2022 at 06:20:32AM +0800, Gavin Shan wrote:
> Hi Peter,
> 
> On 10/19/22 12:07 AM, Peter Xu wrote:
> > On Tue, Oct 11, 2022 at 02:14:42PM +0800, Gavin Shan wrote:

[...]

> > IMHO it'll be great to start with something like below to describe the
> > userspace's responsibility to proactively detect the WITH_BITMAP cap:
> > 
> >    Before using the dirty rings, the userspace needs to detect the cap of
> >    KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> >    need to be backed by per-slot bitmaps.
> > 
> >    When KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP returns 1, it means the arch can
> >    dirty guest pages without vcpu/ring context, so that some of the dirty
> >    information will still be maintained in the bitmap structure.
> > 
> >    Note that the bitmap here is only a backup of the ring structure, and it
> >    doesn't need to be collected until the final switch-over of migration
> >    process.  Normally the bitmap should only contain a very small amount of
> >    dirty pages, which need to be transferred during VM downtime.
> > 
> >    To collect dirty bits in the backup bitmap, the userspace can use the
> >    same KVM_GET_DIRTY_LOG ioctl.  Since it's always the last phase of
> >    migration that needs the fetching of dirty bitmap, KVM_CLEAR_DIRTY_LOG
> >    ioctl should not be needed in this case and its behavior is undefined.
> > 
> > That's how I understand this new cap, but let me know if you think any of
> > the above is improper.
> > 
> 
> Yes, it looks much better to describe how KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> is used. However, the missing part is that the capability still needs to be
> enabled prior to KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. It means the capability
> needs to be acknowledged (confirmed) by user space. Otherwise,
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL can't be enabled successfully. It seems Oliver, you
> and I aren't on the same page for this part. Please refer to the reply below for
> more discussion. After the discussion is finalized, I can amend the description
> here accordingly.

I'll follow up on the details of the CAP below, but wanted to explicitly
note some stuff for documentation:

Collecting the dirty bitmap should be the very last thing that the VMM
does before transmitting state to the target VMM. You'll want to make
sure that the dirty state is final and avoid missing dirty pages from
another ioctl ordered after bitmap collection.
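
Roughly, and with made-up helper names just to illustrate the ordering:

	pause_vcpus();              /* stop vCPU-context dirtying */
	save_device_state();        /* e.g. KVM_DEV_ARM_ITS_SAVE_TABLES, which
	                             * dirties pages via the bitmap */
	harvest_vcpu_dirty_rings(); /* drain the per-vCPU rings */
	collect_dirty_bitmap();     /* KVM_GET_DIRTY_LOG, the very last step */
	transmit_state();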

[...]

> > > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> > > +		kvm->dirty_ring_with_bitmap = true;
> > 
> > IIUC what Oliver wanted to suggest is that we can avoid enabling this cap;
> > then we don't need the dirty_ring_with_bitmap field and can instead check
> > against CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP when needed.
> > 
> > I think that'll make sense, because without the bitmap the ring won't work
> > on arm64, so it's never valid to leave it disabled.  But good to double check
> > with Oliver too.
> > 
> > The rest looks good to me, thanks,
> > 
> 
> It was Oliver who suggested exposing KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. User
> space also needs to enable the capability prior to KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> on ARM64. I may be missing something since you and Oliver had lots of discussion
> on this particular new capability.
> 
> I'm fine with dropping the bits that enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP.
> It means the capability is exposed to user space on ARM64, but user space need
> __not__ enable it prior to KVM_CAP_DIRTY_LOG_RING_ACQ_REL. I would like Oliver
> to help confirm before I post v7.

IMO you really want the explicit buy-in from userspace, as failure to
collect the dirty bitmap will result in a dead VM on the other side of
migration. Fundamentally we're changing the ABI of
KVM_CAP_DIRTY_LOG_RING[_ACQ_REL].

--
Thanks,
Oliver


* Re: [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-11  6:14   ` Gavin Shan
@ 2022-10-20 22:42     ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-20 22:42 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

On Tue, Oct 11, 2022, Gavin Shan wrote:
> This adds KVM_REQ_RING_SOFT_FULL, which is raised when the dirty

"This" is basically "This patch", which is generally frowned upon.  Just state
what changes are being made.

> ring of the specific VCPU becomes softly full in kvm_dirty_ring_push().
> The VCPU is enforced to exit when the request is raised and its
> dirty ring is softly full on its entrance.
> 
> The event is checked and handled in the newly introduced helper
> kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
> becomes a private function.

None of this captures why the request is being added.  I'm guessing Marc's
motivation is to avoid having to check the ring on every entry, though there
might be a correctness issue too?

It'd also be helpful to explain that KVM re-queues the request to maintain KVM's
existing uABI, which enforces the soft_limit even if no entries have been added
to the ring since the last KVM_EXIT_DIRTY_RING_FULL exit.

And maybe call out the alternative(s) that were discussed in v2[*]?

[*] https://lore.kernel.org/all/87illlkqfu.wl-maz@kernel.org

> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/x86/kvm/x86.c             | 15 ++++++---------
>  include/linux/kvm_dirty_ring.h |  8 ++------
>  include/linux/kvm_host.h       |  1 +
>  virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
>  4 files changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b0c47b41c264..0dd0d32073e7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	bool req_immediate_exit = false;
>  
> -	/* Forbid vmenter if vcpu dirty ring is soft-full */
> -	if (unlikely(vcpu->kvm->dirty_ring_size &&
> -		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
> -		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
> -		trace_kvm_dirty_ring_exit(vcpu);
> -		r = 0;
> -		goto out;
> -	}
> -
>  	if (kvm_request_pending(vcpu)) {
> +		/* Forbid vmenter if vcpu dirty ring is soft-full */

Eh, I'd drop the comment; it's pretty obvious what the code is doing.

> +		if (kvm_dirty_ring_check_request(vcpu)) {

I think it makes sense to move this check below KVM_REQ_VM_DEAD.  I doubt it will
ever matter in practice, but conceptually VM_DEAD is a higher priority event.

I'm pretty sure the check can be moved to the very end of the request checks,
e.g. to avoid an aborted VM-Enter attempt if one of the other requests triggers
KVM_REQ_RING_SOFT_FULL.

Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
after the check at the start of vcpu_enter_guest(), then without the request, KVM
would enter the guest while at or above the soft limit, e.g. record_steal_time()
can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
certainly dirty pages.
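
E.g. as a sketch:

	if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
		r = -EIO;
		goto out;
	}

	/* ... all the other request checks ... */

	/* Last, so requests handled above that dirty memory still force an exit */
	if (kvm_dirty_ring_check_request(vcpu)) {
		r = 0;
		goto out;
	}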

> +			r = 0;
> +			goto out;
> +		}
> +
>  		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>  			r = -EIO;
>  			goto out;

> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK           2
>  #define KVM_REQ_UNHALT            3

UNHALT is gone, the new request can use '3'.

> +#define KVM_REQ_RING_SOFT_FULL    4

Any objection to calling this KVM_REQ_DIRTY_RING_SOFT_FULL?  None of the users
are in danger of having too long lines, and at first glance it's not clear that
this is specifically for the dirty ring.

It'd also give us an excuse to replace spaces with tabs in the above alignment :-)

#define KVM_REQ_TLB_FLUSH		(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UNBLOCK			2
#define KVM_REQ_DIRTY_RING_SOFT_FULL	3
#define KVM_REQUEST_ARCH_BASE		8

> @@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>  
>  void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>  {
> +	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
>  	struct kvm_dirty_gfn *entry;
>  
>  	/* It should never get full */
> @@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>  	kvm_dirty_gfn_set_dirtied(entry);
>  	ring->dirty_index++;
>  	trace_kvm_dirty_ring_push(ring, slot, offset);
> +
> +	if (kvm_dirty_ring_soft_full(ring))
> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);

Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
about the overhead of having to re-check the request; the goal would be to help
document what causes the request to go away.

E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:

	if (!kvm_dirty_ring_soft_full(ring))
		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);

> +}
> +
> +bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
> +		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {

Align please,

	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {

> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);

A comment would be helpful to explain (a) why KVM needs to re-check on the next
KVM_RUN and (b) why this won't indefinitely prevent KVM from entering the guest.
For pretty much every other request I can think of, re-queueing a request like this
will effectively hang the vCPU, i.e. this looks wrong at first glance.
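
E.g. maybe something like this, assuming the re-queue behavior stays:

	/*
	 * Re-queue the request so that the next KVM_RUN re-checks the ring.
	 * Unlike other requests, this doesn't hang the vCPU: the soft-full
	 * condition is only cleared once userspace resets the ring via
	 * KVM_RESET_DIRTY_RINGS, and until then KVM deliberately keeps
	 * exiting with KVM_EXIT_DIRTY_RING_FULL.
	 */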

> +		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
> +		trace_kvm_dirty_ring_exit(vcpu);
> +		return true;
> +	}
> +
> +	return false;
>  }
>  
>  struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
> -- 
> 2.23.0
> 


* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-11  6:14   ` Gavin Shan
@ 2022-10-20 23:44     ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-20 23:44 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

On Tue, Oct 11, 2022, Gavin Shan wrote:
> Some architectures (such as arm64) need to dirty memory outside of the
> context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> KVM's per-vCPU dirty ring.

What is the point of using the dirty ring in this case?  KVM still burns a pile
of memory for the bitmap.  Is the benefit that userspace can get away with
scanning the bitmap fewer times, e.g. scan it once just before blackout under
the assumption that very few pages will dirty the bitmap?

Why not add a global ring to @kvm?  I assume thread safety is a problem, but the
memory overhead of the dirty_bitmap also seems like a fairly big problem.

> Introduce a new flavor of dirty ring that requires the use of both vCPU
> dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> the dirty bitmap. Userspace should scan the dirty bitmap before
> migrating the VM to the target.
> 
> Use an additional capability to advertise this behavior and require
> explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> not allow userspace to enable dirty ring if it hasn't also enabled the
> ring && bitmap capability, as a VM is likely DOA without the pages
> marked in the bitmap.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>

Co-developed-by needs Oliver's SoB.

>  #ifndef CONFIG_HAVE_KVM_DIRTY_RING
> +static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)

What about inverting the naming to better capture that this is about the dirty
bitmap, and less so about the dirty ring?  It's not obvious what "exclusive"
means, e.g. I saw this stub before reading the changelog and assumed it was
making a dirty ring exclusive to something.

Something like this?

bool kvm_use_dirty_bitmap(struct kvm *kvm)
{
	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
}

> @@ -3305,15 +3305,20 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>  		return;
> +
> +#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	if (WARN_ON_ONCE(!vcpu))

To cut down on the #ifdefs, this can be:

	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) && !vcpu))

though that's arguably even harder to read.  Blech.

> +		return;
> +#endif
>  #endif
>  
>  	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
>  		u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -		if (kvm->dirty_ring_size)
> +		if (vcpu && kvm->dirty_ring_size)
>  			kvm_dirty_ring_push(&vcpu->dirty_ring,
>  					    slot, rel_gfn);
>  		else
> @@ -4485,6 +4490,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>  #else
>  		return 0;
> +#endif
> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>  #endif
>  	case KVM_CAP_BINARY_STATS_FD:
>  	case KVM_CAP_SYSTEM_EVENT_DATA:
> @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
>  {
>  	int r;
>  
> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	if (!kvm->dirty_ring_with_bitmap)
> +		return -EINVAL;
> +#endif

This one at least is prettier with IS_ENABLED

	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
	    !kvm->dirty_ring_with_bitmap)
		return -EINVAL;

But dirty_ring_with_bitmap really shouldn't need to exist.  It's mandatory for
architectures that have HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for
architectures that don't.  In other words, the API for enabling the dirty ring
is a bit ugly.

Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been officially
released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP on top, what about
usurping bits 63:32 of cap->args[0] for flags?
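
E.g., as a sketch (the flag name is made up):

	/* cap->args[0]: bits 31:0 hold the ring size, bits 63:32 hold flags */
	#define KVM_DIRTY_LOG_RING_FLAG_WITH_BITMAP	BIT_ULL(32)

	static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
	{
		u32 size = arg0;
		u64 flags = arg0 & GENMASK_ULL(63, 32);
		...
	}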

Ideally we'd use cap->flags directly, but we screwed up with KVM_CAP_DIRTY_LOG_RING
and didn't require flags to be zero :-(

Actually, what's the point of allowing KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be
enabled?  I get why KVM would enumerate this info, i.e. allowing checking, but I
don't see any value in supporting a second method for enabling the dirty ring.

The acquire-release thing is irrelevant for x86, and no other architecture
supports the dirty ring until this series, i.e. there's no need for KVM to detect
that userspace has been updated to gain acquire-release semantics, because the
fact that userspace is enabling the dirty ring on arm64 means userspace has been
updated.

Same goes for the "with bitmap" capability.  There are no existing arm64 users,
so there's no risk of breaking existing userspace by suddenly shoving stuff into
the dirty bitmap.

KVM doesn't even get the enabling checks right, e.g. KVM_CAP_DIRTY_LOG_RING can be
enabled on architectures that select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL but not
CONFIG_HAVE_KVM_DIRTY_RING_TSO.  The reverse is also true (ignoring that x86
selects both and is the only arch that selects the TSO variant).

Ditto for KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP...
> +
>  	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
>  		return -EINVAL;
>  
> @@ -4588,6 +4601,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  	case KVM_CAP_DIRTY_LOG_RING:
>  	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:

... as this should return -EINVAL if CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP=n.

So rather than add a rather useless flag and increase KVM's API surface, why not
make the capabilities informational-only?

---
 include/linux/kvm_dirty_ring.h |  6 +++---
 include/linux/kvm_host.h       |  1 -
 virt/kvm/dirty_ring.c          |  5 +++--
 virt/kvm/kvm_main.c            | 20 ++++----------------
 4 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 23b2b466aa0f..f49db42bc26a 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -28,9 +28,9 @@ struct kvm_dirty_ring {
 };
 
 #ifndef CONFIG_HAVE_KVM_DIRTY_RING
-static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
+static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
 {
-	return false;
+	return true;
 }
 
 /*
@@ -71,7 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
-bool kvm_dirty_ring_exclusive(struct kvm *kvm);
+bool kvm_use_dirty_bitmap(struct kvm *kvm);
 int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d06fbf3e5e95..eb7b1310146d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -779,7 +779,6 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
-	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 9cc60af291ef..53802513de79 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -11,9 +11,10 @@
 #include <trace/events/kvm.h>
 #include "kvm_mm.h"
 
-bool kvm_dirty_ring_exclusive(struct kvm *kvm)
+bool kvm_use_dirty_bitmap(struct kvm *kvm)
 {
-	return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
+	return IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+	       !kvm->dirty_ring_size;
 }
 
 int __weak kvm_cpu_dirty_log_size(void)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd52b8e42307..0e8aaac5a222 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm_dirty_ring_exclusive(kvm)) {
+		else if (kvm_use_dirty_bitmap(kvm)) {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,8 +2060,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking may be exclusive to dirty log tracking */
-	if (kvm_dirty_ring_exclusive(kvm))
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	*memslot = NULL;
@@ -2125,8 +2124,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking may be exclusive to dirty log tracking */
-	if (kvm_dirty_ring_exclusive(kvm))
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -2237,8 +2235,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking may be exclusive to dirty log tracking */
-	if (kvm_dirty_ring_exclusive(kvm))
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -4505,11 +4502,6 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
 {
 	int r;
 
-#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
-	if (!kvm->dirty_ring_with_bitmap)
-		return -EINVAL;
-#endif
-
 	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
 		return -EINVAL;
 
@@ -4597,11 +4589,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 		return 0;
 	}
 	case KVM_CAP_DIRTY_LOG_RING:
-	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
-	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
-		kvm->dirty_ring_with_bitmap = true;
-		return 0;
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}

base-commit: 4826e54f82ded9f54782f8e9d6bc36c7bae06c1f
-- 


* Re: [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-20 22:42     ` Sean Christopherson
@ 2022-10-21  5:54       ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-21  5:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

Hi Sean,

On 10/21/22 6:42 AM, Sean Christopherson wrote:
> On Tue, Oct 11, 2022, Gavin Shan wrote:
>> This adds KVM_REQ_RING_SOFT_FULL, which is raised when the dirty
> 
> "This" is basically "This patch", which is generally frowned upon.  Just state
> what changes are being made.
> 

Ok.

>> ring of the specific VCPU becomes softly full in kvm_dirty_ring_push().
>> The VCPU is enforced to exit when the request is raised and its
>> dirty ring is softly full on its entrance.
>>
>> The event is checked and handled in the newly introduced helper
>> kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
>> becomes a private function.
> 
> None of this captures why the request is being added.  I'm guessing Marc's
> motivation is to avoid having to check the ring on every entry, though there
> might also be a correctness issue?
> 
> It'd also be helpful to explain that KVM re-queues the request to maintain KVM's
> existing uABI, which enforces the soft_limit even if no entries have been added
> to the ring since the last KVM_EXIT_DIRTY_RING_FULL exit.
> 
> And maybe call out the alternative(s) that was discussed in v2[*]?
> 
> [*] https://lore.kernel.org/all/87illlkqfu.wl-maz@kernel.org
> 

I think Marc wants to make the check more generalized with a new event [1].

[1] https://lore.kernel.org/kvmarm/87fshovtu0.wl-maz@kernel.org/

Yes, the commit log will be modified accordingly after your comments are
addressed. I will add something to explain why KVM_REQ_DIRTY_RING_SOFT_FULL
needs to be re-queued: to ensure we have spare space in the dirty ring before
the VCPU becomes runnable.

>> Suggested-by: Marc Zyngier <maz@kernel.org>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c             | 15 ++++++---------
>>   include/linux/kvm_dirty_ring.h |  8 ++------
>>   include/linux/kvm_host.h       |  1 +
>>   virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
>>   4 files changed, 27 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b0c47b41c264..0dd0d32073e7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   
>>   	bool req_immediate_exit = false;
>>   
>> -	/* Forbid vmenter if vcpu dirty ring is soft-full */
>> -	if (unlikely(vcpu->kvm->dirty_ring_size &&
>> -		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
>> -		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
>> -		trace_kvm_dirty_ring_exit(vcpu);
>> -		r = 0;
>> -		goto out;
>> -	}
>> -
>>   	if (kvm_request_pending(vcpu)) {
>> +		/* Forbid vmenter if vcpu dirty ring is soft-full */
> 
> Eh, I'd drop the comment, pretty obvious what the code is doing
> 

Ok, It will be dropped in next revision.

>> +		if (kvm_dirty_ring_check_request(vcpu)) {
> 
> I think it makes sense to move this check below KVM_REQ_VM_DEAD.  I doubt it will
> ever matter in practice, but conceptually VM_DEAD is a higher priority event.
> 
> I'm pretty sure the check can be moved to the very end of the request checks,
> e.g. to avoid an aborted VM-Enter attempt if one of the other requests triggers
> KVM_REQ_RING_SOFT_FULL.
> 
> Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
> after the check at the start of vcpu_enter_guest(), then without the request, KVM
> would enter the guest while at or above the soft limit, e.g. record_steal_time()
> can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
> certainly dirty pages.
> 

When the dirty ring becomes full, the VCPU can't handle any operations that would
dirty more pages. So let's move the check right after KVM_REQ_VM_DEAD, which
obviously has higher priority than KVM_REQ_DIRTY_RING_SOFT_FULL.
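Something like the below, i.e. an untested sketch of vcpu_enter_guest() with
the check reordered per your diff:

	if (kvm_request_pending(vcpu)) {
		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
			r = -EIO;
			goto out;
		}

		/* Checked after VM_DEAD, which takes priority */
		if (kvm_dirty_ring_check_request(vcpu)) {
			r = 0;
			goto out;
		}
		...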

>> +			r = 0;
>> +			goto out;
>> +		}
>> +
>>   		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>>   			r = -EIO;
>>   			goto out;
> 
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
>>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_UNBLOCK           2
>>   #define KVM_REQ_UNHALT            3
> 
> UNHALT is gone, the new request can use '3'.
> 

Yep :)

>> +#define KVM_REQ_RING_SOFT_FULL    4
> 
> Any objection to calling this KVM_REQ_DIRTY_RING_SOFT_FULL?  None of the users
> are in danger of having too long lines, and at first glance it's not clear that
> this is specifically for the dirty ring.
> 
> It'd also give us an excuse to replace spaces with tabs in the above alignment :-)
> 
> #define KVM_REQ_TLB_FLUSH		(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> #define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> #define KVM_REQ_UNBLOCK			2
> #define KVM_REQ_DIRTY_RING_SOFT_FULL	3
> #define KVM_REQUEST_ARCH_BASE		8
> 

KVM_REQ_DIRTY_RING_SOFT_FULL is better. I will rename the event and
replace the spaces with tabs in the next revision.

>> @@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>>   
>>   void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>>   {
>> +	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
>>   	struct kvm_dirty_gfn *entry;
>>   
>>   	/* It should never get full */
>> @@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>>   	kvm_dirty_gfn_set_dirtied(entry);
>>   	ring->dirty_index++;
>>   	trace_kvm_dirty_ring_push(ring, slot, offset);
>> +
>> +	if (kvm_dirty_ring_soft_full(ring))
>> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> 
> Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
> about the overhead of having to re-check the request, the goal would be to help
> document what causes the request to go away.
> 
> E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
> 
> 	if (!kvm_dirty_ring_soft_full(ring))
> 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> 

It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
@vcpu can be retrieved via container_of(..., ring).
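Something like this (untested sketch):

	void kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
	{
		struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);

		...

		/* The request is stale once the ring drops below the soft limit */
		if (!kvm_dirty_ring_soft_full(ring))
			kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
	}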


>> +}
>> +
>> +bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
>> +{
>> +	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
>> +		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
> 
> Align please,
> 

Will be fixed in the next revision.

> 	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
> 	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
> 
>> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> 
> A comment would be helpful to explain (a) why KVM needs to re-check on the next
> KVM_RUN and (b) why this won't indefinitely prevent KVM from entering the guest.
> For pretty much every other request I can think of, re-queueing a request like this
> will effectively hang the vCPU, i.e. this looks wrong at first glance.
> 

It can indefinitely prevent the VCPU from running if the dirty pages aren't
harvested and the dirty ring isn't reset by userspace. I will add something
like the below to explain why we need to re-queue the event.

        /*
         * The VCPU isn't runnable when the dirty ring becomes full. The
         * KVM_REQ_DIRTY_RING_SOFT_FULL event is always set to prevent
         * the VCPU from running until the dirty pages are harvested and
         * the dirty ring is reset by userspace.
         */


>> +		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
>> +		trace_kvm_dirty_ring_exit(vcpu);
>> +		return true;
>> +	}
>> +
>> +	return false;
>>   }
>>   
>>   struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
>> -- 
>> 2.23.0
>>
> 


* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-20 23:44     ` Sean Christopherson
@ 2022-10-21  8:06       ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-21  8:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, peterx, will, catalin.marinas,
	bgardon, shuah, andrew.jones, dmatlack, pbonzini, zhenyzha,
	james.morse, suzuki.poulose, alexandru.elisei, oliver.upton,
	shan.gavin

On Fri, 21 Oct 2022 00:44:51 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Oct 11, 2022, Gavin Shan wrote:
> > Some architectures (such as arm64) need to dirty memory outside of the
> > context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> > KVM's per-vCPU dirty ring.
> 
> What is the point of using the dirty ring in this case?  KVM still
> burns a pile of memory for the bitmap.  Is the benefit that
> userspace can get away with scanning the bitmap fewer times,
> e.g. scan it once just before blackout under the assumption that
> very few pages will dirty the bitmap?

Apparently, the throttling effect of the ring makes it easier to
converge. Someone who actually uses the feature should be able to
tell you. But that's a policy decision, and I don't see why we should
be prescriptive.

> Why not add a global ring to @kvm?  I assume thread safety is a
> problem, but the memory overhead of the dirty_bitmap also seems like
> a fairly big problem.

Because we already have a stupidly bloated API surface, and we could do
without yet another one based on a sample of *one*? Because
dirtying memory outside of a vcpu context makes it incredibly awkward
to handle a "ring full" condition?

> 
> > Introduce a new flavor of dirty ring that requires the use of both vCPU
> > dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> > sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> > the dirty bitmap. Userspace should scan the dirty bitmap before
> > migrating the VM to the target.
> > 
> > Use an additional capability to advertize this behavior and require
> > explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> > you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> > not allow userspace to enable dirty ring if it hasn't also enabled the
> > ring && bitmap capability, as a VM is likely DOA without the pages
> > marked in the bitmap.

This is wrong. The *only* case this is useful is when there is an
in-kernel producer of data outside of the context of a vcpu, which is
so far only the ITS save mechanism. No ITS? No need for this.
Userspace knows what it has created in the first place, and should be in
charge of it (i.e. I want to be able to migrate my GICv2 and
GICv3-without-ITS VMs with the rings only).

> > 
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> 
> Co-developed-by needs Oliver's SoB.
> 
> >  #ifndef CONFIG_HAVE_KVM_DIRTY_RING
> > +static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
> 
> What about inverting the naming to better capture that this is about the dirty
> bitmap, and less so about the dirty ring?  It's not obvious what "exclusive"
> means, e.g. I saw this stub before reading the changelog and assumed it was
> making a dirty ring exclusive to something.
> 
> Something like this?
> 
> bool kvm_use_dirty_bitmap(struct kvm *kvm)
> {
> 	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
> }
> 
> > @@ -3305,15 +3305,20 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> >  
> >  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> > +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> >  		return;
> > +
> > +#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > +	if (WARN_ON_ONCE(!vcpu))
> 
> To cut down on the #ifdefs, this can be:
> 
> 	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) && !vcpu)
> 
> though that's arguably even harder to read.  Blech.
> 
> > +		return;
> > +#endif
> >  #endif
> >  
> >  	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> >  		unsigned long rel_gfn = gfn - memslot->base_gfn;
> >  		u32 slot = (memslot->as_id << 16) | memslot->id;
> >  
> > -		if (kvm->dirty_ring_size)
> > +		if (vcpu && kvm->dirty_ring_size)
> >  			kvm_dirty_ring_push(&vcpu->dirty_ring,
> >  					    slot, rel_gfn);
> >  		else
> > @@ -4485,6 +4490,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
> >  #else
> >  		return 0;
> > +#endif
> > +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> >  #endif
> >  	case KVM_CAP_BINARY_STATS_FD:
> >  	case KVM_CAP_SYSTEM_EVENT_DATA:
> > @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
> >  {
> >  	int r;
> >  
> > +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > +	if (!kvm->dirty_ring_with_bitmap)
> > +		return -EINVAL;
> > +#endif
> 
> This one at least is prettier with IS_ENABLED
> 
> 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
> 	    !kvm->dirty_ring_with_bitmap)
> 		return -EINVAL;
> 
> But dirty_ring_with_bitmap really shouldn't need to exist.  It's
> mandatory for architectures that have
> HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for architectures
> that don't.  In other words, the API for enabling the dirty ring is
> a bit ugly.
> 
> Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been
> officially released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> on top, what about usurping bits 63:32 of cap->args[0] for flags?
> E.g.
> 
> Ideally we'd use cap->flags directly, but we screwed up with
> KVM_CAP_DIRTY_LOG_RING and didn't require flags to be zero :-(
>
> Actually, what's the point of allowing
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be enabled?  I get why KVM would
> enumerate this info, i.e. allow userspace to check for it, but I don't see any
> value in supporting a second method for enabling the dirty ring.
> 
> The acquire-release thing is irrelevant for x86, and no other
> architecture supports the dirty ring until this series, i.e. there's
> no need for KVM to detect that userspace has been updated to gain
> acquire-release semantics, because the fact that userspace is
> enabling the dirty ring on arm64 means userspace has been updated.

Do we really need to make the API more awkward? There is an
established pattern of "enable what is advertised". Some level of
uniformity wouldn't hurt, really.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-20 23:44     ` Sean Christopherson
@ 2022-10-21 10:13       ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-21 10:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

Hi Sean,

On 10/21/22 7:44 AM, Sean Christopherson wrote:
> On Tue, Oct 11, 2022, Gavin Shan wrote:
>> Some architectures (such as arm64) need to dirty memory outside of the
>> context of a vCPU. Of course, this simply doesn't fit with the UAPI of
>> KVM's per-vCPU dirty ring.
> 
> What is the point of using the dirty ring in this case?  KVM still burns a pile
> of memory for the bitmap.  Is the benefit that userspace can get away with
> scanning the bitmap fewer times, e.g. scan it once just before blackout under
> the assumption that very few pages will dirty the bitmap?
> 
> Why not add a global ring to @kvm?  I assume thread safety is a problem, but the
> memory overhead of the dirty_bitmap also seems like a fairly big problem.
> 

Most of the dirty pages are tracked by the per-vCPU rings in this particular
case, meaning only a minority of the dirty pages are tracked by the bitmap.
The trade-off is the coexistence of the dirty ring and the bitmap. The advantage
of the ring is its discrete property compared to the bitmap. With the dirty ring,
userspace (QEMU) still needs to copy the in-kernel bitmap, but in the
low-dirty-rate scenario this is efficient.
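As a rough sketch, the expected userspace flow looks like the below (pseudo-code;
the helper names are made up, only KVM_GET_DIRTY_LOG is a real ioctl):

	/* iterative pre-copy: per-vCPU rings only */
	while (!migration_converged())
		harvest_vcpu_dirty_rings();

	/* once, right before the blackout: collect non-vCPU dirty state */
	sync_dirty_bitmap();	/* KVM_GET_DIRTY_LOG on each memslot */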

We previously discussed the bitmap and a per-VM ring [1]. A per-VM ring is just
too complicated, while the bitmap merely uses extra memory to track dirty pages.
The bitmap will only be used in two cases (migration and a quiescent system), so
it will be retrieved a limited number of times and the time needed to parse it
is bounded.

[1] https://lore.kernel.org/kvmarm/320005d1-fe88-fd6a-be91-ddb56f1aa80f@redhat.com/

>> Introduce a new flavor of dirty ring that requires the use of both vCPU
>> dirty rings and a dirty bitmap. The expectation is that for non-vCPU
>> sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
>> the dirty bitmap. Userspace should scan the dirty bitmap before
>> migrating the VM to the target.
>>
>> Use an additional capability to advertize this behavior and require
>> explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
>> you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
>> not allow userspace to enable dirty ring if it hasn't also enabled the
>> ring && bitmap capability, as a VM is likely DOA without the pages
>> marked in the bitmap.
>>
>> Suggested-by: Marc Zyngier <maz@kernel.org>
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> 
> Co-developed-by needs Oliver's SoB.
> 

Sure.

>>   #ifndef CONFIG_HAVE_KVM_DIRTY_RING
>> +static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
> 
> What about inverting the naming to better capture that this is about the dirty
> bitmap, and less so about the dirty ring?  It's not obvious what "exclusive"
> means, e.g. I saw this stub before reading the changelog and assumed it was
> making a dirty ring exclusive to something.
> 
> Something like this?
> 
> bool kvm_use_dirty_bitmap(struct kvm *kvm)
> {
> 	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
> }
> 

If you agree, I would rename it to kvm_dirty_ring_use_bitmap(). In this way,
we will have the "kvm_dirty_ring" prefix in the function name, consistent with
the other functions from the same module.
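i.e. the same body as your suggestion, just renamed:

	bool kvm_dirty_ring_use_bitmap(struct kvm *kvm)
	{
		return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
	}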

>> @@ -3305,15 +3305,20 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>   
>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>   		return;
>> +
>> +#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	if (WARN_ON_ONCE(!vcpu))
> 
> To cut down on the #ifdefs, this can be:
> 
> 	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) && !vcpu)
> 
> though that's arguably even harder to read.  Blech.
> 

I'll go with your suggestion :)

>> +		return;
>> +#endif
>>   #endif
>>   
>>   	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>>   		u32 slot = (memslot->as_id << 16) | memslot->id;
>>   
>> -		if (kvm->dirty_ring_size)
>> +		if (vcpu && kvm->dirty_ring_size)
>>   			kvm_dirty_ring_push(&vcpu->dirty_ring,
>>   					    slot, rel_gfn);
>>   		else
>> @@ -4485,6 +4490,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>   		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>>   #else
>>   		return 0;
>> +#endif
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>>   #endif
>>   	case KVM_CAP_BINARY_STATS_FD:
>>   	case KVM_CAP_SYSTEM_EVENT_DATA:
>> @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
>>   {
>>   	int r;
>>   
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	if (!kvm->dirty_ring_with_bitmap)
>> +		return -EINVAL;
>> +#endif
> 
> This one at least is prettier with IS_ENABLED
> 
> 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
> 	    !kvm->dirty_ring_with_bitmap)
> 		return -EINVAL;
> 
> But dirty_ring_with_bitmap really shouldn't need to exist.  It's mandatory for
> architectures that have HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for
> architectures that don't.  In other words, the API for enabling the dirty ring
> is a bit ugly.
> 
> Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been officially
> released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP on top, what about
> usurping bits 63:32 of cap->args[0] for flags?  E.g.
> 
> Ideally we'd use cap->flags directly, but we screwed up with KVM_CAP_DIRTY_LOG_RING
> and didn't require flags to be zero :-(
> 
> Actually, what's the point of allowing KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be
> enabled?  I get why KVM would enumerate this info, i.e. allow userspace to check
> for it, but I don't see any value in supporting a second method for enabling the
> dirty ring.
> 
> The acquire-release thing is irrelevant for x86, and no other architecture
> supports the dirty ring until this series, i.e. there's no need for KVM to detect
> that userspace has been updated to gain acquire-release semantics, because the
> fact that userspace is enabling the dirty ring on arm64 means userspace has been
> updated.
> 
> Same goes for the "with bitmap" capability.  There are no existing arm64 users,
> so there's no risk of breaking existing userspace by suddenly shoving stuff into
> the dirty bitmap.
> 
> KVM doesn't even get the enabling checks right, e.g. KVM_CAP_DIRTY_LOG_RING can be
> enabled on architectures that select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL but not
> CONFIG_HAVE_KVM_DIRTY_RING.  The reverse is also true (ignoring that x86 selects
> both and is the only arch that selects the TSO variant).
> 
> Ditto for KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP...

If I didn't miss anything in the previous discussions, we don't want to make
KVM_CAP_DIRTY_LOG_RING_ACQ_REL and KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP architecture
dependent. If they become architecture dependent, userspace will need different
stubs (x86, arm64, and other architectures supporting the dirty ring in future) to
enable those capabilities, which isn't friendly to userspace. So I tend to prefer
the existing pattern: advertise, then enable. Enabling a capability without knowing
whether it's supported sounds a bit weird to me.

I think it's a good idea to expose KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP}
as flags instead of standalone capabilities. In this way, those two capabilities
can be treated as sub-capabilities of KVM_CAP_DIRTY_LOG_RING. The question is how
these two flags can be exposed by kvm_vm_ioctl_check_extension_generic(), if we
really want to expose them.
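For example (a hypothetical sketch, the capability and flag names are made up),
a separate query could return the mask of supported flags:

	case KVM_CAP_DIRTY_LOG_RING_FLAGS:	/* hypothetical */
		return KVM_DIRTY_LOG_RING_FLAG_ACQ_REL |
		       KVM_DIRTY_LOG_RING_FLAG_WITH_BITMAP;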

I don't understand your point about KVM getting the checks wrong when
KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are enabled.


>> +
>>   	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
>>   		return -EINVAL;
>>   
>> @@ -4588,6 +4601,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   	case KVM_CAP_DIRTY_LOG_RING:
>>   	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> 
> ... as this should return -EINVAL if CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP=n.
> 

Yes.

> So rather than add a rather useless flag and increase KVM's API surface, why not
> make the capabilities informational-only?
> 

Please refer to the reply above. I still tend to prefer the pattern of advertising
and enabling. If architecture-dependent stubs to enable those capabilities aren't
a concern to us, I think we can make those capabilities informational-only, but I
guess Oliver might have different ideas?

> ---
>   include/linux/kvm_dirty_ring.h |  6 +++---
>   include/linux/kvm_host.h       |  1 -
>   virt/kvm/dirty_ring.c          |  5 +++--
>   virt/kvm/kvm_main.c            | 20 ++++----------------
>   4 files changed, 10 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
> index 23b2b466aa0f..f49db42bc26a 100644
> --- a/include/linux/kvm_dirty_ring.h
> +++ b/include/linux/kvm_dirty_ring.h
> @@ -28,9 +28,9 @@ struct kvm_dirty_ring {
>   };
>   
>   #ifndef CONFIG_HAVE_KVM_DIRTY_RING
> -static inline bool kvm_dirty_ring_exclusive(struct kvm *kvm)
> +static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
>   {
> -	return false;
> +	return true;
>   }
>   
>   /*
> @@ -71,7 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
>   
>   #else /* CONFIG_HAVE_KVM_DIRTY_RING */
>   
> -bool kvm_dirty_ring_exclusive(struct kvm *kvm);
> +bool kvm_use_dirty_bitmap(struct kvm *kvm);
>   int kvm_cpu_dirty_log_size(void);
>   u32 kvm_dirty_ring_get_rsvd_entries(void);
>   int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d06fbf3e5e95..eb7b1310146d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -779,7 +779,6 @@ struct kvm {
>   	pid_t userspace_pid;
>   	unsigned int max_halt_poll_ns;
>   	u32 dirty_ring_size;
> -	bool dirty_ring_with_bitmap;
>   	bool vm_bugged;
>   	bool vm_dead;
>   
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 9cc60af291ef..53802513de79 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -11,9 +11,10 @@
>   #include <trace/events/kvm.h>
>   #include "kvm_mm.h"
>   
> -bool kvm_dirty_ring_exclusive(struct kvm *kvm)
> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
>   {
> -	return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
> +	return IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> +	       !kvm->dirty_ring_size;
>   }
>   
>   int __weak kvm_cpu_dirty_log_size(void)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index dd52b8e42307..0e8aaac5a222 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>   			new->dirty_bitmap = NULL;
>   		else if (old && old->dirty_bitmap)
>   			new->dirty_bitmap = old->dirty_bitmap;
> -		else if (!kvm_dirty_ring_exclusive(kvm)) {
> +		else if (kvm_use_dirty_bitmap(kvm)) {
>   			r = kvm_alloc_dirty_bitmap(new);
>   			if (r)
>   				return r;
> @@ -2060,8 +2060,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>   	unsigned long n;
>   	unsigned long any = 0;
>   
> -	/* Dirty ring tracking may be exclusive to dirty log tracking */
> -	if (kvm_dirty_ring_exclusive(kvm))
> +	if (!kvm_use_dirty_bitmap(kvm))
>   		return -ENXIO;
>   
>   	*memslot = NULL;
> @@ -2125,8 +2124,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>   	unsigned long *dirty_bitmap_buffer;
>   	bool flush;
>   
> -	/* Dirty ring tracking may be exclusive to dirty log tracking */
> -	if (kvm_dirty_ring_exclusive(kvm))
> +	if (!kvm_use_dirty_bitmap(kvm))
>   		return -ENXIO;
>   
>   	as_id = log->slot >> 16;
> @@ -2237,8 +2235,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>   	unsigned long *dirty_bitmap_buffer;
>   	bool flush;
>   
> -	/* Dirty ring tracking may be exclusive to dirty log tracking */
> -	if (kvm_dirty_ring_exclusive(kvm))
> +	if (!kvm_use_dirty_bitmap(kvm))
>   		return -ENXIO;
>   
>   	as_id = log->slot >> 16;
> @@ -4505,11 +4502,6 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
>   {
>   	int r;
>   
> -#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> -	if (!kvm->dirty_ring_with_bitmap)
> -		return -EINVAL;
> -#endif
> -
>   	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
>   		return -EINVAL;
>   
> @@ -4597,11 +4589,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>   		return 0;
>   	}
>   	case KVM_CAP_DIRTY_LOG_RING:
> -	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> -	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> -		kvm->dirty_ring_with_bitmap = true;
> -		return 0;
>   	default:
>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>   	}
> 
> base-commit: 4826e54f82ded9f54782f8e9d6bc36c7bae06c1f
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-21  5:54       ` Gavin Shan
@ 2022-10-21 15:25         ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-21 15:25 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

On Fri, Oct 21, 2022, Gavin Shan wrote:
> I think Marc want to make the check more generalized with a new event [1].

Generalized code can be achieved with a helper though.  The motivation is indeed
to avoid overhead on every run:

  : A seemingly approach would be to make this a request on dirty log
  : insertion, and avoid the whole "check the log size" on every run,
  : which adds pointless overhead to unsuspecting users (aka everyone).


https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org

> > I'm pretty sure the check can be moved to the very end of the request checks,
> > e.g. to avoid an aborted VM-Enter attempt if one of the other request triggers
> > KVM_REQ_RING_SOFT_FULL.
> > 
> > Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
> > after the check at the start of vcpu_enter_guest(), then without the request, KVM
> > would enter the guest while at or above the soft limit, e.g. record_steal_time()
> > can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
> > certainly dirty pages.
> > 
> 
> When the dirty ring becomes full, the VCPU can't handle any operations, which will
> bring more dirty pages.

Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
then we need to increase the buffer provided by the soft limit because sooner or
later KVM will be able to fill 65 entries, at which point errors will occur
regardless of when the "soft full" request is processed.

In other words, we can take advantage of the fact that the soft-limit buffer needs
to be quite conservative.
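
(For reference, the soft limit already bakes that buffer in.  Roughly, per the
current helpers, where KVM_DIRTY_RING_RSVD_ENTRIES is 64 and
kvm_cpu_dirty_log_size() covers the CPU-side buffer, e.g. PML's 512 entries:)

	u32 kvm_dirty_ring_get_rsvd_entries(void)
	{
		/* 64 spare entries on top of whatever the CPU itself buffers. */
		return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
	}

	static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
	{
		/* soft_limit == ring->size - kvm_dirty_ring_get_rsvd_entries() */
		return kvm_dirty_ring_used(ring) >= ring->soft_limit;
	}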

> > Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
> > about the overhead of having to re-check the request, the goal would be to help
> > document what causes the request to go away.
> > 
> > E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
> > 
> > 	if (!kvm_dirty_ring_soft_full(ring))
> > 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> > 
> 
> It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
> @vcpu can be achieved by container_of(..., ring).

Using container_of() is silly, there's literally one caller that does:

	kvm_for_each_vcpu(i, vcpu, kvm)
		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-21  8:06       ` Marc Zyngier
@ 2022-10-21 16:05         ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-21 16:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, peterx, will, catalin.marinas,
	bgardon, shuah, andrew.jones, dmatlack, pbonzini, zhenyzha,
	james.morse, suzuki.poulose, alexandru.elisei, oliver.upton,
	shan.gavin

On Fri, Oct 21, 2022, Marc Zyngier wrote:
> On Fri, 21 Oct 2022 00:44:51 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Tue, Oct 11, 2022, Gavin Shan wrote:
> > > Some architectures (such as arm64) need to dirty memory outside of the
> > > context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> > > KVM's per-vCPU dirty ring.
> > 
> > What is the point of using the dirty ring in this case?  KVM still
> > burns a pile of memory for the bitmap.  Is the benefit that
> > userspace can get away with scanning the bitmap fewer times,
> > e.g. scan it once just before blackout under the assumption that
> > very few pages will dirty the bitmap?
> 
> Apparently, the throttling effect of the ring makes it easier to
> converge. Someone who actually uses the feature should be able to
> tell you. But that's a policy decision, and I don't see why we should
> be prescriptive.

I wasn't suggesting we be prescriptive, it was an honest question.

> > Why not add a global ring to @kvm?  I assume thread safety is a
> > problem, but the memory overhead of the dirty_bitmap also seems like
> > a fairly big problem.
> 
> Because we already have a stupidly bloated API surface, and that we
> could do without yet another one based on a sample of *one*?

But we're adding a new API regardless.  A per-VM ring would definitely be a bigger
addition, but if using the dirty_bitmap won't actually meet the needs of userspace,
then we'll have added a new API and still not have solved the problem.  That's why
I was asking why/when userspace would want to use dirty_ring+dirty_bitmap.

> Because dirtying memory outside of a vcpu context makes it incredibly awkward
> to handle a "ring full" condition?

Kicking all vCPUs with the soft-full request isn't _that_ awkward.  It's certainly
sub-optimal, but if inserting into the per-VM ring is relatively rare, then in
practice it's unlikely to impact guest performance.
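
(A sketch of what that kick could look like from the non-vCPU insertion path;
the request name follows this series:)

	/*
	 * The per-VM ring went soft-full outside of vCPU context: force
	 * every vCPU out to userspace so the rings can be harvested and
	 * reset before more entries get pushed.
	 */
	kvm_make_all_cpus_request(kvm, KVM_REQ_DIRTY_RING_SOFT_FULL);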

> > > Introduce a new flavor of dirty ring that requires the use of both vCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> > > sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before
> > > migrating the VM to the target.
> > > 
> > > Use an additional capability to advertise this behavior and require
> > > explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> > > you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> > > not allow userspace to enable dirty ring if it hasn't also enabled the
> > > ring && bitmap capability, as a VM is likely DOA without the pages
> > > marked in the bitmap.
> 
> This is wrong. The *only* case this is useful is when there is an
> in-kernel producer of data outside of the context of a vcpu, which is
> so far only the ITS save mechanism. No ITS? No need for this.

How large is the ITS?  If it's a fixed, small size, could we treat the ITS as a
one-off case for now?  E.g. do something gross like shove entries into vcpu0's
dirty ring?

> Userspace knows what it has created in the first place, and should be in
> charge of it (i.e. I want to be able to migrate my GICv2 and
> GICv3-without-ITS VMs with the rings only).

Ah, so enabling the dirty bitmap isn't strictly required.  That means this patch
is wrong, and it also means that we need to figure out how we want to handle the
case where mark_page_dirty_in_slot() is invoked without a running vCPU on a memslot
without a dirty_bitmap.

I.e. what's an appropriate action in the below sequence:

void mark_page_dirty_in_slot(struct kvm *kvm,
			     const struct kvm_memory_slot *memslot,
		 	     gfn_t gfn)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

#ifdef CONFIG_HAVE_KVM_DIRTY_RING
	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
		return;

#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
	if (WARN_ON_ONCE(!vcpu))
		return;
#endif
#endif

	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
		unsigned long rel_gfn = gfn - memslot->base_gfn;
		u32 slot = (memslot->as_id << 16) | memslot->id;

		if (vcpu && kvm->dirty_ring_size)
			kvm_dirty_ring_push(&vcpu->dirty_ring,
					    slot, rel_gfn);
		else if (memslot->dirty_bitmap)
			set_bit_le(rel_gfn, memslot->dirty_bitmap);
		else
			???? <=================================================
	}
}


Would it be possible to require a dirty bitmap when an ITS is created?  That would
allow treating the above condition as a KVM bug.
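
E.g. if the bitmap is guaranteed to exist whenever an ITS does, the "????" arm
above could be nothing more than this (sketch):

		else
			/*
			 * No running vCPU to push to and no bitmap to set,
			 * i.e. KVM dirtied memory it cannot report.  With a
			 * bitmap required at ITS creation, this would be a
			 * KVM bug.
			 */
			WARN_ON_ONCE(1);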

> > > @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
> > >  {
> > >  	int r;
> > >  
> > > +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > > +	if (!kvm->dirty_ring_with_bitmap)
> > > +		return -EINVAL;
> > > +#endif
> > 
> > This one at least is prettier with IS_ENABLED
> > 
> > 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
> > 	    !kvm->dirty_ring_with_bitmap)
> > 		return -EINVAL;
> > 
> > But dirty_ring_with_bitmap really shouldn't need to exist.  It's
> > mandatory for architectures that have
> > HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for architectures
> > that don't.  In other words, the API for enabling the dirty ring is
> > a bit ugly.
> > 
> > Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been
> > officially released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> > on top, what about usurping bits 63:32 of cap->args[0] for flags?
> > E.g.

For posterity, filling in my missing idea...

Since the size is restricted to be well below a 32-bit value, and it's unlikely
that KVM will ever support 4GiB per-vCPU rings, we could usurp the upper bits for
flags:

  static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
  {
	u32 flags = arg0 >> 32;
	u32 size = arg0;
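
Filling that in with hypothetical flag names (nothing here is final API), the
whole thing might look like:

	/* Made-up flag bits, carried in arg0[63:32]. */
	#define KVM_DIRTY_LOG_RING_F_ACQ_REL		BIT(0)
	#define KVM_DIRTY_LOG_RING_F_WITH_BITMAP	BIT(1)

	static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
	{
		u32 flags = arg0 >> 32;
		u32 size = arg0;

		/* Reject unknown flags so the upper bits stay extensible. */
		if (flags & ~(KVM_DIRTY_LOG_RING_F_ACQ_REL |
			      KVM_DIRTY_LOG_RING_F_WITH_BITMAP))
			return -EINVAL;

		/* ... existing size validation and per-vCPU ring setup ... */
		return 0;
	}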

However, since it sounds like enabling dirty_bitmap isn't strictly required, I
have no objection to enabling KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, my objection
was purely that KVM was adding a per-VM flag just to sanity check the configuration.

> > Ideally we'd use cap->flags directly, but we screwed up with
> > KVM_CAP_DIRTY_LOG_RING and didn't require flags to be zero :-(
> >
> > Actually, what's the point of allowing
> > KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be enabled?  I get why KVM would
> > enumerate this info, i.e. allowing checking, but I don't see any
> > value in supporting a second method for enabling the dirty ring.
> > 
> > The acquire-release thing is irrelevant for x86, and no other
> > architecture supports the dirty ring until this series, i.e. there's
> > no need for KVM to detect that userspace has been updated to gain
> > acquire-release semantics, because the fact that userspace is
> > enabling the dirty ring on arm64 means userspace has been updated.
> 
> Do we really need to make the API more awkward? There is an
> established pattern of "enable what is advertised". Some level of
> uniformity wouldn't hurt, really.

I agree that uniformity would be nice, but for capabilities I don't think that's
ever going to happen.  I'm pretty sure supporting enabling is actually in the
minority.  E.g. of the 20 capabilities handled in kvm_vm_ioctl_check_extension_generic(),
I believe only 3 support enabling (KVM_CAP_HALT_POLL, KVM_CAP_DIRTY_LOG_RING, and
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2).

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-21 15:25         ` Sean Christopherson
@ 2022-10-21 23:03           ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-21 23:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

Hi Sean,

On 10/21/22 11:25 PM, Sean Christopherson wrote:
> On Fri, Oct 21, 2022, Gavin Shan wrote:
>> I think Marc want to make the check more generalized with a new event [1].
> 
> Generalized code can be achieved with a helper though.  The motivation is indeed
> to avoid overhead on every run:
> 
>    : A seemingly approach would be to make this a request on dirty log
>    : insertion, and avoid the whole "check the log size" on every run,
>    : which adds pointless overhead to unsuspecting users (aka everyone).
> 
> 
> https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org
> 

Ok. I would say both are motivations. I will refer to those words in the commit
log and include the link. In that way, the motivations are clearly mentioned in
the commit log.

>>> I'm pretty sure the check can be moved to the very end of the request checks,
>>> e.g. to avoid an aborted VM-Enter attempt if one of the other request triggers
>>> KVM_REQ_RING_SOFT_FULL.
>>>
>>> Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
>>> after the check at the start of vcpu_enter_guest(), then without the request, KVM
>>> would enter the guest while at or above the soft limit, e.g. record_steal_time()
>>> can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
>>> certainly dirty pages.
>>>
>>
>> When dirty ring becomes full, the VCPU can't handle any operations, which will
>> bring more dirty pages.
> 
> Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
> PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
> edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
> then we need to increase the buffer provided by the soft limit because sooner or
> later KVM will be able to fill 65 entries, at which point errors will occur
> regardless of when the "soft full" request is processed.
> 
> In other words, we can take advantage of the fact that the soft-limit buffer needs
> to be quite conservative.
> 

Right, there are an extra 64 entries in the ring between soft full and hard full.
Another 512 entries are reserved when PML is enabled. However, the other requests,
which produce dirty pages, are producers to the ring. We can't simply assume
that those producers will need fewer than 64 entries. So I think KVM_REQ_DIRTY_RING_SOFT_FULL
has higher priority than the other requests, except KVM_REQ_VM_DEAD, which
needs to be handled immediately.

>>> Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
>>> about the overhead of having to re-check the request, the goal would be to help
>>> document what causes the request to go away.
>>>
>>> E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
>>>
>>> 	if (!kvm_dirty_ring_soft_full(ring))
>>> 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
>>>
>>
>> It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
>> @vcpu can be achieved by container_of(..., ring).
> 
> Using container_of() is silly, there's literally one caller that does:
> 
> 	kvm_for_each_vcpu(i, vcpu, kvm)
> 		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
> 

May I ask why it's silly to use container_of()? In order to avoid using
container_of(), kvm_dirty_ring_push() also needs @vcpu. So let's change those
two functions to something like below. Please double-check that they look
good to you:

   void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
   int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);
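
For example, the reset side could then clear the request in place. A sketch
only, keeping the existing harvest logic:

   int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu)
   {
	struct kvm_dirty_ring *ring = &vcpu->dirty_ring;
	int cleared = 0;

	/* ... existing loop that harvests entries and bumps 'cleared' ... */

	if (!kvm_dirty_ring_soft_full(ring))
		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);

	return cleared;
   }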

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-21 10:13       ` Gavin Shan
@ 2022-10-21 23:20         ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-21 23:20 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

On Fri, Oct 21, 2022, Gavin Shan wrote:
> > What about inverting the naming to better capture that this is about the dirty
> > bitmap, and less so about the dirty ring?  It's not obvious what "exclusive"
> > means, e.g. I saw this stub before reading the changelog and assumed it was
> > making a dirty ring exclusive to something.
> > 
> > Something like this?
> > 
> > bool kvm_use_dirty_bitmap(struct kvm *kvm)
> > {
> > 	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
> > }
> > 
> 
> If you agree, I would rename it to kvm_dirty_ring_use_bitmap(). In this way,
> we will have "kvm_dirty_ring" prefix for the function name, consistent with
> other functions from same module.

I'd prefer to avoid "ring" in the name at all, because in the common case (well,
legacy case at least) the dirty ring has nothing to do with using the dirty
bitmap, e.g. this code ends up being very confusing because the "dirty_ring"
part implies that KVM _doesn't_ need to allocate the bitmap when the dirty ring
isn't being used.

		if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
			new->dirty_bitmap = NULL;
		else if (old && old->dirty_bitmap)
			new->dirty_bitmap = old->dirty_bitmap;
		else if (kvm_dirty_ring_use_bitmap(kvm)) {
			r = kvm_alloc_dirty_bitmap(new);
			if (r)
				return r;

			if (kvm_dirty_log_manual_protect_and_init_set(kvm))
				bitmap_set(new->dirty_bitmap, 0, new->npages);
		}

The helper exists because the dirty ring exists, but the helper is fundamentally
about the dirty bitmap, not the ring.

> > But dirty_ring_with_bitmap really shouldn't need to exist.  It's mandatory for
> > architectures that have HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for
> > architectures that don't.  In other words, the API for enabling the dirty ring
> > is a bit ugly.
> > 
> > Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been officially
> > released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP on top, what about
> > usurping bits 63:32 of cap->args[0] for flags?  E.g.
> > 
> > Ideally we'd use cap->flags directly, but we screwed up with KVM_CAP_DIRTY_LOG_RING
> > and didn't require flags to be zero :-(
> > 
> > Actually, what's the point of allowing KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be
> > enabled?  I get why KVM would enumerate this info, i.e. allowing checking, but I
> > don't see any value in supporting a second method for enabling the dirty ring.
> > 
> > The acquire-release thing is irrelevant for x86, and no other architecture
> > supports the dirty ring until this series, i.e. there's no need for KVM to detect
> > that userspace has been updated to gain acquire-release semantics, because the
> > fact that userspace is enabling the dirty ring on arm64 means userspace has been
> > updated.
> > 
> > Same goes for the "with bitmap" capability.  There are no existing arm64 users,
> > so there's no risk of breaking existing userspace by suddenly shoving stuff into
> > the dirty bitmap.
> > 
> > KVM doesn't even get the enabling checks right, e.g. KVM_CAP_DIRTY_LOG_RING can be
> > enabled on architectures that select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL but not
> > KVM_CAP_DIRTY_LOG_RING.  The reverse is true (ignoring that x86 selects both and
> > is the only arch that selects the TSO variant).
> > 
> > Ditto for KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP...
> 
> If I didn't miss anything in the previous discussions, we don't want to make
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL and KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> architecture dependent. If they become architecture dependent, userspace
> will have different stubs (x86, arm64, other architectures to support
> dirty-ring in future) to enable those capabilities. It's not friendly to
> userspace. So I intend to prefer the existing pattern: advertise, enable. To
> enable a capability without knowing if it's supported sounds a bit weird to
> me.

Enabling without KVM advertising that it's supported would indeed be odd.  Ugh,
and QEMU doesn't have existing checks to restrict the dirty ring to x86, i.e. we
can't make the ACQ_REL capability a true attribute without breaking userspace.

Rats.

> I think it's a good idea to enable KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP} as
> flags, instead of standalone capabilities. In this way, those two capabilities can
> be treated as sub-capability of KVM_CAP_DIRTY_LOG_RING. The question is how these
> two flags can be exposed by kvm_vm_ioctl_check_extension_generic(), if we really
> want to expose those two flags.
> 
> I don't understand your question on how KVM has wrong checks when KVM_CAP_DIRTY_LOG_RING
> and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are enabled.

In the current code base, KVM only checks that _a_ form of dirty ring is supported,
by way of kvm_vm_ioctl_enable_dirty_log_ring()'s check on KVM_DIRTY_LOG_PAGE_OFFSET.

The callers don't verify that the "correct" capability is enabled.

	case KVM_CAP_DIRTY_LOG_RING:
	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);

E.g. userspace could do

	if (kvm_check(KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
		kvm_enable(KVM_CAP_DIRTY_LOG_RING)

and KVM would happily enable the dirty ring.  Functionally it doesn't cause
problems, it's just weird.

Heh, we can fix this without more ifdeffery by using the check internally.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e30f1b4ecfa5..300489a0eba5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4585,6 +4585,8 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
        }
        case KVM_CAP_DIRTY_LOG_RING:
        case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
+               if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap))
+                       return -EINVAL;
                return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
        default:
                return kvm_vm_ioctl_enable_cap(kvm, cap);

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
@ 2022-10-21 23:20         ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-21 23:20 UTC (permalink / raw)
  To: Gavin Shan
  Cc: shuah, kvm, maz, bgardon, andrew.jones, dmatlack, shan.gavin,
	catalin.marinas, kvmarm, pbonzini, zhenyzha, will, kvmarm

On Fri, Oct 21, 2022, Gavin Shan wrote:
> > What about inverting the naming to better capture that this is about the dirty
> > bitmap, and less so about the dirty ring?  It's not obvious what "exclusive"
> > means, e.g. I saw this stub before reading the changelog and assumed it was
> > making a dirty ring exclusive to something.
> > 
> > Something like this?
> > 
> > bool kvm_use_dirty_bitmap(struct kvm *kvm)
> > {
> > 	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
> > }
> > 
> 
> If you agree, I would rename is to kvm_dirty_ring_use_bitmap(). In this way,
> we will have "kvm_dirty_ring" prefix for the function name, consistent with
> other functions from same module.

I'd prefer to avoid "ring" in the name at all, because in the common case (well,
legacy case at least) the dirty ring has nothing to do with using the dirty
bitmap, e.g. this code ends up being very confusing because the "dirty_ring"
part implies that KVM _doesn't_ need to allocate the bitmap when the dirty ring
isn't being used.

		if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
			new->dirty_bitmap = NULL;
		else if (old && old->dirty_bitmap)
			new->dirty_bitmap = old->dirty_bitmap;
		else if (kvm_dirty_ring_use_bitmap(kvm) {
			r = kvm_alloc_dirty_bitmap(new);
			if (r)
				return r;

			if (kvm_dirty_log_manual_protect_and_init_set(kvm))
				bitmap_set(new->dirty_bitmap, 0, new->npages);
		}

The helper exists because the dirty ring exists, but the helper is fundamentally
about the dirty bitmap, not the ring.

> > But dirty_ring_with_bitmap really shouldn't need to exist.  It's mandatory for
> > architectures that have HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for
> > architectures that don't.  In other words, the API for enabling the dirty ring
> > is a bit ugly.
> > 
> > Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been officially
> > released yet, and then KVM_CAP_DIRTY_LOG_ING_WITH_BITMAP on top, what about
> > usurping bits 63:32 of cap->args[0] for flags?  E.g.
> > 
> > Ideally we'd use cap->flags directly, but we screwed up with KVM_CAP_DIRTY_LOG_RING
> > and didn't require flags to be zero :-(
> > 
> > Actually, what's the point of allowing KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be
> > enabled?  I get why KVM would enumerate this info, i.e. allowing checking, but I
> > don't seen any value in supporting a second method for enabling the dirty ring.
> > 
> > The acquire-release thing is irrelevant for x86, and no other architecture
> > supports the dirty ring until this series, i.e. there's no need for KVM to detect
> > that userspace has been updated to gain acquire-release semantics, because the
> > fact that userspace is enabling the dirty ring on arm64 means userspace has been
> > updated.
> > 
> > Same goes for the "with bitmap" capability.  There are no existing arm64 users,
> > so there's no risk of breaking existing userspace by suddenly shoving stuff into
> > the dirty bitmap.
> > 
> > KVM doesn't even get the enabling checks right, e.g. KVM_CAP_DIRTY_LOG_RING can be
> > enabled on architectures that select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL but not
> > KVM_CAP_DIRTY_LOG_RING.  The reverse is true (ignoring that x86 selects both and
> > is the only arch that selects the TSO variant).
> > 
> > Ditto for KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP...
> 
> If I didn't miss anything in the previous discussions, we don't want to make
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL and KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> architecture dependent. If they become architecture dependent, the userspace
> will have different stubs (x86, arm64, other architectures to support
> dirty-ring in future) to enable those capabilities. It's not friendly to
> userspace. So I intend to prefer the existing pattern: advertise, enable. To
> enable a capability without knowing if it's supported sounds a bit weird to
> me.

Enabling without KVM advertising that it's supported would indeed be odd.  Ugh,
and QEMU doesn't have existing checks to restrict the dirty ring to x86, i.e. we
can't make the ACQ_REL capability a true attribute without breaking userspace.

Rats.

> I think it's a good idea to enable KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP} as
> flags, instead of standalone capabilities. In this way, those two capabilities can
> be treated as sub-capability of KVM_CAP_DIRTY_LOG_RING. The question is how these
> two flags can be exposed by kvm_vm_ioctl_check_extension_generic(), if we really
> want to expose those two flags.
> 
> I don't understand your question on how KVM has wrong checks when KVM_CAP_DIRTY_LOG_RING
> and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are enabled.

In the current code base, KVM only checks that _a_ form of dirty ring is supported,
by way of kvm_vm_ioctl_enable_dirty_log_ring()'s check on KVM_DIRTY_LOG_PAGE_OFFSET.

The callers don't verify that the "correct" capability is enabled.

	case KVM_CAP_DIRTY_LOG_RING:
	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);

E.g. userspace could do

	if (kvm_check(KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
		kvm_enable(KVM_CAP_DIRTY_LOG_RING)

and KVM would happily enable the dirty ring.  Functionally it doesn't cause
problems, it's just weird.

Heh, we can fix without more ifdeffery by using the check internally.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e30f1b4ecfa5..300489a0eba5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4585,6 +4585,8 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
        }
        case KVM_CAP_DIRTY_LOG_RING:
        case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
+               if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap))
+                       return -EINVAL;
                return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
        default:
                return kvm_vm_ioctl_enable_cap(kvm, cap);
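
As a minimal userspace sketch (illustrative values, not from the series), the
mismatched sequence that this check now rejects looks like:

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	/* vm_fd was obtained via KVM_CREATE_VM. */
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_LOG_RING,	/* the TSO flavor */
		.args = { 4096 },		/* ring size in bytes, example value */
	};

	/*
	 * On an arm64 kernel that only advertises the ACQ_REL flavor, this
	 * now fails with -EINVAL instead of silently enabling the ring via
	 * the "wrong" capability.
	 */
	int ret = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);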

* Re: [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-21 23:03           ` Gavin Shan
@ 2022-10-21 23:48             ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-21 23:48 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

On Sat, Oct 22, 2022, Gavin Shan wrote:
> > > When the dirty ring becomes full, the VCPU can't handle any operations which would
> > > bring more dirty pages.
> > 
> > Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
> > PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
> > edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
> > then we need to increase the buffer provided by the soft limit because sooner or
> > later KVM will be able to fill 65 entries, at which point errors will occur
> > regardless of when the "soft full" request is processed.
> > 
> > In other words, we can take advantage of the fact that the soft-limit buffer needs
> > to be quite conservative.
> > 
> 
> Right, there are an extra 64 entries in the ring between soft full and hard full.
> Another 512 entries are reserved when PML is enabled. However, the other requests,
> which produce dirty pages, are producers to the ring. We can't just assume
> that those producers will need fewer than 64 entries.

But we're already assuming those producers will need less than 65 entries.  My point
is that if one (or even five) extra entries pushes KVM over the limit, then the
buffer provided by the soft limit needs to be jacked up regardless of when the
request is processed.

Hmm, but I suppose it's possible there's a pathological emulator path that can push
double digit entries, and servicing the request right away ensures that requests
have the full 64 entry buffer to play with.

So yeah, I agree, move it below the DEAD check, but keep it above most everything
else.
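
For reference, a simplified sketch of where the 64/512-entry headroom discussed
above comes from in the existing dirty-ring code (not verbatim):

	/* Entries reserved between "soft full" and "hard full". */
	#define KVM_DIRTY_RING_RSVD_ENTRIES	64

	u32 kvm_dirty_ring_get_rsvd_entries(void)
	{
		/* kvm_cpu_dirty_log_size() is 512 on VMX when PML is in use. */
		return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
	}

	/*
	 * At allocation time the soft limit leaves that headroom free:
	 *   ring->soft_limit = size - kvm_dirty_ring_get_rsvd_entries();
	 */
	bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
	{
		return kvm_dirty_ring_used(ring) >= ring->soft_limit;
	}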

> > > > Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
> > > > about the overhead of having to re-check the request, the goal would be to help
> > > > document what causes the request to go away.
> > > > 
> > > > E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
> > > > 
> > > > 	if (!kvm_dirty_ring_soft_full(ring))
> > > > 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> > > > 
> > > 
> > > It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
> > > @vcpu can be obtained with container_of(..., ring).
> > 
> > Using container_of() is silly, there's literally one caller that does:
> > 
> > 	kvm_for_each_vcpu(i, vcpu, kvm)
> > 		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
> > 
> 
> May I ask why using container_of() is silly?

Because container_of() is inherently dangerous, e.g. if it's used on a pointer that
isn't contained by the expected type, the code will compile cleanly but explode
at runtime.  That's unlikely to happen in this case, e.g. doesn't look like we'll
be adding a ring to "struct kvm", but if someone wanted to add a per-VM ring,
taking the vCPU makes it very obvious that pushing to a ring _requires_ a vCPU,
and enforces that requirement at compile time.

In other words, it's preferable to avoid container_of() unless using it solves a
real problem that doesn't have a better alternative.

In these cases, passing in the vCPU is most definitely a better alternative as
each of the functions in question has a sole caller that has easy access to the
container (vCPU), i.e. it's a trivial change.

> In order to avoid using container_of(), kvm_dirty_ring_push() also needs
> @vcpu.

Yep, that one should be changed too.

> So let's change those two functions to something like below. Please
> double-check whether they look good to you:
> 
>   void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
>   int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);

Yep, looks good.
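
A minimal sketch of the agreed rework; kvm_dirty_ring_harvest() is a
hypothetical stand-in for the existing reset walk, not the final code:

	int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu)
	{
		struct kvm_dirty_ring *ring = &vcpu->dirty_ring;
		int cleared;

		/* Existing logic: collect the entries userspace flagged as reset. */
		cleared = kvm_dirty_ring_harvest(vcpu->kvm, ring);

		/* The vCPU is runnable again once the ring has room. */
		if (!kvm_dirty_ring_soft_full(ring))
			kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);

		return cleared;
	}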

* Re: [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-21 23:48             ` Sean Christopherson
@ 2022-10-22  0:16               ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-22  0:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

Hi Sean,

On 10/22/22 7:48 AM, Sean Christopherson wrote:
> On Sat, Oct 22, 2022, Gavin Shan wrote:
>>>> When the dirty ring becomes full, the VCPU can't handle any operations which would
>>>> bring more dirty pages.
>>>
>>> Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
>>> PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
>>> edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
>>> than we need to increase the buffer provided by the soft limit because sooner or
>>> later KVM will be able to fill 65 entries, at which point errors will occur
>>> regardless of when the "soft full" request is processed.
>>>
>>> In other words, we can take advantage of the fact that the soft-limit buffer needs
>>> to be quite conservative.
>>>
>>
>> Right, there are an extra 64 entries in the ring between soft full and hard full.
>> Another 512 entries are reserved when PML is enabled. However, the other requests,
>> which produce dirty pages, are producers to the ring. We can't just assume
>> that those producers will need fewer than 64 entries.
> 
> But we're already assuming those producers will need less than 65 entries.  My point
> is that if one (or even five) extra entries pushes KVM over the limit, then the
> buffer provided by the soft limit needs to be jacked up regardless of when the
> request is processed.
> 
> Hmm, but I suppose it's possible there's a pathological emulator path that can push
> double digit entries, and servicing the request right away ensures that requests
> have the full 64 entry buffer to play with.
> 
> So yeah, I agree, move it below the DEAD check, but keep it above most everything
> else.
> 

Ok, thanks for confirming. I will move the check below the DEAD check in the next
revision.
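
A rough sketch of that ordering in x86's vcpu_enter_guest() request processing,
using the request name from this series (placement only, not the final code):

	if (kvm_request_pending(vcpu)) {
		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
			r = -EIO;
			goto out;
		}

		/* Below the DEAD check, above most everything else. */
		if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
		    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
			/* Still soft full: keep the request pending and exit. */
			kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
			vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
			r = 0;
			goto out;
		}

		/* ... remaining request handling ... */
	}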

>>>>> Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
>>>>> about the overhead of having to re-check the request, the goal would be to help
>>>>> document what causes the request to go away.
>>>>>
>>>>> E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
>>>>>
>>>>> 	if (!kvm_dirty_ring_soft_full(ring))
>>>>> 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
>>>>>
>>>>
>>>> It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
>>>> @vcpu can be obtained with container_of(..., ring).
>>>
>>> Using container_of() is silly, there's literally one caller that does:
>>>
>>> 	kvm_for_each_vcpu(i, vcpu, kvm)
>>> 		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
>>>
>>
>> May I ask why using container_of() is silly?
> 
> Because container_of() is inherently dangerous, e.g. if it's used on a pointer that
> isn't contained by the expected type, the code will compile cleanly but explode
> at runtime.  That's unlikely to happen in this case, e.g. doesn't look like we'll
> be adding a ring to "struct kvm", but if someone wanted to add a per-VM ring,
> taking the vCPU makes it very obvious that pushing to a ring _requires_ a vCPU,
> and enforces that requirement at compile time.
> 
> In other words, it's preferable to avoid container_of() unless using it solves a
> real problem that doesn't have a better alternative.
> 
> In these cases, passing in the vCPU is most definitely a better alternative as
> each of the functions in question has a sole caller that has easy access to the
> container (vCPU), i.e. it's a trivial change.
> 

Right, container_of() can't ensure consistency or provide full sanity checking by
itself. It's reasonable to avoid it when possible. Thanks for the details and
explanation.

>> In order to avoid using container_of(), kvm_dirty_ring_push() also needs
>> @vcpu.
> 
> Yep, that one should be changed too.
> 

Ok.

>> So let's change those two functions to something like below. Please
>> double-check whether they look good to you:
>>
>>    void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
>>    int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);
> 
> Yep, looks good.
> 

Ok, thanks for your confirmation.

Thanks,
Gavin


* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-21 23:20         ` Sean Christopherson
@ 2022-10-22  0:33           ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-22  0:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, kvmarm, kvm, peterx, maz, will, catalin.marinas, bgardon,
	shuah, andrew.jones, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, shan.gavin

Hi Sean,

On 10/22/22 7:20 AM, Sean Christopherson wrote:
> On Fri, Oct 21, 2022, Gavin Shan wrote:
>>> What about inverting the naming to better capture that this is about the dirty
>>> bitmap, and less so about the dirty ring?  It's not obvious what "exclusive"
>>> means, e.g. I saw this stub before reading the changelog and assumed it was
>>> making a dirty ring exclusive to something.
>>>
>>> Something like this?
>>>
>>> bool kvm_use_dirty_bitmap(struct kvm *kvm)
>>> {
>>> 	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>>> }
>>>
>>
>> If you agree, I would rename it to kvm_dirty_ring_use_bitmap(). In this way,
>> we will have the "kvm_dirty_ring" prefix for the function name, consistent with
>> other functions from the same module.
> 
> I'd prefer to avoid "ring" in the name at all, because in the common case (well,
> legacy case at least) the dirty ring has nothing to do with using the dirty
> bitmap, e.g. this code ends up being very confusing because the "dirty_ring"
> part implies that KVM _doesn't_ need to allocate the bitmap when the dirty ring
> isn't being used.
> 
> 		if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> 			new->dirty_bitmap = NULL;
> 		else if (old && old->dirty_bitmap)
> 			new->dirty_bitmap = old->dirty_bitmap;
> 		else if (kvm_dirty_ring_use_bitmap(kvm)) {
> 			r = kvm_alloc_dirty_bitmap(new);
> 			if (r)
> 				return r;
> 
> 			if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> 				bitmap_set(new->dirty_bitmap, 0, new->npages);
> 		}
> 
> The helper exists because the dirty ring exists, but the helper is fundamentally
> about the dirty bitmap, not the ring.
> 

Thanks for the details. Yeah, it makes sense to avoid "ring" then. Let's use
the name kvm_use_dirty_bitmap() for the function.

>>> But dirty_ring_with_bitmap really shouldn't need to exist.  It's mandatory for
>>> architectures that have HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for
>>> architectures that don't.  In other words, the API for enabling the dirty ring
>>> is a bit ugly.
>>>
>>> Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been officially
>>> released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP on top, what about
>>> usurping bits 63:32 of cap->args[0] for flags?  E.g.
>>>
>>> Ideally we'd use cap->flags directly, but we screwed up with KVM_CAP_DIRTY_LOG_RING
>>> and didn't require flags to be zero :-(
>>>
>>> Actually, what's the point of allowing KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be
>>> enabled?  I get why KVM would enumerate this info, i.e. allowing checking, but I
>>> don't see any value in supporting a second method for enabling the dirty ring.
>>>
>>> The acquire-release thing is irrelevant for x86, and no other architecture
>>> supports the dirty ring until this series, i.e. there's no need for KVM to detect
>>> that userspace has been updated to gain acquire-release semantics, because the
>>> fact that userspace is enabling the dirty ring on arm64 means userspace has been
>>> updated.
>>>
>>> Same goes for the "with bitmap" capability.  There are no existing arm64 users,
>>> so there's no risk of breaking existing userspace by suddenly shoving stuff into
>>> the dirty bitmap.
>>>
>>> KVM doesn't even get the enabling checks right, e.g. KVM_CAP_DIRTY_LOG_RING can be
>>> enabled on architectures that select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL but not
>>> CONFIG_HAVE_KVM_DIRTY_RING.  The reverse is also true (ignoring that x86 selects both and
>>> is the only arch that selects the TSO variant).
>>>
>>> Ditto for KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP...
>>
>> If I didn't miss anything in the previous discussions, we don't want to make
>> KVM_CAP_DIRTY_LOG_RING_ACQ_REL and KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>> architecture dependent. If they become architecture dependent, userspace
>> would need different stubs (x86, arm64, and other architectures that support
>> the dirty ring in the future) to enable those capabilities. That's not friendly
>> to userspace. So I prefer the existing pattern: advertise, then enable. Enabling
>> a capability without knowing whether it's supported sounds a bit weird to
>> me.
> 
> Enabling without KVM advertising that it's supported would indeed be odd.  Ugh,
> and QEMU doesn't have existing checks to restrict the dirty ring to x86, i.e. we
> can't make the ACQ_REL capability a true attribute without breaking userspace.
> 
> Rats.
> 

Currently, QEMU uses neither ACQ_REL nor WITH_BITMAP. After both capabilities are
supported by KVM, we need to go ahead and change QEMU so that the two capabilities
can be enabled there.

>> I think it's a good idea to enable KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP} as
>> flags, instead of standalone capabilities. In this way, those two capabilities can
>> be treated as sub-capabilities of KVM_CAP_DIRTY_LOG_RING. The question is how these
>> two flags can be exposed by kvm_vm_ioctl_check_extension_generic(), if we really
>> want to expose those two flags.
>>
>> I don't understand your point about KVM getting the checks wrong when KVM_CAP_DIRTY_LOG_RING
>> and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are enabled.
> 
> In the current code base, KVM only checks that _a_ form of dirty ring is supported,
> by way of kvm_vm_ioctl_enable_dirty_log_ring()'s check on KVM_DIRTY_LOG_PAGE_OFFSET.
> 
> The callers don't verify that the "correct" capability is enabled.
> 
> 	case KVM_CAP_DIRTY_LOG_RING:
> 	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
> 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> 
> E.g. userspace could do
> 
> 	if (kvm_check(KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
> 		kvm_enable(KVM_CAP_DIRTY_LOG_RING)
> 
> and KVM would happily enable the dirty ring.  Functionally it doesn't cause
> problems, it's just weird.
> 
> Heh, we can fix without more ifdeffery by using the check internally.
> 

Hmm, nice catch! Let's fix it up in a separate patch.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e30f1b4ecfa5..300489a0eba5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4585,6 +4585,8 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>          }
>          case KVM_CAP_DIRTY_LOG_RING:
>          case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
> +               if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap))
> +                       return -EINVAL;
>                  return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>          default:
>                  return kvm_vm_ioctl_enable_cap(kvm, cap);
> 

Thanks,
Gavin


* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-21 16:05         ` Sean Christopherson
@ 2022-10-22  8:27           ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-22  8:27 UTC (permalink / raw)
  To: Sean Christopherson, Marc Zyngier
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm

Hi Sean,

On 10/22/22 12:05 AM, Sean Christopherson wrote:
> On Fri, Oct 21, 2022, Marc Zyngier wrote:
>> On Fri, 21 Oct 2022 00:44:51 +0100,
>> Sean Christopherson <seanjc@google.com> wrote:
>>>
>>> On Tue, Oct 11, 2022, Gavin Shan wrote:
>>>> Some architectures (such as arm64) need to dirty memory outside of the
>>>> context of a vCPU. Of course, this simply doesn't fit with the UAPI of
>>>> KVM's per-vCPU dirty ring.
>>>
>>> What is the point of using the dirty ring in this case?  KVM still
>>> burns a pile of memory for the bitmap.  Is the benefit that
>>> userspace can get away with scanning the bitmap fewer times,
>>> e.g. scan it once just before blackout under the assumption that
>>> very few pages will dirty the bitmap?
>>
>> Apparently, the throttling effect of the ring makes it easier to
>> converge. Someone who actually uses the feature should be able to
>> tell you. But that's a policy decision, and I don't see why we should
>> be prescriptive.
> 
> I wasn't suggesting we be prescriptive, it was an honest question.
> 
>>> Why not add a global ring to @kvm?  I assume thread safety is a
>>> problem, but the memory overhead of the dirty_bitmap also seems like
>>> a fairly big problem.
>>
>> Because we already have a stupidly bloated API surface, and that we
>> could do without yet another one based on a sample of *one*?
> 
> But we're adding a new API regardless.  A per-VM ring would definitely be a bigger
> addition, but if using the dirty_bitmap won't actually meet the needs of userspace,
> then we'll have added a new API and still not have solved the problem.  That's why
> I was asking why/when userspace would want to use dirty_ring+dirty_bitmap.
> 

The bitmap can help solve the issue, but the extra memory consumption due to
the bitmap is a concern, as you mentioned previously. More information about
the issue can be found in [1]. On ARM64, several of the guest's physical pages
are used by the VGIC/ITS to store its state during migration or system shutdown.

[1] https://lore.kernel.org/kvmarm/320005d1-fe88-fd6a-be91-ddb56f1aa80f@redhat.com/

>> Because dirtying memory outside of a vcpu context makes it incredibly awkward
>> to handle a "ring full" condition?
> 
> Kicking all vCPUs with the soft-full request isn't _that_ awkward.  It's certainly
> sub-optimal, but if inserting into the per-VM ring is relatively rare, then in
> practice it's unlikely to impact guest performance.
> 

It's still possible for the per-vCPU ring to become hard-full before the vCPU
can be kicked. A per-VM ring has other issues, one of which is the synchronization
between KVM and userspace needed to avoid overrunning the ring. The bitmap was
selected for its simplicity.

>>>> Introduce a new flavor of dirty ring that requires the use of both vCPU
>>>> dirty rings and a dirty bitmap. The expectation is that for non-vCPU
>>>> sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
>>>> the dirty bitmap. Userspace should scan the dirty bitmap before
>>>> migrating the VM to the target.
>>>>
>>>> Use an additional capability to advertise this behavior and require
>>>> explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
>>>> you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
>>>> not allow userspace to enable dirty ring if it hasn't also enabled the
>>>> ring && bitmap capability, as a VM is likely DOA without the pages
>>>> marked in the bitmap.
>>
>> This is wrong. The *only* case this is useful is when there is an
>> in-kernel producer of data outside of the context of a vcpu, which is
>> so far only the ITS save mechanism. No ITS? No need for this.
> 
> How large is the ITS?  If it's a fixed, small size, could we treat the ITS as a
> one-off case for now?  E.g. do something gross like shove entries into vcpu0's
> dirty ring?
> 

There are several VGIC/ITS tables involved in the issue. I checked the
specification and the implementation. Since the device ID is 16 bits,
there can be at most 0x10000 devices. Each device has its own ITT (Interrupt
Translation Table), looked up by a 32-bit event ID. In theory, the memory
used for the ITTs can be very large:

     Register       Description           Max-size   Entry-size  Max-entries
     -----------------------------------------------------------------------
     GITS_BASER0    ITS Device Table      512KB      8-bytes     0x10000
     GITS_BASER1    ITS Collection Table  512KB      8-bytes     0x10000
     GITS_BASER2    (GICv4) ITS VPE Table 512KB      8-bytes(?)  0x10000

     max-devices * (1UL << event_id_shift) * entry_size =
     0x10000 * (1UL << 32) * 8                          = 2PB

>> Userspace knows what it has created the first place, and should be in
>> charge of it (i.e. I want to be able to migrate my GICv2 and
>> GICv3-without-ITS VMs with the rings only).
> 
> Ah, so enabling the dirty bitmap isn't strictly required.  That means this patch
> is wrong, and it also means that we need to figure out how we want to handle the
> case where mark_page_dirty_in_slot() is invoked without a running vCPU on a memslot
> without a dirty_bitmap.
> 
> I.e. what's an appropriate action in the below sequence:
> 
> void mark_page_dirty_in_slot(struct kvm *kvm,
> 			     const struct kvm_memory_slot *memslot,
> 		 	     gfn_t gfn)
> {
> 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> 
> #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> 	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> 		return;
> 
> #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> 	if (WARN_ON_ONCE(!vcpu))
> 		return;
> #endif
> #endif
> 
> 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> 		unsigned long rel_gfn = gfn - memslot->base_gfn;
> 		u32 slot = (memslot->as_id << 16) | memslot->id;
> 
> 		if (vcpu && kvm->dirty_ring_size)
> 			kvm_dirty_ring_push(&vcpu->dirty_ring,
> 					    slot, rel_gfn);
> 		else if (memslot->dirty_bitmap)
> 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> 		else
> 			???? <=================================================
> 	}
> }
> 
> 
> Would it be possible to require a dirty bitmap when an ITS is created?  That would
> allow treating the above condition as a KVM bug.
> 

According to the above calculation, it's impossible to determine the memory size for
the bitmap in advance. In theory, the memory used by ITE (Interrupt Translation Entry)
tables can be large enough to consume all of the guest's system memory. ITE tables are
scattered across the guest's system memory, and we don't know their locations in
advance; they are created dynamically on request from the guest.

However, I think it's a good idea to enable the bitmap only when "arm-its-kvm" is
actually used in userspace (QEMU). For example, the machine and (kvm) accelerator
are initialized as below. Whether "arm-its-kvm" is used isn't known until (c), so
we can enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP in (d), and the bitmap is created
by KVM in (e) (a QEMU-side sketch follows the sequence below).

   main
     qemu_init
       qemu_create_machine                   (a) machine instance is created
       configure_accelerators
         do_configure_accelerator
           accel_init_machine
             kvm_init                        (b) KVM is initialized
       :
       qmp_x_exit_preconfig
         qemu_init_board
           machine_run_board_init            (c) The board is initialized
       :
       accel_setup_post                      (d) KVM is post initialized
       :
       <migration>                           (e) Migration starts
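
A hedged QEMU-side sketch of step (d); the helper name kvm_arm_enable_its_bitmap()
and the its_in_use flag are illustrative assumptions, not existing QEMU code:

    static void kvm_arm_enable_its_bitmap(KVMState *s)
    {
        /* Only needed when the in-kernel ITS ("arm-its-kvm") was created in (c). */
        if (!its_in_use) {
            return;
        }

        if (kvm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP)) {
            kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0);
        }
    }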

In order to record whether the bitmap is really required, "struct kvm::dirty_ring_with_bitmap"
is still needed.

    - KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is advertised when CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
      is selected.

    - KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled in (d) only when "arm-its-kvm"
      is used in QEMU. After the capability is enabled, "struct kvm::dirty_ring_with_bitmap"
      is set to 1.

    - The bitmap is created by KVM in (e).

If the above analysis makes sense, I don't see anything missing from the patch.
Of course, KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP} need to be enabled separately
and shouldn't depend on each other. The description added to "Documentation/virt/kvm/api.rst"
needs to be improved as Peter and Oliver suggested, kvm_dirty_ring_exclusive() needs to be
renamed to kvm_use_dirty_bitmap(), and the "#ifdef"s need to be cut down as Sean suggested.
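
A minimal sketch of the resulting mark_page_dirty_in_slot(), assuming the
kvm_use_dirty_bitmap() helper and the reworked kvm_dirty_ring_push() signature
discussed in this thread (not the final code):

	void mark_page_dirty_in_slot(struct kvm *kvm,
				     const struct kvm_memory_slot *memslot,
				     gfn_t gfn)
	{
		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

		if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
			unsigned long rel_gfn = gfn - memslot->base_gfn;
			u32 slot = (memslot->as_id << 16) | memslot->id;

			/* In a vCPU context with the ring enabled, push to the ring... */
			if (vcpu && kvm->dirty_ring_size)
				kvm_dirty_ring_push(vcpu, slot, rel_gfn);
			/* ...otherwise fall back to the bitmap when it's in use. */
			else if (kvm_use_dirty_bitmap(kvm) && memslot->dirty_bitmap)
				set_bit_le(rel_gfn, memslot->dirty_bitmap);
		}
	}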


>>>> @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
>>>>   {
>>>>   	int r;
>>>>   
>>>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>>>> +	if (!kvm->dirty_ring_with_bitmap)
>>>> +		return -EINVAL;
>>>> +#endif
>>>
>>> This one at least is prettier with IS_ENABLED
>>>
>>> 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
>>> 	    !kvm->dirty_ring_with_bitmap)
>>> 		return -EINVAL;
>>>
>>> But dirty_ring_with_bitmap really shouldn't need to exist.  It's
>>> mandatory for architectures that have
>>> HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for architectures
>>> that don't.  In other words, the API for enabling the dirty ring is
>>> a bit ugly.
>>>
>>> Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been
>>> officially released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>>> on top, what about usurping bits 63:32 of cap->args[0] for flags?
>>> E.g.
> 
> For posterity, filling in my missing idea...
> 
> Since the size is restricted to be well below a 32-bit value, and it's unlikely
> that KVM will ever support 4GiB per-vCPU rings, we could usurp the upper bits for
> flags:
> 
>    static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
>    {
> 	u32 flags = arg0 >> 32;
> 	u32 size = arg0;
> 
> However, since it sounds like enabling dirty_bitmap isn't strictly required, I
> have no objection to enabling KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, my objection
> was purely that KVM was adding a per-VM flag just to sanity check the configuration.
> 

If KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled for "arm-its-kvm", it'd be better
to allow enabling the two capabilities (ACQ_REL and WITH_BITMAP) separately, as I
explained above. Userspace (QEMU) will gain flexibility if the two capabilities
can be enabled separately.

To QEMU, KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are properties
of the accelerator, while KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is a property of the
board. Relaxing their dependency will give QEMU flexibility.


[...]

Thanks,
Gavin

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-21 16:05         ` Sean Christopherson
@ 2022-10-22 10:33           ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-22 10:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm

On Fri, 21 Oct 2022 17:05:26 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 00:44:51 +0100,
> > Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > On Tue, Oct 11, 2022, Gavin Shan wrote:
> > > > Some architectures (such as arm64) need to dirty memory outside of the
> > > > context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> > > > KVM's per-vCPU dirty ring.
> > > 
> > > What is the point of using the dirty ring in this case?  KVM still
> > > burns a pile of memory for the bitmap.  Is the benefit that
> > > userspace can get away with scanning the bitmap fewer times,
> > > e.g. scan it once just before blackout under the assumption that
> > > very few pages will dirty the bitmap?
> > 
> > Apparently, the throttling effect of the ring makes it easier to
> > converge. Someone who actually uses the feature should be able to
> > tell you. But that's a policy decision, and I don't see why we should
> > be prescriptive.
> 
> I wasn't suggesting we be prescriptive, it was an honest question.
> 
> > > Why not add a global ring to @kvm?  I assume thread safety is a
> > > problem, but the memory overhead of the dirty_bitmap also seems like
> > > a fairly big problem.
> > 
> > Because we already have a stupidly bloated API surface, and that we
> > could do without yet another one based on a sample of *one*?
> 
> But we're adding a new API regardless.  A per-VM ring would
> definitely be a bigger addition, but if using the dirty_bitmap won't
> actually meet the needs of userspace, then we'll have added a new
> API and still not have solved the problem.  That's why I was asking
> why/when userspace would want to use dirty_ring+dirty_bitmap.

Whenever dirty pages can be generated outside of the context of a vcpu
running. And that's anything that comes from *devices*.

> 
> > Because dirtying memory outside of a vcpu context makes it
> > incredibly awkward to handle a "ring full" condition?
> 
> Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> It's certainly sub-optimal, but if inserting into the per-VM ring is
> relatively rare, then in practice it's unlikely to impact guest
> performance.

But there is *nothing* to kick here. The kernel is dirtying pages,
devices are dirtying pages (DMA), and there is no context associated
with that. Which is why a finite ring is the wrong abstraction.

> 
> > > > Introduce a new flavor of dirty ring that requires the use of both vCPU
> > > > dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> > > > sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> > > > the dirty bitmap. Userspace should scan the dirty bitmap before
> > > > migrating the VM to the target.
> > > > 
> > > > Use an additional capability to advertise this behavior and require
> > > > explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> > > > you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> > > > not allow userspace to enable dirty ring if it hasn't also enabled the
> > > > ring && bitmap capability, as a VM is likely DOA without the pages
> > > > marked in the bitmap.
> > 
> > This is wrong. The *only* case this is useful is when there is an
> > in-kernel producer of data outside of the context of a vcpu, which is
> > so far only the ITS save mechanism. No ITS? No need for this.
> 
> How large is the ITS?  If it's a fixed, small size, could we treat
> the ITS as a one-off case for now?  E.g. do something gross like
> shove entries into vcpu0's dirty ring?

The tables can be arbitrarily large, sparse, and are under control of
the guest anyway. And no, I'm not entertaining anything that gross.
I'm actually quite happy with not supporting the dirty ring and sticking
with the bitmap, which doesn't have any of these problems.

> 
> > Userspace knows what it has created in the first place, and should be in
> > charge of it (i.e. I want to be able to migrate my GICv2 and
> > GICv3-without-ITS VMs with the rings only).
> 
> Ah, so enabling the dirty bitmap isn't strictly required.  That
> means this patch is wrong, and it also means that we need to figure
> out how we want to handle the case where mark_page_dirty_in_slot()
> is invoked without a running vCPU on a memslot without a
> dirty_bitmap.
> 
> I.e. what's an appropriate action in the below sequence:
> 
> void mark_page_dirty_in_slot(struct kvm *kvm,
> 			     const struct kvm_memory_slot *memslot,
> 		 	     gfn_t gfn)
> {
> 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> 
> #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> 	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> 		return;
> 
> #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> 	if (WARN_ON_ONCE(!vcpu))
> 		return;
> #endif
> #endif
> 
> 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> 		unsigned long rel_gfn = gfn - memslot->base_gfn;
> 		u32 slot = (memslot->as_id << 16) | memslot->id;
> 
> 		if (vcpu && kvm->dirty_ring_size)
> 			kvm_dirty_ring_push(&vcpu->dirty_ring,
> 					    slot, rel_gfn);
> 		else if (memslot->dirty_bitmap)
> 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> 		else
> 			???? <=================================================
> 	}
> }
> 
> 
> Would it be possible to require a dirty bitmap when an ITS is
> created?  That would allow treating the above condition as a KVM
> bug.

No. This should be optional. Everything about migration should be
absolutely optional (I run plenty of concurrent VMs on sub-2GB
systems). You want to migrate a VM that has an ITS or will collect
dirty bits originating from a SMMU with HTTU, you enable the dirty
bitmap. You want to have *vcpu* based dirty rings, you enable them.

In short, there shouldn't be any reason for the two to be either
mandatory or conflated. Both should be optional and independent, because
they cover completely disjoint use cases. *userspace* should be in
charge of deciding this.

> 
> > > > @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
> > > >  {
> > > >  	int r;
> > > >  
> > > > +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > > > +	if (!kvm->dirty_ring_with_bitmap)
> > > > +		return -EINVAL;
> > > > +#endif
> > > 
> > > This one at least is prettier with IS_ENABLED
> > > 
> > > 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
> > > 	    !kvm->dirty_ring_with_bitmap)
> > > 		return -EINVAL;
> > > 
> > > But dirty_ring_with_bitmap really shouldn't need to exist.  It's
> > > mandatory for architectures that have
> > > HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for architectures
> > > that don't.  In other words, the API for enabling the dirty ring is
> > > a bit ugly.
> > > 
> > > Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been
> > > officially released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> > > on top, what about usurping bits 63:32 of cap->args[0] for flags?
> > > E.g.
> 
> For posterity, filling in my missing idea...
> 
> Since the size is restricted to be well below a 32-bit value, and it's unlikely
> that KVM will ever support 4GiB per-vCPU rings, we could usurp the upper bits for
> flags:
> 
>   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
>   {
> 	u32 flags = arg0 >> 32;
> 	u32 size = arg0;
> 
> However, since it sounds like enabling dirty_bitmap isn't strictly required, I
> have no objection to enabling KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP; my objection
> was purely that KVM was adding a per-VM flag just to sanity check the configuration.
> 
> > > Ideally we'd use cap->flags directly, but we screwed up with
> > > KVM_CAP_DIRTY_LOG_RING and didn't require flags to be zero :-(
> > >
> > > Actually, what's the point of allowing
> > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be enabled?  I get why KVM would
> > > enumerate this info, i.e. allowing checking, but I don't see any
> > > value in supporting a second method for enabling the dirty ring.
> > > 
> > > The acquire-release thing is irrelevant for x86, and no other
> > > architecture supports the dirty ring until this series, i.e. there's
> > > no need for KVM to detect that userspace has been updated to gain
> > > acquire-release semantics, because the fact that userspace is
> > > enabling the dirty ring on arm64 means userspace has been updated.
> > 
> > Do we really need to make the API more awkward? There is an
> > established pattern of "enable what is advertised". Some level of
> > uniformity wouldn't hurt, really.
> 
> I agree that uniformity would be nice, but for capabilities I don't
> think that's ever going to happen.  I'm pretty sure supporting
> enabling is actually in the minority.  E.g. of the 20 capabilities
> handled in kvm_vm_ioctl_check_extension_generic(), I believe only 3
> support enabling (KVM_CAP_HALT_POLL, KVM_CAP_DIRTY_LOG_RING, and
> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2).

I understood that you were advocating that a check for KVM_CAP_FOO
could result in enabling KVM_CAP_BAR. That I definitely object to.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-22  8:27           ` Gavin Shan
@ 2022-10-22 10:54             ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-22 10:54 UTC (permalink / raw)
  To: Gavin Shan
  Cc: shuah, kvm, bgardon, andrew.jones, dmatlack, shan.gavin,
	catalin.marinas, kvmarm, pbonzini, zhenyzha, will, kvmarm

On Sat, 22 Oct 2022 09:27:41 +0100,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Sean,
> 
> On 10/22/22 12:05 AM, Sean Christopherson wrote:
> > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> >> On Fri, 21 Oct 2022 00:44:51 +0100,
> >> Sean Christopherson <seanjc@google.com> wrote:
> >>> 
> >>> On Tue, Oct 11, 2022, Gavin Shan wrote:
> >>>> Some architectures (such as arm64) need to dirty memory outside of the
> >>>> context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> >>>> KVM's per-vCPU dirty ring.
> >>> 
> >>> What is the point of using the dirty ring in this case?  KVM still
> >>> burns a pile of memory for the bitmap.  Is the benefit that
> >>> userspace can get away with scanning the bitmap fewer times,
> >>> e.g. scan it once just before blackout under the assumption that
> >>> very few pages will dirty the bitmap?
> >> 
> >> Apparently, the throttling effect of the ring makes it easier to
> >> converge. Someone who actually uses the feature should be able to
> >> tell you. But that's a policy decision, and I don't see why we should
> >> be prescriptive.
> > 
> > I wasn't suggesting we be prescriptive, it was an honest question.
> > 
> >>> Why not add a global ring to @kvm?  I assume thread safety is a
> >>> problem, but the memory overhead of the dirty_bitmap also seems like
> >>> a fairly big problem.
> >> 
> >> Because we already have a stupidly bloated API surface, and that we
> >> could do without yet another one based on a sample of *one*?
> > 
> > But we're adding a new API regardless.  A per-VM ring would definitely be a bigger
> > addition, but if using the dirty_bitmap won't actually meet the needs of userspace,
> > then we'll have added a new API and still not have solved the problem.  That's why
> > I was asking why/when userspace would want to use dirty_ring+dirty_bitmap.
> > 
> 
> The bitmap can help to solve the issue, but the extra memory consumption due to
> the bitmap is a concern, as you mentioned previously. More information about
> the issue can be found here [1]. On ARM64, multiple guest physical pages are
> used by VGIC/ITS to store its state during migration or system shutdown.
> 
> [1] https://lore.kernel.org/kvmarm/320005d1-fe88-fd6a-be91-ddb56f1aa80f@redhat.com/
> 
> >> Because dirtying memory outside of a vcpu context makes it incredibly awkward
> >> to handle a "ring full" condition?
> > 
> > Kicking all vCPUs with the soft-full request isn't _that_ awkward.  It's certainly
> > sub-optimal, but if inserting into the per-VM ring is relatively rare, then in
> > practice it's unlikely to impact guest performance.
> > 
> 
> It's still possible for the per-vCPU ring to become hard-full before it can be
> kicked. A per-VM ring has other issues, one of which is the synchronization
> between KVM and userspace needed to avoid overrunning it. The bitmap was
> selected due to its simplicity.

Exactly. And once you overflow a ring because the device generates too
much data, what do you do? Return an error to the device?

> 
> >>>> Introduce a new flavor of dirty ring that requires the use of both vCPU
> >>>> dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> >>>> sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> >>>> the dirty bitmap. Userspace should scan the dirty bitmap before
> >>>> migrating the VM to the target.
> >>>> 
> >>>> Use an additional capability to advertise this behavior and require
> >>>> explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> >>>> you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> >>>> not allow userspace to enable dirty ring if it hasn't also enabled the
> >>>> ring && bitmap capability, as a VM is likely DOA without the pages
> >>>> marked in the bitmap.
> >> 
> >> This is wrong. The *only* case this is useful is when there is an
> >> in-kernel producer of data outside of the context of a vcpu, which is
> >> so far only the ITS save mechanism. No ITS? No need for this.
> > 
> > How large is the ITS?  If it's a fixed, small size, could we treat the ITS as a
> > one-off case for now?  E.g. do something gross like shove entries into vcpu0's
> > dirty ring?
> > 
> 
> There are several VGIC/ITS tables involved in the issue. I checked the
> specification and the implementation. As the device ID is 16 bits wide,
> there can be at most 0x10000 devices. Each device has its own ITT (Interrupt
> Translation Table), looked up by a 32-bit event ID. The memory used for
> ITTs can therefore be arbitrarily large in theory.
> 
>     Register       Description           Max-size   Entry-size  Max-entries
>     -----------------------------------------------------------------------
>     GITS_BASER0    ITS Device Table      512KB      8-bytes     0x10000
>     GITS_BASER1    ITS Collection Table  512KB      8-bytes     0x10000

Both can be two levels. So you can multiply the max size by 64K. The
entry size also depends on the revision of the ABI and can be changed
anytime we see fit.

>     GITS_BASER2    (GICv4) ITS VPE Table 512KB      8-bytes(?)  0x10000

We don't virtualise GICv4. We use GICv4 to virtualise a GICv3. So this
table will never be saved (the guest never sees it, and only KVM
manages it).

>     max-devices * (1UL << event_id_shift) * entry_size =
>     0x10000 * (1UL << 32) * 8                          = 2^51 bytes = 2PiB
> 
> >> Userspace knows what it has created the first place, and should be in
> >> charge of it (i.e. I want to be able to migrate my GICv2 and
> >> GICv3-without-ITS VMs with the rings only).
> > 
> > Ah, so enabling the dirty bitmap isn't strictly required.  That means this patch
> > is wrong, and it also means that we need to figure out how we want to handle the
> > case where mark_page_dirty_in_slot() is invoked without a running vCPU on a memslot
> > without a dirty_bitmap.
> > 
> > I.e. what's an appropriate action in the below sequence:
> > 
> > void mark_page_dirty_in_slot(struct kvm *kvm,
> > 			     const struct kvm_memory_slot *memslot,
> > 		 	     gfn_t gfn)
> > {
> > 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > 
> > #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > 	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> > 		return;
> > 
> > #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > 	if (WARN_ON_ONCE(!vcpu))
> > 		return;
> > #endif
> > #endif
> > 
> > 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> > 		unsigned long rel_gfn = gfn - memslot->base_gfn;
> > 		u32 slot = (memslot->as_id << 16) | memslot->id;
> > 
> > 		if (vcpu && kvm->dirty_ring_size)
> > 			kvm_dirty_ring_push(&vcpu->dirty_ring,
> > 					    slot, rel_gfn);
> > 		else if (memslot->dirty_bitmap)
> > 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > 		else
> > 			???? <=================================================
> > 	}
> > }
> > 
> > 
> > Would it be possible to require a dirty bitmap when an ITS is
> > created?  That would allow treating the above condition as a KVM
> > bug.
> > 
> 
> According to the above calculation, it's impossible to determine the
> memory size for the bitmap in advance. The memory used by ITE
> (Interrupt Translation Entry) tables can, in theory, be large enough to
> cover all of the guest's system memory. ITE tables are scattered across
> guest memory, and we don't know their locations in advance; they are
> created dynamically on request from the guest.
> 
> However, I think it's a good idea to enable the bitmap only when
> "arm-its-kvm" is really used in userspace (QEMU). For example, the
> machine and (KVM) accelerator are initialized as shown below. Whether
> "arm-its-kvm" is used isn't known until (c), so we can enable
> KVM_CAP_DIRTY_RING_WITH_BITMAP in (d), and the bitmap is created in
> (e) by KVM.
> 
>   main
>     qemu_init
>       qemu_create_machine                   (a) machine instance is created
>       configure_accelerators
>         do_configure_accelerator
>           accel_init_machine
>             kvm_init                        (b) KVM is initialized
>       :
>       qmp_x_exit_preconfig
>         qemu_init_board
>           machine_run_board_init            (c) The board is initialized
>       :
>       accel_setup_post                      (d) KVM is post initialized
>       :
>       <migration>                           (e) Migration starts
> 
> In order to record whether the bitmap is really needed, "struct
> kvm::dirty_ring_with_bitmap" must still exist.
> 
>    - KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is advertised when
>      CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP is selected.
> 
>    - KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled in (d) only when
>      "arm-its-kvm" is used in QEMU. After the capability is enabled,
>      "struct kvm::dirty_ring_with_bitmap" is set to 1.
> 
>    - The bitmap is created by KVM in (e).
> 
> If the above analysis makes sense, I don't see anything missing from
> the patch. Of course, KVM_CAP_DIRTY_LOG_RING_{ACQ_REL, WITH_BITMAP}
> need to be enabled separately and must not depend on each other. The
> description added to "Documentation/virt/kvm/abi.rst" needs to be
> improved as Peter and Oliver suggested, kvm_dirty_ring_exclusive()
> needs to be renamed to kvm_use_dirty_bitmap(), and the "#ifdef"s need
> to be cut down as Sean suggested.

Frankly, I really hate the "mayo and ketchup" approach. The two dirty
tracking approaches serve different purposes, and I really don't see
the point in merging them behind a single cap. Userspace should be
able to choose if and when it wants to use one logging method or the
other. We should document how they interact, but that's about it.
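
For concreteness, the renamed helper suggested above could look like the
sketch below, using only the fields this series already adds; the body is
one reading of the intended semantics, not code from the patch:

	/* Use the bitmap when the ring is off, or when ring+bitmap is on. */
	static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
	{
		return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
	}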


> 
> 
> >>>> @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
> >>>>   {
> >>>>   	int r;
> >>>>   +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> >>>> +	if (!kvm->dirty_ring_with_bitmap)
> >>>> +		return -EINVAL;
> >>>> +#endif
> >>> 
> >>> This one at least is prettier with IS_ENABLED
> >>> 
> >>> 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
> >>> 	    !kvm->dirty_ring_with_bitmap)
> >>> 		return -EINVAL;
> >>> 
> >>> But dirty_ring_with_bitmap really shouldn't need to exist.  It's
> >>> mandatory for architectures that have
> >>> HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for architectures
> >>> that don't.  In other words, the API for enabling the dirty ring is
> >>> a bit ugly.
> >>> 
> >>> Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been
> >>> officially released yet, and then KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >>> on top, what about usurping bits 63:32 of cap->args[0] for flags?
> >>> E.g.
> > 
> > For posterity, filling in my missing idea...
> > 
> > Since the size is restricted to be well below a 32-bit value, and it's unlikely
> > that KVM will ever support 4GiB per-vCPU rings, we could usurp the upper bits for
> > flags:
> > 
> >    static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
> >    {
> > 	u32 flags = arg0 >> 32;
> > 	u32 size = arg0;
> > 
> > However, since it sounds like enabling dirty_bitmap isn't strictly
> > required, I have no objection to enabling
> > KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP; my objection was purely that
> > KVM was adding a per-VM flag just to sanity check the
> > configuration.
> > 
> 
> If KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled for "arm-its-kvm",
> it'd be better to allow enabling those two capabilities (ACQ_REL and
> WITH_BITMAP) separately, as I explained above. Userspace (QEMU) will
> gain flexibility if these two capabilities can be enabled
> separately.
> 
> To QEMU, KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> are accelerator properties, while KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> is a board property. Relaxing the dependency between them will give
> QEMU flexibility.

That.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-22 10:33           ` Marc Zyngier
@ 2022-10-24 23:50             ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-24 23:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm

On Sat, Oct 22, 2022, Marc Zyngier wrote:
> On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > > Because dirtying memory outside of a vcpu context makes it
> > > incredibly awkward to handle a "ring full" condition?
> > 
> > Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> > It's certainly sub-optimal, but if inserting into the per-VM ring is
> > relatively rare, then in practice it's unlikely to impact guest
> > performance.
> 
> But there is *nothing* to kick here. The kernel is dirtying pages,
> devices are dirtying pages (DMA), and there is no context associated
> with that. Which is why a finite ring is the wrong abstraction.

I don't follow.  If there's a VM, KVM can always kick all vCPUs.  Again, might
be far from optimal, but it's an option.  If there's literally no VM, then KVM
isn't involved at all and there's no "ring vs. bitmap" decision.

> > Would it be possible to require a dirty bitmap when an ITS is
> > created?  That would allow treating the above condition as a KVM
> > bug.
> 
> No. This should be optional. Everything about migration should be
> absolutely optional (I run plenty of concurrent VMs on sub-2GB
> systems). You want to migrate a VM that has an ITS or will collect
> dirty bits originating from a SMMU with HTTU, you enable the dirty
> bitmap. You want to have *vcpu* based dirty rings, you enable them.
> 
> In short, there shouldn't be any reason for the two to be either
> mandatory or conflated. Both should be optional and independent, because
> they cover completely disjoint use cases. *userspace* should be in
> charge of deciding this.

I agree about userspace being in control; what I want to avoid is letting
userspace put KVM into a bad state without any indication from KVM that
the setup is wrong until something actually dirties a page.

Specifically, if mark_page_dirty_in_slot() is invoked without a running vCPU, on
a memslot with dirty tracking enabled but without a dirty bitmap, then the migration
is doomed.  Dropping the dirty page isn't a sane response as that'd all but
guarantee memory corruption in the guest.  At best, KVM could kick all vCPUs out
to userspace with a new exit reason, but that's not a very good experience for
userspace as either the VM is unexpectedly unmigratable or the VM ends up being
killed (or I suppose userspace could treat the exit as a per-VM dirty ring of
size '1').

That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
might end up collecting dirty information without a vCPU.  KVM is still
technically prescribing a solution to userspace, but only because there's only
one solution.
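
Concretely, the guard being discussed could look something like the sketch
below; kvm_arch_allow_write_without_running_vcpu() is a hypothetical arch
opt-in used here for illustration, not something in this series:

	/* In mark_page_dirty_in_slot(), before touching ring or bitmap: */
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
		return;

	/*
	 * Without a running vCPU, only architectures that opted in (and
	 * therefore guarantee a dirty bitmap exists) may log the write;
	 * anything else is a KVM bug, so drop the write loudly.
	 */
	if (WARN_ON_ONCE(!vcpu && !kvm_arch_allow_write_without_running_vcpu(kvm)))
		return;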

> > > > The acquire-release thing is irrelevant for x86, and no other
> > > > architecture supports the dirty ring until this series, i.e. there's
> > > > no need for KVM to detect that userspace has been updated to gain
> > > > acquire-release semantics, because the fact that userspace is
> > > > enabling the dirty ring on arm64 means userspace has been updated.
> > > 
> > > Do we really need to make the API more awkward? There is an
> > > established pattern of "enable what is advertised". Some level of
> > > uniformity wouldn't hurt, really.
> > 
> > I agree that uniformity would be nice, but for capabilities I don't
> > think that's ever going to happen.  I'm pretty sure supporting
> > enabling is actually in the minority.  E.g. of the 20 capabilities
> > handled in kvm_vm_ioctl_check_extension_generic(), I believe only 3
> > support enabling (KVM_CAP_HALT_POLL, KVM_CAP_DIRTY_LOG_RING, and
> > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2).
> 
> I understood that you were advocating that a check for KVM_CAP_FOO
> could result in enabling KVM_CAP_BAR. That I definitely object to.

I was hoping KVM could make the ACQ_REL capability an extension of DIRTY_LOG_RING,
i.e. userspace would check DIRTY_LOG_RING _and_ DIRTY_LOG_RING_ACQ_REL for ARM and other
architectures, e.g.

  int enable_dirty_ring(void)
  {
	if (!kvm_check(KVM_CAP_DIRTY_LOG_RING))
		return -EINVAL;

	if (!tso && !kvm_check(KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
		return -EINVAL;

	return kvm_enable(KVM_CAP_DIRTY_LOG_RING);
  }

But I failed to consider that userspace might try to enable DIRTY_LOG_RING on
all architectures, i.e. wouldn't arbitrarily restrict DIRTY_LOG_RING to x86.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-24 23:50             ` Sean Christopherson
@ 2022-10-25  0:08               ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-25  0:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm

On Mon, Oct 24, 2022, Sean Christopherson wrote:
> On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > > > Because dirtying memory outside of a vcpu context makes it
> > > > incredibly awkward to handle a "ring full" condition?
> > > 
> > > Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> > > It's certainly sub-optimal, but if inserting into the per-VM ring is
> > > relatively rare, then in practice it's unlikely to impact guest
> > > performance.
> > 
> > But there is *nothing* to kick here. The kernel is dirtying pages,
> > devices are dirtying pages (DMA), and there is no context associated
> > with that. Which is why a finite ring is the wrong abstraction.
> 
> I don't follow.  If there's a VM, KVM can always kick all vCPUs.  Again, might
> be far from optimal, but it's an option.  If there's literally no VM, then KVM
> isn't involved at all and there's no "ring vs. bitmap" decision.

Finally caught up on the other part of the thread, which calls out that
the devices can't be stalled.

https://lore.kernel.org/all/87czakgmc0.wl-maz@kernel.org

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-24 23:50             ` Sean Christopherson
@ 2022-10-25  0:24               ` Oliver Upton
  -1 siblings, 0 replies; 86+ messages in thread
From: Oliver Upton @ 2022-10-25  0:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, Marc Zyngier, bgardon, andrew.jones, dmatlack,
	shan.gavin, catalin.marinas, kvmarm, pbonzini, zhenyzha, will,
	kvmarm

On Mon, Oct 24, 2022 at 11:50:29PM +0000, Sean Christopherson wrote:
> On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:

[...]

> > > Would it be possible to require a dirty bitmap when an ITS is
> > > created?  That would allow treating the above condition as a KVM
> > > bug.
> > 
> > No. This should be optional. Everything about migration should be
> > absolutely optional (I run plenty of concurrent VMs on sub-2GB
> > systems). You want to migrate a VM that has an ITS or will collect
> > dirty bits originating from a SMMU with HTTU, you enable the dirty
> > bitmap. You want to have *vcpu* based dirty rings, you enable them.
> > 
> > In short, there shouldn't be any reason for the two to be either
> > mandatory or conflated. Both should be optional and independent, because
> > they cover completely disjoint use cases. *userspace* should be in
> > charge of deciding this.
> 
> I agree about userspace being in control, what I want to avoid is letting userspace
> put KVM into a bad state without any indication from KVM that the setup is wrong
> until something actually dirties a page.
> 
> Specifically, if mark_page_dirty_in_slot() is invoked without a running vCPU, on
> a memslot with dirty tracking enabled but without a dirty bitmap, then the migration
> is doomed.  Dropping the dirty page isn't a sane response as that'd all but
> guarantee memory corruption in the guest.  At best, KVM could kick all vCPUs out
> to userspace with a new exit reason, but that's not a very good experience for
> userspace as either the VM is unexpectedly unmigratable or the VM ends up being
> killed (or I suppose userspace could treat the exit as a per-VM dirty ring of
> size '1').

This only works on the assumption that the VM is in fact running. In the
case of the GIC ITS, I would expect that the VM has already been paused
in preparation for serialization. So, there would never be a vCPU thread
around to detect the kick.

> That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> might end up collecting dirty information without a vCPU.  KVM is still
> technically prescribing a solution to userspace, but only because there's only
> one solution.

I was trying to allude to something like this by flat-out requiring
ring + bitmap on arm64.

Otherwise, we'd either need to:

 (1) Document the features that explicitly depend on ring + bitmap (i.e.
 GIC ITS, whatever else may come) such that userspace sets up the
 correct configuration based on what it's using. The combined likelihood
 of both KVM and userspace getting this right seems low.

 (2) Outright reject the use of features that require ring + bitmap.
 This pulls in ordering around caps and other UAPI.

> > > > > The acquire-release thing is irrelevant for x86, and no other
> > > > > architecture supports the dirty ring until this series, i.e. there's
> > > > > no need for KVM to detect that userspace has been updated to gain
> > > > > acquire-release semantics, because the fact that userspace is
> > > > > enabling the dirty ring on arm64 means userspace has been updated.
> > > > 
> > > > Do we really need to make the API more awkward? There is an
> > > > established pattern of "enable what is advertised". Some level of
> > > > uniformity wouldn't hurt, really.
> > > 
> > > I agree that uniformity would be nice, but for capabilities I don't
> > > think that's ever going to happen.  I'm pretty sure supporting
> > > enabling is actually in the minority.  E.g. of the 20 capabilities
> > > handled in kvm_vm_ioctl_check_extension_generic(), I believe only 3
> > > support enabling (KVM_CAP_HALT_POLL, KVM_CAP_DIRTY_LOG_RING, and
> > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2).
> > 
> > I understood that you were advocating that a check for KVM_CAP_FOO
> > could result in enabling KVM_CAP_BAR. That I definitely object to.
> 
> I was hoping KVM could make the ACQ_REL capability an extension of DIRTY_LOG_RING,
> i.e. userspace would check for DIRTY_LOG_RING _and_ DIRTY_LOG_RING_ACQ_REL on ARM and other
> architectures, e.g.
> 
>   int enable_dirty_ring(void)
>   {
> 	if (!kvm_check(KVM_CAP_DIRTY_LOG_RING))
> 		return -EINVAL;
> 
> 	if (!tso && !kvm_check(KVM_CAP_DIRTY_LOG_RING_ACQ_REL))
> 		return -EINVAL;
> 
> 	return kvm_enable(KVM_CAP_DIRTY_LOG_RING);
>   }
> 
> But I failed to consider that userspace might try to enable DIRTY_LOG_RING on
> all architectures, i.e. wouldn't arbitrarily restrict DIRTY_LOG_RING to x86.

The third option would be to toss DIRTY_LOG_RING_ACQ_REL this release
and instead add DIRTY_LOG_RING2, this time checking the flags.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-24 23:50             ` Sean Christopherson
@ 2022-10-25  7:22               ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-25  7:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, peterx, will, catalin.marinas,
	bgardon, shuah, andrew.jones, dmatlack, pbonzini, zhenyzha,
	james.morse, suzuki.poulose, alexandru.elisei, oliver.upton,
	shan.gavin

On Tue, 25 Oct 2022 00:50:29 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > > > Because dirtying memory outside of a vcpu context makes it
> > > > incredibly awkward to handle a "ring full" condition?
> > > 
> > > Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> > > It's certainly sub-optimal, but if inserting into the per-VM ring is
> > > relatively rare, then in practice it's unlikely to impact guest
> > > performance.
> > 
> > But there is *nothing* to kick here. The kernel is dirtying pages,
> > devices are dirtying pages (DMA), and there is no context associated
> > with that. Which is why a finite ring is the wrong abstraction.
> 
> I don't follow.  If there's a VM, KVM can always kick all vCPUs.
> Again, might be far from optimal, but it's an option.  If there's
> literally no VM, then KVM isn't involved at all and there's no "ring
> vs. bitmap" decision.

The key word is *device*. No vcpu is involved here. Actually, we
actively prevent save/restore of the ITS while vcpus are running. How
could you even expect to snapshot a consistent state if the interrupt
state is changing under your feet?

> 
> > > Would it be possible to require a dirty bitmap when an ITS is
> > > created?  That would allow treating the above condition as a KVM
> > > bug.
> > 
> > No. This should be optional. Everything about migration should be
> > absolutely optional (I run plenty of concurrent VMs on sub-2GB
> > systems). You want to migrate a VM that has an ITS or will collect
> > dirty bits originating from a SMMU with HTTU, you enable the dirty
> > bitmap. You want to have *vcpu* based dirty rings, you enable them.
> > 
> > In short, there shouldn't be any reason for the two to be either
> > mandatory or conflated. Both should be optional and independent, because
> > they cover completely disjoint use cases. *userspace* should be in
> > charge of deciding this.
> 
> I agree about userspace being in control; what I want to avoid is
> letting userspace put KVM into a bad state without any indication
> from KVM that the setup is wrong until something actually dirties a
> page.

I can't see how that can result in a bad state for KVM itself. All you
lack is a way for userspace to *track* the dirtying. Just like we
don't have a way to track the dirtying of a page from the VMM.

> Specifically, if mark_page_dirty_in_slot() is invoked without a
> running vCPU, on a memslot with dirty tracking enabled but without a
> dirty bitmap, then the migration is doomed.

Yup, and that's a luser error. Too bad. Userspace can still transfer
all the memory, and all will be fine.

> Dropping the dirty page isn't a sane response as that'd all but
> guarantee memory corruption in the guest.

Again, user error. Userspace can readily write over all the guest
memory (virtio), and no amount of KVM-side tracking will help. What
are you going to do about it?

At the end of the day, what are you trying to do? All the dirty
tracking muck (bitmap and ring) is only a way for userspace to track
dirty pages more easily and accelerate the transfer. If userspace
doesn't tell KVM to track these writes, tough luck. If the author of a
VMM doesn't understand that, then maybe they shouldn't be in charge of
the VMM. Worse case, they can still transfer the whole thing, no harm
done.

> At best, KVM could kick all vCPUs out to userspace
> with a new exit reason, but that's not a very good experience for
> userspace as either the VM is unexpectedly unmigratable or the VM
> ends up being killed (or I suppose userspace could treat the exit as
> a per-VM dirty ring of size '1').

Can we please stop the exit nonsense? There is no vcpu involved
here. This is a device (emulated or not) writing to memory, triggered
by an ioctl from userspace. If you're thinking vcpu, you have the
wrong end of the stick.

Memory gets dirtied system wide, not just by CPUs, and no amount of
per-vcpu resource is going to solve this problem. VM-based rings can
help if they provide a way to recover from an overflow. But that
obviously doesn't work here as we can't checkpoint and restart the
saving process on overflow.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-25  0:24               ` Oliver Upton
@ 2022-10-25  7:31                 ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-25  7:31 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Sean Christopherson, Gavin Shan, kvmarm, kvmarm, kvm, peterx,
	will, catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Tue, 25 Oct 2022 01:24:19 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Mon, Oct 24, 2022 at 11:50:29PM +0000, Sean Christopherson wrote:
> > On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@google.com> wrote:
> 
> [...]
> 
> > > > Would it be possible to require a dirty bitmap when an ITS is
> > > > created?  That would allow treating the above condition as a KVM
> > > > bug.
> > > 
> > > No. This should be optional. Everything about migration should be
> > > absolutely optional (I run plenty of concurrent VMs on sub-2GB
> > > systems). You want to migrate a VM that has an ITS or will collect
> > > dirty bits originating from a SMMU with HTTU, you enable the dirty
> > > bitmap. You want to have *vcpu* based dirty rings, you enable them.
> > > 
> > > In short, there shouldn't be any reason for the two to be either
> > > mandatory or conflated. Both should be optional and independent, because
> > > they cover completely disjoint use cases. *userspace* should be in
> > > charge of deciding this.
> > 
> > I agree about userspace being in control; what I want to avoid is letting userspace
> > put KVM into a bad state without any indication from KVM that the setup is wrong
> > until something actually dirties a page.
> > 
> > Specifically, if mark_page_dirty_in_slot() is invoked without a running vCPU, on
> > a memslot with dirty tracking enabled but without a dirty bitmap, then the migration
> > is doomed.  Dropping the dirty page isn't a sane response as that'd all but
> > guarantee memory corruption in the guest.  At best, KVM could kick all vCPUs out
> > to userspace with a new exit reason, but that's not a very good experience for
> > userspace as either the VM is unexpectedly unmigratable or the VM ends up being
> > killed (or I suppose userspace could treat the exit as a per-VM dirty ring of
> > size '1').
> 
> This only works on the assumption that the VM is in fact running. In the
> case of the GIC ITS, I would expect that the VM has already been paused
> in preparation for serialization. So, there would never be a vCPU thread
> around to detect the kick.

This is indeed the case. The ioctl will return -EBUSY if any vcpu is
running.
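
Roughly, as a simplified sketch of that guard in the vgic-its ctrl path
(lock_all_vcpus() is existing arm64 vgic code; error handling trimmed):

  mutex_lock(&kvm->lock);
  mutex_lock(&its->its_lock);

  /* Every vcpu mutex must be taken; a running vcpu makes this fail. */
  if (!lock_all_vcpus(kvm)) {
          mutex_unlock(&its->its_lock);
          mutex_unlock(&kvm->lock);
          return -EBUSY;
  }

  /* ... save/restore the ITS tables ... */

  unlock_all_vcpus(kvm);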

> 
> > That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> > might end up collecting dirty information without a vCPU.  KVM is still
> > technically prescribing a solution to userspace, but only because there's only
> > one solution.
> 
> I was trying to allude to something like this by flat-out requiring
> ring + bitmap on arm64.

And I claim that this is wrong. It may suit a particular use case, but
that's definitely not a universal truth.

> 
> Otherwise, we'd either need to:
> 
>  (1) Document the features that explicitly depend on ring + bitmap (i.e.
>  GIC ITS, whatever else may come) such that userspace sets up the
>  correct configuration based on what it's using. The combined likelihood
>  of both KVM and userspace getting this right seems low.

But what is there to get wrong? Absolutely nothing. Today, you can
save/restore a GICv3-ITS VM without a bitmap at all. Just dump all of
the memory. The bitmap only allows you to do it while the vcpus are
running. Do you want a dirty ring because it makes things faster?
Fine. But you need to understand what this does.

Yes, this may require some additional documentation. But more
importantly, it requires VMM authors to pay attention to what is
happening. At least the QEMU folks are doing that.

>  (2) Outright reject the use of features that require ring + bitmap.
>  This pulls in ordering around caps and other UAPI.

I don't think this makes any sense. Neither bitmap nor ring should be
a prerequisite for *anything*.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-25  7:31                 ` Marc Zyngier
@ 2022-10-25 17:47                   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-25 17:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, Gavin Shan, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Tue, Oct 25, 2022, Marc Zyngier wrote:
> On Tue, 25 Oct 2022 01:24:19 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> > > might end up collecting dirty information without a vCPU.  KVM is still
> > > technically prescribing a solution to userspace, but only because there's only
> > > one solution.
> > 
> > I was trying to allude to something like this by flat-out requiring
> > ring + bitmap on arm64.
> 
> And I claim that this is wrong. It may suit a particular use case, but
> that's definitely not a universal truth.

Agreed, KVM should not unconditionally require a dirty bitmap for arm64.

> > Otherwise, we'd either need to:
> > 
> >  (1) Document the features that explicitly depend on ring + bitmap (i.e.
> >  GIC ITS, whatever else may come) such that userspace sets up the
> >  correct configuration based on what it's using. The combined likelihood
> >  of both KVM and userspace getting this right seems low.
> 
> But what is there to get wrong? Absolutely nothing.

I strongly disagree.  On x86, we've had two bugs escape where KVM attempted to
mark a page dirty without an active vCPU.

  2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") 
  42dcbe7d8bac ("KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU")

Call us incompetent, but I have zero confidence that KVM will never unintentionally
add a path that invokes mark_page_dirty_in_slot() without a running vCPU.

By completely dropping the rule that KVM must have an active vCPU on architectures
that support ring+bitmap, those types of bugs will go silently unnoticed, and will
manifest as guest data corruption after live migration.

And ideally such bugs would be detected without relying on userspace enabling
dirty logging, e.g. the Hyper-V bug lurked for quite some time and was only found
when mark_page_dirty_in_slot() started WARNing.

I'm ok if arm64 wants to let userspace shoot itself in the foot with the ITS, but
I'm not ok dropping the protections in the common mark_page_dirty_in_slot().

One somewhat gross idea would be to let architectures override the "there must be
a running vCPU" rule, e.g. arm64 could toggle a flag in kvm->arch in its
kvm_write_guest_lock() to note that an expected write without a vCPU is in-progress:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8c5c69ba47a7..d1da8914f749 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
        struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
+               return;
+
+       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
                return;
 #endif
 
@@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
                unsigned long rel_gfn = gfn - memslot->base_gfn;
                u32 slot = (memslot->as_id << 16) | memslot->id;
 
-               if (kvm->dirty_ring_size)
+               if (kvm->dirty_ring_size && vcpu)
                        kvm_dirty_ring_push(&vcpu->dirty_ring,
                                            slot, rel_gfn);
-               else
+               else if (memslot->dirty_bitmap)
                        set_bit_le(rel_gfn, memslot->dirty_bitmap);
        }
 }
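
On the arm64 side, the override could then be scoped to the ITS save
path with something like the sketch below; the save_its_tables_in_progress
field and the wrapper are illustrative names, not existing code:

  /* Illustrative: flag the window in which vCPU-less writes are expected. */
  static int vgic_its_write_guest(struct kvm *kvm, gpa_t gpa,
                                  const void *data, unsigned long len)
  {
          struct vgic_dist *dist = &kvm->arch.vgic;
          int ret;

          dist->save_its_tables_in_progress = true;
          ret = kvm_write_guest_lock(kvm, gpa, data, len);
          dist->save_its_tables_in_progress = false;

          return ret;
  }

  bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
  {
          return kvm->arch.vgic.save_its_tables_in_progress;
  }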


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-25 17:47                   ` Sean Christopherson
@ 2022-10-27  8:29                     ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-27  8:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oliver Upton, Gavin Shan, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Tue, 25 Oct 2022 18:47:12 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Oct 25, 2022, Marc Zyngier wrote:
> > On Tue, 25 Oct 2022 01:24:19 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > > That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> > > > might end up collecting dirty information without a vCPU.  KVM is still
> > > > technically prescribing a solution to userspace, but only because there's only
> > > > one solution.
> > > 
> > > I was trying to allude to something like this by flat-out requiring
> > > ring + bitmap on arm64.
> > 
> > And I claim that this is wrong. It may suit a particular use case, but
> > that's definitely not a universal truth.
> 
> Agreed, KVM should not unconditionally require a dirty bitmap for arm64.
> 
> > > Otherwise, we'd either need to:
> > > 
> > >  (1) Document the features that explicitly depend on ring + bitmap (i.e.
> > >  GIC ITS, whatever else may come) such that userspace sets up the
> > >  correct configuration based on what it's using. The combined likelihood
> > >  of both KVM and userspace getting this right seems low.
> > 
> > But what is there to get wrong? Absolutely nothing.
> 
> I strongly disagree.  On x86, we've had two bugs escape where KVM
> attempted to mark a page dirty without an active vCPU.
> 
>   2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") 
>   42dcbe7d8bac ("KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU")
> 
> Call us incompetent, but I have zero confidence that KVM will never
> unintentionally add a path that invokes mark_page_dirty_in_slot()
> without a running vCPU.

Well, maybe it is time that KVM acknowledges there is a purpose to
dirtying memory outside of a vcpu context, and that if a write happens
in a vcpu context, this vcpu must be explicitly passed down rather
than obtained from kvm_get_running_vcpu(). Yes, this requires some
heavy surgery.
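
In signature terms, that surgery would amount to something like this
sketch (an explicit, possibly-NULL vcpu instead of the implicit lookup;
not an actual proposal in code form):

  /* Sketch: the caller says which vcpu, if any, did the write. */
  void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
                               const struct kvm_memory_slot *memslot,
                               gfn_t gfn);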

> By completely dropping the rule that KVM must have an active vCPU on
> architectures that support ring+bitmap, those types of bugs will go
> silently unnoticed, and will manifest as guest data corruption after
> live migration.

The elephant in the room is still userspace writing to its view of the
guest memory for device emulation. Do they get it right? I doubt it.

> > And ideally such bugs would be detected without relying on userspace
> > enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
> time and was only found when mark_page_dirty_in_slot() started
> WARNing.
> 
> I'm ok if arm64 wants to let userspace shoot itself in the foot with
> the ITS, but I'm not ok dropping the protections in the common
> mark_page_dirty_in_slot().
> 
> One somewhat gross idea would be to let architectures override the
> "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> in kvm->arch in its kvm_write_guest_lock() to note that an expected
> write without a vCPU is in-progress:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8c5c69ba47a7..d1da8914f749 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> +               return;
> +
> +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>                 return;
>  #endif
>  
> @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>                 unsigned long rel_gfn = gfn - memslot->base_gfn;
>                 u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -               if (kvm->dirty_ring_size)
> +               if (kvm->dirty_ring_size && vcpu)
>                         kvm_dirty_ring_push(&vcpu->dirty_ring,
>                                             slot, rel_gfn);
> -               else
> +               else if (memslot->dirty_bitmap)
>                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
>         }
>  }

I think this is equally wrong. Writes occur from both CPUs and devices
*concurrently*, and I don't see why KVM should keep ignoring this
pretty obvious fact.

Yes, your patch papers over the problem, and it can probably work if
the kvm->arch flag only gets set in the ITS saving code, which is
already exclusive of vcpus running.

But in the long run, with dirty bits being collected from the IOMMU
page tables or directly from devices, we will need a way to reconcile
the dirty tracking. The above doesn't quite cut it, unfortunately.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-27  8:29                     ` Marc Zyngier
@ 2022-10-27 17:44                       ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-27 17:44 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, Gavin Shan, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Thu, Oct 27, 2022, Marc Zyngier wrote:
> On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > Call us incompetent, but I have zero confidence that KVM will never
> > unintentionally add a path that invokes mark_page_dirty_in_slot()
> > without a running vCPU.
> 
> Well, maybe it is time that KVM acknowledges there is a purpose to
> dirtying memory outside of a vcpu context, and that if a write happens
> in a vcpu context, this vcpu must be explicitly passed down rather
> than obtained from kvm_get_running_vcpu(). Yes, this requires some
> heavy surgery.

Heh, preaching to the choir on this one.

  On Mon, Dec 02, 2019 at 12:10:36PM -0800, Sean Christopherson wrote:
  > IMO, adding kvm_get_running_vcpu() is a hack that is just asking for future
  > abuse and the vcpu/vm/as_id interactions in mark_page_dirty_in_ring() look
  > extremely fragile.

I'm all in favor of not using kvm_get_running_vcpu() in this path.

That said, it's somewhat of an orthogonal issue, as I would still want a sanity
check in mark_page_dirty_in_slot() that a vCPU is provided when there is no
dirty bitmap.
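
Something along these lines, sketch only:

  /* No vCPU and no bitmap to fall back on means dirty info is lost. */
  if (WARN_ON_ONCE(!vcpu && !memslot->dirty_bitmap))
          return;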

> > By completely dropping the rule that KVM must have an active vCPU on
> > architectures that support ring+bitmap, those types of bugs will go
> > silently unnoticed, and will manifest as guest data corruption after
> > live migration.
> 
> The elephant in the room is still userspace writing to its view of the
> guest memory for device emulation. Do they get it right? I doubt it.

I don't see what that has to do with KVM though.  There are many things userspace
needs to get right, but that doesn't mean that KVM shouldn't strive to provide
safeguards for the functionality that KVM provides.

> > And ideally such bugs would be detected without relying on userspace
> > enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
> > time and was only found when mark_page_dirty_in_slot() started
> > WARNing.
> > 
> > I'm ok if arm64 wants to let userspace shoot itself in the foot with
> > the ITS, but I'm not ok dropping the protections in the common
> > mark_page_dirty_in_slot().
> > 
> > One somewhat gross idea would be to let architectures override the
> > "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> > in kvm->arch in its kvm_write_guest_lock() to note that an expected
> > write without a vCPU is in-progress:
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 8c5c69ba47a7..d1da8914f749 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> >  
> >  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> > +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> > +               return;
> > +
> > +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> >                 return;
> >  #endif
> >  
> > @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >                 unsigned long rel_gfn = gfn - memslot->base_gfn;
> >                 u32 slot = (memslot->as_id << 16) | memslot->id;
> >  
> > -               if (kvm->dirty_ring_size)
> > +               if (kvm->dirty_ring_size && vcpu)
> >                         kvm_dirty_ring_push(&vcpu->dirty_ring,
> >                                             slot, rel_gfn);
> > -               else
> > +               else if (memslot->dirty_bitmap)
> >                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >         }
> >  }
> 
> I think this is equally wrong. Writes occur from both CPUs and devices
> *concurrently*, and I don't see why KVM should keep ignoring this
> pretty obvious fact.
>
> Yes, your patch papers over the problem, and it can probably work if
> the kvm->arch flag only gets set in the ITS saving code, which is
> already exclusive of vcpus running.
> 
> But in the long run, with dirty bits being collected from the IOMMU
> page tables or directly from devices, we will need a way to reconcile
> the dirty tracking. The above doesn't quite cut it, unfortunately.

Oooh, are you referring to IOMMU page tables and devices _in the guest_?  E.g. if
KVM itself were to emulate a vIOMMU, then KVM would be responsible for updating
dirty bits in the vIOMMU page tables.

Not that it really matters, but do we actually expect KVM to ever emulate a vIOMMU?
On x86 at least, in-kernel acceleration of vIOMMU emulation seems more like VFIO
territory.

Regardless, I don't think the above idea makes it any more difficult to support
in-KVM emulation of non-CPU stuff, which IIUC is the ITS case.  I 100% agree that
the above is a hack, but that's largely due to the use of kvm_get_running_vcpu().

A slightly different alternative would be to have a completely separate API for writing
guest memory without an associated vCPU.  I.e. start building up proper device emulation
support.  Then the vCPU-based APIs could yell if a vCPU isn't provided (or there
is no running vCPU in the current mess).  And the device-based API could be
provided if and only if the architecture actually supports emulating writes from
devices, i.e. x86 would not opt in and so would not even have access to the API.
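
A hypothetical shape for such an API (every name below is invented for
illustration):

  /* Device-context write: no vCPU, so dirty state can only go to the
   * memslot bitmap; unsupported architectures never see this API. */
  int kvm_device_write_guest(struct kvm *kvm, gpa_t gpa,
                             const void *data, unsigned long len)
  {
          if (!kvm_arch_has_device_writes(kvm))
                  return -EOPNOTSUPP;

          /* kvm_write_guest() funnels into mark_page_dirty_in_slot(),
           * which on this path would accept the lack of a running vCPU
           * and use memslot->dirty_bitmap only. */
          return kvm_write_guest(kvm, gpa, data, len);
  }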

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
@ 2022-10-27 17:44                       ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-27 17:44 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm

On Thu, Oct 27, 2022, Marc Zyngier wrote:
> On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > Call us incompetent, but I have zero confidence that KVM will never
> > unintentionally add a path that invokes mark_page_dirty_in_slot()
> > without a running vCPU.
> 
> Well, maybe it is time that KVM acknowledges there is a purpose to
> dirtying memory outside of a vcpu context, and that if a write happens
> in a vcpu context, this vcpu must be explicitly passed down rather
> than obtained from kvm_get_running_vcpu(). Yes, this requires some
> heavy surgery.

Heh, preaching to the choir on this one.

  On Mon, Dec 02, 2019 at 12:10:36PM -0800, Sean Christopherson wrote:
  > IMO, adding kvm_get_running_vcpu() is a hack that is just asking for future
  > abuse and the vcpu/vm/as_id interactions in mark_page_dirty_in_ring() look
  > extremely fragile.

I'm all in favor of not using kvm_get_running_vcpu() in this path.

That said, it's somewhat of an orthogonal issue, as I would still want a sanity
check in mark_page_dirty_in_slot() that a vCPU is provided when there is no
dirty bitmap.

> > By completely dropping the rule that KVM must have an active vCPU on
> > architectures that support ring+bitmap, those types of bugs will go
> > silently unnoticed, and will manifest as guest data corruption after
> > live migration.
> 
> The elephant in the room is still userspace writing to its view of the
> guest memory for device emulation. Do they get it right? I doubt it.

I don't see what that has to do with KVM though.  There are many things userspace
needs to get right, that doesn't mean that KVM shouldn't strive to provide
safeguards for the functionality that KVM provides.

> > And ideally such bugs would detected without relying on userspace to
> > enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
> > time and was only found when mark_page_dirty_in_slot() started
> > WARNing.
> > 
> > I'm ok if arm64 wants to let userspace shoot itself in the foot with
> > the ITS, but I'm not ok dropping the protections in the common
> > mark_page_dirty_in_slot().
> > 
> > One somewhat gross idea would be to let architectures override the
> > "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> > in kvm->arch in its kvm_write_guest_lock() to note that an expected
> > write without a vCPU is in-progress:
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 8c5c69ba47a7..d1da8914f749 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> >  
> >  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> > +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> > +               return;
> > +
> > +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> >                 return;
> >  #endif
> >  
> > @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >                 unsigned long rel_gfn = gfn - memslot->base_gfn;
> >                 u32 slot = (memslot->as_id << 16) | memslot->id;
> >  
> > -               if (kvm->dirty_ring_size)
> > +               if (kvm->dirty_ring_size && vcpu)
> >                         kvm_dirty_ring_push(&vcpu->dirty_ring,
> >                                             slot, rel_gfn);
> > -               else
> > +               else if (memslot->dirty_bitmap)
> >                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >         }
> >  }
> 
> I think this is equally wrong. Writes occur from both CPUs and devices
> *concurrently*, and I don't see why KVM should keep ignoring this
> pretty obvious fact.
>
> Yes, your patch papers over the problem, and it can probably work if
> the kvm->arch flag only gets set in the ITS saving code, which is
> already exclusive of vcpus running.
> 
> But in the long run, with dirty bits being collected from the IOMMU
> page tables or directly from devices, we will need a way to reconcile
> the dirty tracking. The above doesn't quite cut it, unfortunately.

Oooh, are you referring to IOMMU page tables and devices _in the guest_?  E.g. if
KVM itself were to emulate a vIOMMU, then KVM would be responsible for updating
dirty bits in the vIOMMU page tables.

Not that it really matters, but do we actually expect KVM to ever emulate a vIOMMU?
On x86 at least, in-kernel acceleration of vIOMMU emulation seems more like VFIO
territory.

Regardless, I don't think the above idea makes it any more difficult to support
in-KVM emulation of non-CPU stuff, which IIUC is the ITS case.  I 100% agree that
the above is a hack, but that's largely due to the use of kvm_get_running_vcpu().

A slightly different alternative would be to have a completely separate API for writing
guest memory without an associated vCPU.  I.e. start building up proper device emulation
support.  Then the vCPU-based APIs could yell if a vCPU isn't provided (or there
is no running vCPU in the current mess).  And the device-based API could be
provided if and only if the architecture actually supports emulating writes from
devices, i.e. x86 would not opt in and so would not even have access to the API.
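
Purely for illustration, the device-based flavor might look something like
the below (hypothetical name and signature; nothing of the sort exists today):

	/*
	 * Hypothetical: a guest-memory write with an explicit device context,
	 * so dirty state can be accounted without a running vCPU.
	 */
	int kvm_device_write_guest(struct kvm *kvm, gpa_t gpa,
				   const void *data, unsigned long len);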

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-27 17:44                       ` Sean Christopherson
@ 2022-10-27 18:30                         ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-27 18:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oliver Upton, Gavin Shan, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Thu, 27 Oct 2022 18:44:51 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Thu, Oct 27, 2022, Marc Zyngier wrote:
> > On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > > Call us incompetent, but I have zero confidence that KVM will never
> > > unintentionally add a path that invokes mark_page_dirty_in_slot()
> > > without a running vCPU.
> > 
> > Well, maybe it is time that KVM acknowledges there is a purpose to
> > dirtying memory outside of a vcpu context, and that if a write happens
> > in a vcpu context, this vcpu must be explicitly passed down rather
> > than obtained from kvm_get_running_vcpu(). Yes, this requires some
> > heavy surgery.
> 
> Heh, preaching to the choir on this one.
> 
>   On Mon, Dec 02, 2019 at 12:10:36PM -0800, Sean Christopherson wrote:
>   > IMO, adding kvm_get_running_vcpu() is a hack that is just asking for future
>   > abuse and the vcpu/vm/as_id interactions in mark_page_dirty_in_ring() look
>   > extremely fragile.
> 
> I'm all in favor of not using kvm_get_running_vcpu() in this path.
> 
> That said, it's somewhat of an orthogonal issue, as I would still
> want a sanity check in mark_page_dirty_in_slot() that a vCPU is
> provided when there is no dirty bitmap.

If we have a separate context and/or API, then all these checks become
a lot less controversial, and we can start reasoning about these
things. At the moment, this is just a mess.

> 
> > > By completely dropping the rule that KVM must have an active vCPU on
> > > architectures that support ring+bitmap, those types of bugs will go
> > > silently unnoticed, and will manifest as guest data corruption after
> > > live migration.
> > 
> > The elephant in the room is still userspace writing to its view of the
> > guest memory for device emulation. Do they get it right? I doubt it.
> 
> I don't see what that has to do with KVM though.  There are many
> things userspace needs to get right; that doesn't mean that KVM
> shouldn't strive to provide safeguards for the functionality that
> KVM provides.

I guess we have different expectations of what KVM should provide. My
take is that userspace doesn't need a nanny, and that a decent level
of documentation should make it obvious what feature captures which
state.

But we've argued for a while now, and I don't see that we're getting
any closer to a resolution. So let's at least make some forward
progress with the opt-out mechanism you mentioned, and arm64 will buy
into it when snapshotting the ITS.

> 
> > > And ideally such bugs would be detected without relying on userspace to
> > > enable dirty logging, e.g. the Hyper-V bug lurked for quite some
> > > time and was only found when mark_page_dirty_in_slot() started
> > > WARNing.
> > > 
> > > I'm ok if arm64 wants to let userspace shoot itself in the foot with
> > > the ITS, but I'm not ok dropping the protections in the common
> > > mark_page_dirty_in_slot().
> > > 
> > > One somewhat gross idea would be to let architectures override the
> > > "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> > > in kvm->arch in its kvm_write_guest_lock() to note that an expected
> > > write without a vCPU is in-progress:
> > > 
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 8c5c69ba47a7..d1da8914f749 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > >         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > >  
> > >  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > > -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> > > +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> > > +               return;
> > > +
> > > +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> > >                 return;
> > >  #endif
> > >  
> > > @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > >                 unsigned long rel_gfn = gfn - memslot->base_gfn;
> > >                 u32 slot = (memslot->as_id << 16) | memslot->id;
> > >  
> > > -               if (kvm->dirty_ring_size)
> > > +               if (kvm->dirty_ring_size && vcpu)
> > >                         kvm_dirty_ring_push(&vcpu->dirty_ring,
> > >                                             slot, rel_gfn);
> > > -               else
> > > +               else if (memslot->dirty_bitmap)
> > >                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > >         }
> > >  }
> > 
> > I think this is equally wrong. Writes occur from both CPUs and devices
> > *concurrently*, and I don't see why KVM should keep ignoring this
> > pretty obvious fact.
> >
> > Yes, your patch papers over the problem, and it can probably work if
> > the kvm->arch flag only gets set in the ITS saving code, which is
> > already exclusive of vcpus running.
> > 
> > But in the long run, with dirty bits being collected from the IOMMU
> > page tables or directly from devices, we will need a way to reconcile
> > the dirty tracking. The above doesn't quite cut it, unfortunately.
> 
> Oooh, are you referring to IOMMU page tables and devices _in the
> guest_?  E.g. if KVM itself were to emulate a vIOMMU, then KVM would
> be responsible for updating dirty bits in the vIOMMU page tables.

No. I'm talking about the *physical* IOMMU, which is (with the correct
architecture revision and feature set) capable of providing its own
set of dirty bits, on a per-device, per-PTE basis. Once we enable
that, we'll need to be able to sink these bits into the bitmap and
provide a unified view of the dirty state to userspace.
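
To make "sinking" concrete, a rough sketch (illustrative only; the helper
and its name are made up):

	/*
	 * Sketch: fold an IOMMU-reported dirty GFN into the memslot bitmap,
	 * with no vCPU and no dirty ring involved.
	 */
	static void sink_iommu_dirty_gfn(struct kvm_memory_slot *memslot, gfn_t gfn)
	{
		if (memslot->dirty_bitmap)
			set_bit_le(gfn - memslot->base_gfn, memslot->dirty_bitmap);
	}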

> Not that it really matters, but do we actually expect KVM to ever
> emulate a vIOMMU?  On x86 at least, in-kernel acceleration of vIOMMU
> emulation seems more like VFIO territory.

I don't expect KVM/arm64 to fully emulate an IOMMU, but at least to
eventually provide the required filtering to enable a stage-1 SMMU to
be passed to a guest. This is the sort of thing pKVM needs to
implement for the host anyway, and going the extra mile to support
arbitrary guests outside of the pKVM context isn't much more work.

> Regardless, I don't think the above idea makes it any more difficult
> to support in-KVM emulation of non-CPU stuff, which IIUC is the ITS
> case.  I 100% agree that the above is a hack, but that's largely due
> to the use of kvm_get_running_vcpu().

That I agree.

> A slightly different alternative would be to have a completely separate
> API for writing guest memory without an associated vCPU.  I.e. start
> building up proper device emulation support.  Then the vCPU-based
> APIs could yell if a vCPU isn't provided (or there is no running
> vCPU in the current mess).  And the device-based API could be
> provided if and only if the architecture actually supports emulating
> writes from devices, i.e. x86 would not opt in and so would not even
> have access to the API.

Which is what I was putting under the "major surgery" label in my
previous email.

Anyhow, for the purpose of unblocking Gavin's series, I suggest
adopting your per-arch opt-out suggestion as a stopgap measure, and we
will then be able to bike-shed for weeks on what the shape of the
device-originated memory dirtying API should be.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-27 18:30                         ` Marc Zyngier
@ 2022-10-27 19:09                           ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-27 19:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, Gavin Shan, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Thu, Oct 27, 2022, Marc Zyngier wrote:
> On Thu, 27 Oct 2022 18:44:51 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Thu, Oct 27, 2022, Marc Zyngier wrote:
> > > But in the long run, with dirty bits being collected from the IOMMU
> > > page tables or directly from devices, we will need a way to reconcile
> > > the dirty tracking. The above doesn't quite cut it, unfortunately.
> > 
> > Oooh, are you referring to IOMMU page tables and devices _in the
> > guest_?  E.g. if KVM itself were to emulate a vIOMMU, then KVM would
> > be responsible for updating dirty bits in the vIOMMU page tables.
> 
> No. I'm talking about the *physical* IOMMU, which is (with the correct
> architecture revision and feature set) capable of providing its own
> set of dirty bits, on a per-device, per-PTE basis. Once we enable
> that, we'll need to be able to sink these bits into the bitmap and
> provide a unified view of the dirty state to userspace.

Isn't that already handled by VFIO, e.g. via VFIO_IOMMU_DIRTY_PAGES?  There may
be "duplicate" information if a page is dirty in both the IOMMU page tables and
the CPU page tables, but that's ok in that the worst-case scenario is that the
VMM performs a redundant transfer.

A unified dirty bitmap would potentially reduce the memory footprint needed for
dirty logging, but presumably IOMMU-mapped memory is a small subset of CPU-mapped
memory in most use cases.
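
(For reference, userspace drives that via the type1 dirty tracking ioctl,
roughly as below; see include/uapi/linux/vfio.h:

	/* Start VFIO's IOMMU dirty page tracking on a type1 container fd. */
	struct vfio_iommu_type1_dirty_bitmap dirty = {
		.argsz = sizeof(dirty),
		.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START,
	};

	ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);

with the _STOP and _GET_BITMAP flags covering the rest of the flow.)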

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-27 18:30                         ` Marc Zyngier
@ 2022-10-28  6:43                           ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-28  6:43 UTC (permalink / raw)
  To: Marc Zyngier, Sean Christopherson
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm

Hi Sean and Marc,

On 10/28/22 2:30 AM, Marc Zyngier wrote:
> On Thu, 27 Oct 2022 18:44:51 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Thu, Oct 27, 2022, Marc Zyngier wrote:
>>> On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:

[...]
  
>>
>>>> And ideally such bugs would be detected without relying on userspace to
>>>> enable dirty logging, e.g. the Hyper-V bug lurked for quite some
>>>> time and was only found when mark_page_dirty_in_slot() started
>>>> WARNing.
>>>>
>>>> I'm ok if arm64 wants to let userspace shoot itself in the foot with
>>>> the ITS, but I'm not ok dropping the protections in the common
>>>> mark_page_dirty_in_slot().
>>>>
>>>> One somewhat gross idea would be to let architectures override the
>>>> "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
>>>> in kvm->arch in its kvm_write_guest_lock() to note that an expected
>>>> write without a vCPU is in-progress:
>>>>
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index 8c5c69ba47a7..d1da8914f749 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>>          struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>>>   
>>>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>>>> -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>>>> +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
>>>> +               return;
>>>> +
>>>> +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>>>                  return;
>>>>   #endif
>>>>   
>>>> @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>>                  unsigned long rel_gfn = gfn - memslot->base_gfn;
>>>>                  u32 slot = (memslot->as_id << 16) | memslot->id;
>>>>   
>>>> -               if (kvm->dirty_ring_size)
>>>> +               if (kvm->dirty_ring_size && vcpu)
>>>>                          kvm_dirty_ring_push(&vcpu->dirty_ring,
>>>>                                              slot, rel_gfn);
>>>> -               else
>>>> +               else if (memslot->dirty_bitmap)
>>>>                          set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>>>          }
>>>>   }
>>>
>>> I think this is equally wrong. Writes occur from both CPUs and devices
>>> *concurrently*, and I don't see why KVM should keep ignoring this
>>> pretty obvious fact.
>>>
>>> Yes, your patch papers over the problem, and it can probably work if
>>> the kvm->arch flag only gets set in the ITS saving code, which is
>>> already exclusive of vcpus running.
>>>
>>> But in the long run, with dirty bits being collected from the IOMMU
>>> page tables or directly from devices, we will need a way to reconcile
>>> the dirty tracking. The above doesn't quite cut it, unfortunately.
>>
>> Oooh, are you referring to IOMMU page tables and devices _in the
>> guest_?  E.g. if KVM itself were to emulate a vIOMMU, then KVM would
>> be responsible for updating dirty bits in the vIOMMU page tables.
> 
> No. I'm talking about the *physical* IOMMU, which is (with the correct
> architecture revision and feature set) capable of providing its own
> set of dirty bits, on a per-device, per-PTE basis. Once we enable
> that, we'll need to be able to sink these bits into the bitmap and
> provide a unified view of the dirty state to userspace.
> 
>> Not that it really matters, but do we actually expect KVM to ever
>> emulate a vIOMMU?  On x86 at least, in-kernel acceleration of vIOMMU
>> emulation seems more like VFIO territory.
> 
> I don't expect KVM/arm64 to fully emulate an IOMMU, but at least to
> eventually provide the required filtering to enable a stage-1 SMMU to
> be passed to a guest. This is the sort of thing pKVM needs to
> implement for the host anyway, and going the extra mile to support
> arbitrary guests outside of the pKVM context isn't much more work.
> 
>> Regardless, I don't think the above idea makes it any more difficult
>> to support in-KVM emulation of non-CPU stuff, which IIUC is the ITS
>> case.  I 100% agree that the above is a hack, but that's largely due
>> to the use of kvm_get_running_vcpu().
> 
> That I agree.
> 
>> A slightly different alternative would be to have a completely separate
>> API for writing guest memory without an associated vCPU.  I.e. start
>> building up proper device emulation support.  Then the vCPU-based
>> APIs could yell if a vCPU isn't provided (or there is no running
>> vCPU in the current mess).  And the device-based API could be
>> provided if and only if the architecture actually supports emulating
>> writes from devices, i.e. x86 would not opt in and so would not even
>> have access to the API.
> 
> Which is what I was putting under the "major surgery" label in my
> previous email.
> 
> Anyhow, for the purpose of unblocking Gavin's series, I suggest
> adopting your per-arch opt-out suggestion as a stopgap measure, and we
> will then be able to bike-shed for weeks on what the shape of the
> device-originated memory dirtying API should be.
> 

It's really 'major surgery', and I would like to make sure I fully understand
'a completely separate API for writing guest memory without an associated vCPU'
before I start working on v7.

There are 7 functions and 2 macros involved, as below. I assume Sean is suggesting
adding another argument, perhaps named 'has_vcpu', to these functions and macros?
Sean, could you please confirm?

If I understand correctly, a 'has_vcpu' argument would be added to these
functions and macros. Except for the call sites in vgic/its, 'has_vcpu' would be
set to 'true' and passed to these functions. That means we would pass 'false' at
the vgic/its call sites. Please correct me if I'm wrong; a sketch of what I mean
follows the function list below.

   int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
                            int offset, int len);
   int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
                       unsigned long len);
   int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
                              void *data, unsigned long len);
   int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
                                     void *data, unsigned int offset,
                                     unsigned long len);

   void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
   void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
   void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);

   #define __kvm_put_guest(kvm, gfn, offset, v)
   #define kvm_put_guest(kvm, gpa, v)
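
For illustration only, the kind of change I have in mind would be (taking
kvm_write_guest() as an example; a sketch of my reading of the proposal, not
an actual patch):

	/* Sketch: explicit flag saying whether a running vCPU is expected. */
	int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
			    unsigned long len, bool has_vcpu);

with 'false' passed at the vgic/its call sites and 'true' everywhere else.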
   
Thanks,
Gavin



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-28  6:43                           ` Gavin Shan
@ 2022-10-28 16:51                             ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2022-10-28 16:51 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Marc Zyngier, Oliver Upton, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On Fri, Oct 28, 2022, Gavin Shan wrote:
> Hi Sean and Marc,
> 
> On 10/28/22 2:30 AM, Marc Zyngier wrote:
> > On Thu, 27 Oct 2022 18:44:51 +0100,
> > Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > On Thu, Oct 27, 2022, Marc Zyngier wrote:
> > > > On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:
> 
> [...]
> > > 
> > > > > And ideally such bugs would be detected without relying on userspace to
> > > > > enable dirty logging, e.g. the Hyper-V bug lurked for quite some
> > > > > time and was only found when mark_page_dirty_in_slot() started
> > > > > WARNing.
> > > > > 
> > > > > I'm ok if arm64 wants to let userspace shoot itself in the foot with
> > > > > the ITS, but I'm not ok dropping the protections in the common
> > > > > mark_page_dirty_in_slot().
> > > > > 
> > > > > One somewhat gross idea would be to let architectures override the
> > > > > "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> > > > > in kvm->arch in its kvm_write_guest_lock() to note that an expected
> > > > > write without a vCPU is in-progress:
> > > > > 
> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > > index 8c5c69ba47a7..d1da8914f749 100644
> > > > > --- a/virt/kvm/kvm_main.c
> > > > > +++ b/virt/kvm/kvm_main.c
> > > > > @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > > > >          struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > > > >   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> > > > > -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> > > > > +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> > > > > +               return;
> > > > > +
> > > > > +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> > > > >                  return;
> > > > >   #endif
> > > > > @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > > > >                  unsigned long rel_gfn = gfn - memslot->base_gfn;
> > > > >                  u32 slot = (memslot->as_id << 16) | memslot->id;
> > > > > -               if (kvm->dirty_ring_size)
> > > > > +               if (kvm->dirty_ring_size && vcpu)
> > > > >                          kvm_dirty_ring_push(&vcpu->dirty_ring,
> > > > >                                              slot, rel_gfn);
> > > > > -               else
> > > > > +               else if (memslot->dirty_bitmap)
> > > > >                          set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > > > >          }
> > > > >   }

...

> > > A slightly different alternative would be to have a completely separate
> > > API for writing guest memory without an associated vCPU.  I.e. start
> > > building up proper device emulation support.  Then the vCPU-based
> > > APIs could yell if a vCPU isn't provided (or there is no running
> > > vCPU in the current mess).  And the device-based API could be
> > > provided if and only if the architecture actually supports emulating
> > > writes from devices, i.e. x86 would not opt in and so would not even
> > > have access to the API.
> > 
> > Which is what I was putting under the "major surgery" label in my
> > previous email.
> > 
> > Anyhow, for the purpose of unblocking Gavin's series, I suggest
> > adopting your per-arch opt-out suggestion as a stopgap measure, and we
> > will then be able to bike-shed for weeks on what the shape of the
> > device-originated memory dirtying API should be.
> > 
> 
> It's really 'major surgery', and I would like to make sure I fully understand
> 'a completely separate API for writing guest memory without an associated vCPU'
> before I start working on v7.
>
> There are 7 functions and 2 macros involved, as below. I assume Sean is suggesting
> adding another argument, perhaps named 'has_vcpu', to these functions and macros?

No.

As Marc suggested, for your series just implement the hacky arch opt-out; don't
try to do the surgery at this time, as that's likely going to be a months-long
effort that touches a lot of cross-arch code.

E.g. I believe the ARM opt-out (opt-in?) for the above hack would be

bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
{
	return vgic_has_its(kvm);
}
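
presumably paired with a __weak default in common code so that every other
architecture keeps the strict rule (again, just a sketch):

	/* Sketch: strict by default; arm64 would override this. */
	bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
	{
		return false;
	}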

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-28 16:51                             ` Sean Christopherson
@ 2022-10-31  3:37                               ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-31  3:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, Marc Zyngier, bgardon, andrew.jones, dmatlack,
	shan.gavin, catalin.marinas, kvmarm, pbonzini, zhenyzha, will,
	kvmarm

Hi Sean,

On 10/29/22 12:51 AM, Sean Christopherson wrote:
> On Fri, Oct 28, 2022, Gavin Shan wrote:
>> On 10/28/22 2:30 AM, Marc Zyngier wrote:
>>> On Thu, 27 Oct 2022 18:44:51 +0100,
>>> Sean Christopherson <seanjc@google.com> wrote:
>>>>
>>>> On Thu, Oct 27, 2022, Marc Zyngier wrote:
>>>>> On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:
>>
>> [...]
>>>>
>>>>>> And ideally such bugs would be detected without relying on userspace to
>>>>>> enable dirty logging, e.g. the Hyper-V bug lurked for quite some
>>>>>> time and was only found when mark_page_dirty_in_slot() started
>>>>>> WARNing.
>>>>>>
>>>>>> I'm ok if arm64 wants to let userspace shoot itself in the foot with
>>>>>> the ITS, but I'm not ok dropping the protections in the common
>>>>>> mark_page_dirty_in_slot().
>>>>>>
>>>>>> One somewhat gross idea would be to let architectures override the
>>>>>> "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
>>>>>> in kvm->arch in its kvm_write_guest_lock() to note that an expected
>>>>>> write without a vCPU is in-progress:
>>>>>>
>>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>>> index 8c5c69ba47a7..d1da8914f749 100644
>>>>>> --- a/virt/kvm/kvm_main.c
>>>>>> +++ b/virt/kvm/kvm_main.c
>>>>>> @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>>>>           struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>>>>>    #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>>>>>> -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>>>>>> +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
>>>>>> +               return;
>>>>>> +
>>>>>> +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>>>>>                   return;
>>>>>>    #endif
>>>>>> @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>>>>                   unsigned long rel_gfn = gfn - memslot->base_gfn;
>>>>>>                   u32 slot = (memslot->as_id << 16) | memslot->id;
>>>>>> -               if (kvm->dirty_ring_size)
>>>>>> +               if (kvm->dirty_ring_size && vcpu)
>>>>>>                           kvm_dirty_ring_push(&vcpu->dirty_ring,
>>>>>>                                               slot, rel_gfn);
>>>>>> -               else
>>>>>> +               else if (memslot->dirty_bitmap)
>>>>>>                           set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>>>>>           }
>>>>>>    }
> 
> ...
> 
>>>> A slightly different alternative would be to have a completely separate
>>>> API for writing guest memory without an associated vCPU.  I.e. start
>>>> building up proper device emulation support.  Then the vCPU-based
>>>> APIs could yell if a vCPU isn't provided (or there is no running
>>>> vCPU in the current mess).  And the device-based API could be
>>>> provided if and only if the architecture actually supports emulating
>>>> writes from devices, i.e. x86 would not opt in and so would not even
>>>> have access to the API.
>>>
>>> Which is what I was putting under the "major surgery" label in my
>>> previous email.
>>>
>>> Anyhow, for the purpose of unblocking Gavin's series, I suggest
>>> adopting your per-arch opt-out suggestion as a stopgap measure, and we
>>> will then be able to bike-shed for weeks on what the shape of the
>>> device-originated memory dirtying API should be.
>>>
>>
>> It's really 'major surgery', and I would like to make sure I fully understand
>> 'a completely separate API for writing guest memory without an associated vCPU'
>> before I start working on v7.
>>
>> There are 7 functions and 2 macros involved, as below. I assume Sean is suggesting
>> adding another argument, perhaps named 'has_vcpu', to these functions and macros?
> 
> No.
> 
> As Marc suggested, for your series just implement the hacky arch opt-out; don't
> try to do the surgery at this time, as that's likely going to be a months-long
> effort that touches a lot of cross-arch code.
> 
> E.g. I believe the ARM opt-out (opt-in?) for the above hack would be
> 
> bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> {
> 	return vgic_has_its(kvm);
> }
> 

OK, thanks for confirming. v7 has just been posted to address comments from Marc,
Peter, Oliver and you. Please review it when you get a chance.

https://lore.kernel.org/kvmarm/20221031003621.164306-1-gshan@redhat.com/T/#t

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-28 16:51                             ` Sean Christopherson
@ 2022-10-31  9:08                               ` Marc Zyngier
  -1 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2022-10-31  9:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Gavin Shan, Oliver Upton, kvmarm, kvmarm, kvm, peterx, will,
	catalin.marinas, bgardon, shuah, andrew.jones, dmatlack,
	pbonzini, zhenyzha, james.morse, suzuki.poulose,
	alexandru.elisei, shan.gavin

On 2022-10-28 17:51, Sean Christopherson wrote:
> On Fri, Oct 28, 2022, Gavin Shan wrote:
>> Hi Sean and Marc,
>> 
>> On 10/28/22 2:30 AM, Marc Zyngier wrote:
>> > On Thu, 27 Oct 2022 18:44:51 +0100,
>> > Sean Christopherson <seanjc@google.com> wrote:
>> > >
>> > > On Thu, Oct 27, 2022, Marc Zyngier wrote:
>> > > > On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:
>> 
>> [...]
>> > >
>> > > > > And ideally such bugs would be detected without relying on userspace
>> > > > > enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
>> > > > > time and was only found when mark_page_dirty_in_slot() started
>> > > > > WARNing.
>> > > > >
>> > > > > I'm ok if arm64 wants to let userspace shoot itself in the foot with
>> > > > > the ITS, but I'm not ok dropping the protections in the common
>> > > > > mark_page_dirty_in_slot().
>> > > > >
>> > > > > One somewhat gross idea would be to let architectures override the
>> > > > > "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
>> > > > > in kvm->arch in its kvm_write_guest_lock() to note that an expected
>> > > > > write without a vCPU is in-progress:
>> > > > >
>> > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> > > > > index 8c5c69ba47a7..d1da8914f749 100644
>> > > > > --- a/virt/kvm/kvm_main.c
>> > > > > +++ b/virt/kvm/kvm_main.c
>> > > > > @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>> > > > >          struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>> > > > >   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>> > > > > -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>> > > > > +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
>> > > > > +               return;
>> > > > > +
>> > > > > +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>> > > > >                  return;
>> > > > >   #endif
>> > > > > @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>> > > > >                  unsigned long rel_gfn = gfn - memslot->base_gfn;
>> > > > >                  u32 slot = (memslot->as_id << 16) | memslot->id;
>> > > > > -               if (kvm->dirty_ring_size)
>> > > > > +               if (kvm->dirty_ring_size && vcpu)
>> > > > >                          kvm_dirty_ring_push(&vcpu->dirty_ring,
>> > > > >                                              slot, rel_gfn);
>> > > > > -               else
>> > > > > +               else if (memslot->dirty_bitmap)
>> > > > >                          set_bit_le(rel_gfn, memslot->dirty_bitmap);
>> > > > >          }
>> > > > >   }
> 
> ...
> 
>> > > A slightly different alternative would be to have a completely separate
>> > > API for writing guest memory without an associated vCPU.  I.e. start
>> > > building up proper device emulation support.  Then the vCPU-based
>> > > APIs could yell if a vCPU isn't provided (or there is no running
>> > > vCPU in the current mess).  And the device-based API could be
>> > > provided if and only if the architecture actually supports emulating
>> > > writes from devices, i.e. x86 would not opt-in and so would not even
>> > > have access to the API.
>> >
>> > Which is what I was putting under the "major surgery" label in my
>> > previous email.
>> >
>> > Anyhow, for the purpose of unblocking Gavin's series, I suggest
>> > adopting your per-arch opt-out suggestion as a stopgap measure, and we
>> > will then be able to bike-shed for weeks on what the shape of the
>> > device-originated memory dirtying API should be.
>> >
>> 
>> It really is 'major surgery', and I would like to make sure I fully
>> understand 'a completely separate API for writing guest memory without
>> an associated vCPU' before I start working on v7.
>> 
>> There are 7 functions and 2 macros involved, as below. I assume Sean is
>> suggesting to add another argument, perhaps named 'has_vcpu', to these
>> functions and macros?
> 
> No.
> 
> As March suggested, for your series just implement the hacky arch 
> opt-out, don't

Please call me April.

> try and do surgery at this time as that's likely going to be a
> months-long effort
> that touches a lot of cross-arch code.
> 
> E.g. I believe the ARM opt-out (opt-in?) for the above hack would be
> 
> bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> {
> 	return vgic_has_its(kvm);
> }

Although that will probably lead to the expected effect,
this helper should only return true while the ITS tables
are actively being dumped.
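
Something along those lines could look like the below, where the flag is
only raised around the table dump (the field name is illustrative, not
from an actual patch):

bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
{
	struct vgic_dist *dist = &kvm->arch.vgic;

	/*
	 * Only tolerate vCPU-less writes while the ITS tables are
	 * actively being dumped to guest memory.
	 */
	return dist->save_tables_in_progress;
}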

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap
  2022-10-31  9:08                               ` Marc Zyngier
@ 2022-10-31 22:48                                 ` Gavin Shan
  -1 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2022-10-31 22:48 UTC (permalink / raw)
  To: Marc Zyngier, Sean Christopherson
  Cc: Oliver Upton, kvmarm, kvmarm, kvm, peterx, will, catalin.marinas,
	bgardon, shuah, andrew.jones, dmatlack, pbonzini, zhenyzha,
	james.morse, suzuki.poulose, alexandru.elisei, shan.gavin

On 10/31/22 5:08 PM, Marc Zyngier wrote:
> On 2022-10-28 17:51, Sean Christopherson wrote:
>> On Fri, Oct 28, 2022, Gavin Shan wrote:
>>> On 10/28/22 2:30 AM, Marc Zyngier wrote:
>>> > On Thu, 27 Oct 2022 18:44:51 +0100,
>>> > > On Thu, Oct 27, 2022, Marc Zyngier wrote:
>>> > > > On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@google.com> wrote:

[...]

>>>
>>> It really is 'major surgery', and I would like to make sure I fully understand
>>> 'a completely separate API for writing guest memory without an associated vCPU'
>>> before I start working on v7.
>>>
>>> There are 7 functions and 2 macros involved, as below. I assume Sean is suggesting
>>> to add another argument, perhaps named 'has_vcpu', to these functions and macros?
>>
>> No.
>>
>> As March suggested, for your series just implement the hacky arch opt-out, don't
> 
> Please call me April.
> 
>> try and do surgery at this time as that's likely going to be a
>> months-long effort
>> that touches a lot of cross-arch code.
>>
>> E.g. I believe the ARM opt-out (opt-in?) for the above hack would be
>>
>> bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>> {
>>     return vgic_has_its(kvm);
>> }
> 
> Although that will probably lead to the expected effect,
> this helper should only return true while the ITS tables
> are actively being dumped.
> 

Thanks, Marc. It makes sense to return true only while the vgic/its tables
are being saved. Let's continue the discussion in PATCH[v7 5/9] since Oliver
has other concerns there :)
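
For completeness, a rough fragment of what bracketing the save with such a
flag could look like (the flag name matches the sketch above and is
illustrative; 'kvm' and 'its' are assumed to come from the surrounding ITS
save routine, and the exact call site is part of the v7 discussion):

	struct vgic_dist *dist = &kvm->arch.vgic;
	int ret;

	dist->save_tables_in_progress = true;
	/* The table dump writes guest memory via kvm_write_guest_lock(). */
	ret = vgic_its_save_tables_v0(its);
	dist->save_tables_in_progress = false;

	return ret;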

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2022-10-31 22:51 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-11  6:14 [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:14 ` [PATCH v6 1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
2022-10-20 22:42   ` Sean Christopherson
2022-10-21  5:54     ` Gavin Shan
2022-10-21 15:25       ` Sean Christopherson
2022-10-21 23:03         ` Gavin Shan
2022-10-21 23:48           ` Sean Christopherson
2022-10-22  0:16             ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 2/8] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-11  6:14 ` [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap Gavin Shan
2022-10-18 16:07   ` Peter Xu
2022-10-18 22:20     ` Gavin Shan
2022-10-20 18:58       ` Oliver Upton
2022-10-20 23:44   ` Sean Christopherson
2022-10-21  8:06     ` Marc Zyngier
2022-10-21 16:05       ` Sean Christopherson
2022-10-22  8:27         ` Gavin Shan
2022-10-22 10:54           ` Marc Zyngier
2022-10-22 10:33         ` Marc Zyngier
2022-10-24 23:50           ` Sean Christopherson
2022-10-25  0:08             ` Sean Christopherson
2022-10-25  0:24             ` Oliver Upton
2022-10-25  7:31               ` Marc Zyngier
2022-10-25 17:47                 ` Sean Christopherson
2022-10-27  8:29                   ` Marc Zyngier
2022-10-27 17:44                     ` Sean Christopherson
2022-10-27 18:30                       ` Marc Zyngier
2022-10-27 19:09                         ` Sean Christopherson
2022-10-28  6:43                         ` Gavin Shan
2022-10-28 16:51                           ` Sean Christopherson
2022-10-31  3:37                             ` Gavin Shan
2022-10-31  9:08                             ` Marc Zyngier
2022-10-31 22:48                               ` Gavin Shan
2022-10-25  7:22             ` Marc Zyngier
2022-10-21 10:13     ` Gavin Shan
2022-10-21 23:20       ` Sean Christopherson
2022-10-22  0:33         ` Gavin Shan
2022-10-11  6:14 ` [PATCH v6 4/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-11  6:14 ` [PATCH v6 5/8] KVM: selftests: Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP if possible Gavin Shan
2022-10-11  6:14 ` [PATCH v6 6/8] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-11  6:14 ` [PATCH v6 7/8] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-11  6:14 ` [PATCH v6 8/8] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-11  6:23 ` [PATCH v6 0/8] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
