kvmarm.lists.cs.columbia.edu archive mirror
* [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking
@ 2022-10-05  0:41 Gavin Shan
  2022-10-05  0:41 ` Gavin Shan
                   ` (7 more replies)
  0 siblings, 8 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

This series enables ring-based dirty memory tracking for ARM64.
The feature has been available and enabled on x86 for a while. It
is beneficial when the number of dirty pages is small in a
checkpointing system or live-migration scenario. More details can be
found in commit fb04a1eddb1a ("KVM: X86: Implement ring-based dirty
memory tracking").
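
For background, the flow a ring-aware userspace follows is roughly the
one below. This is only a sketch based on the dirty-ring documentation
in Documentation/virt/kvm/api.rst; vm_fd, vcpu_fd, ring_size, fetch and
track_dirty_page() are placeholders, and error handling is omitted.

  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <linux/kvm.h>

  /* Enable the ring on the VM before any vCPU is created; args[0] is
   * the ring size in bytes and must be a power of two. On arm64 the
   * acquire/release variant is the one this series exposes.
   */
  struct kvm_enable_cap cap = {
          .cap     = KVM_CAP_DIRTY_LOG_RING_ACQ_REL,
          .args[0] = ring_size,
  };
  ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

  /* Each vCPU's ring is mmap()ed from its vCPU fd at a fixed offset. */
  struct kvm_dirty_gfn *ring =
          mmap(NULL, ring_size, PROT_READ | PROT_WRITE, MAP_SHARED,
               vcpu_fd, (off_t)getpagesize() * KVM_DIRTY_LOG_PAGE_OFFSET);

  /* Harvest entries flagged dirty (use acquire semantics on 'flags' on
   * weakly ordered CPUs), mark them for reset and let KVM recycle them.
   */
  uint32_t nr_entries = ring_size / sizeof(struct kvm_dirty_gfn);
  while (ring[fetch % nr_entries].flags & KVM_DIRTY_GFN_F_DIRTY) {
          struct kvm_dirty_gfn *gfn = &ring[fetch++ % nr_entries];

          track_dirty_page(gfn->slot, gfn->offset);
          gfn->flags |= KVM_DIRTY_GFN_F_RESET;
  }
  ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);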

This series is applied on top of Marc's v2 series [0], which fixes a
dirty-ring ordering issue; that series is expected to land in v6.1-rc0
pretty soon.

[0] https://lore.kernel.org/kvmarm/20220926145120.27974-1-maz@kernel.org

v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com

Testing
=======
(1) kvm/selftests/dirty_log_test
(2) Live migration by QEMU

Changelog
=========
v5:
  * Drop empty stub kvm_dirty_ring_check_request()             (Marc/Peter)
  * Add PATCH[v5 3/7] to allow using bitmap, indicated by
    KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP                        (Marc/Peter)
v4:
  * Commit log improvement                                     (Marc)
  * Add helper kvm_dirty_ring_check_request()                  (Marc)
  * Drop ifdef for kvm_cpu_dirty_log_size()                    (Marc)
v3:
  * Check KVM_REQ_RING_SOFT_FULL inside kvm_request_pending()  (Peter)
  * Move declaration of kvm_cpu_dirty_log_size()               (test-robot)
v2:
  * Introduce KVM_REQ_RING_SOFT_FULL                           (Marc)
  * Changelog improvement                                      (Marc)
  * Fix dirty_log_test without knowing host page size          (Drew)

Gavin Shan (7):
  KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to
    kvm_dirty_ring.h
  KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: selftests: Use host page size to map ring buffer in
    dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in
    dirty_log_test
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test

 Documentation/virt/kvm/api.rst               | 19 ++++---
 arch/arm64/include/uapi/asm/kvm.h            |  1 +
 arch/arm64/kvm/Kconfig                       |  1 +
 arch/arm64/kvm/arm.c                         |  3 ++
 arch/x86/include/asm/kvm_host.h              |  2 -
 arch/x86/kvm/x86.c                           | 15 +++---
 include/linux/kvm_dirty_ring.h               |  9 ++--
 include/linux/kvm_host.h                     |  1 +
 include/uapi/linux/kvm.h                     |  1 +
 tools/testing/selftests/kvm/dirty_log_test.c | 53 ++++++++++++++------
 tools/testing/selftests/kvm/lib/kvm_util.c   |  2 +-
 virt/kvm/dirty_ring.c                        | 19 ++++++-
 virt/kvm/kvm_main.c                          | 24 ++++-----
 13 files changed, 92 insertions(+), 58 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 1/7] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

This series enables ring-based dirty memory tracking for ARM64.
The feature has been available and enabled on x86 for a while. It
is beneficial when the number of dirty pages is small in a
checkpointing system or live-migration scenario. More details can be
found in commit fb04a1eddb1a ("KVM: X86: Implement ring-based dirty
memory tracking").

This series is applied on top of Marc's v2 series [0], which fixes a
dirty-ring ordering issue; that series is expected to land in v6.1-rc0
pretty soon.

[0] https://lore.kernel.org/kvmarm/20220926145120.27974-1-maz@kernel.org

v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com

Testing
=======
(1) kvm/selftests/dirty_log_test
(2) Live migration by QEMU

Changelog
=========
v5:
  * Drop empty stub kvm_dirty_ring_check_request()             (Marc/Peter)
  * Add PATCH[v5 3/7] to allow using bitmap, indicated by
    KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP                        (Marc/Peter)
v4:
  * Commit log improvement                                     (Marc)
  * Add helper kvm_dirty_ring_check_request()                  (Marc)
  * Drop ifdef for kvm_cpu_dirty_log_size()                    (Marc)
v3:
  * Check KVM_REQ_RING_SOFT_FULL inside kvm_request_pending()  (Peter)
  * Move declaration of kvm_cpu_dirty_log_size()               (test-robot)
v2:
  * Introduce KVM_REQ_RING_SOFT_FULL                           (Marc)
  * Changelog improvement                                      (Marc)
  * Fix dirty_log_test without knowing host page size          (Drew)

Gavin Shan (7):
  KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to
    kvm_dirty_ring.h
  KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: selftests: Use host page size to map ring buffer in
    dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in
    dirty_log_test
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test

 Documentation/virt/kvm/api.rst               | 19 ++++---
 arch/arm64/include/uapi/asm/kvm.h            |  1 +
 arch/arm64/kvm/Kconfig                       |  1 +
 arch/arm64/kvm/arm.c                         |  3 ++
 arch/x86/include/asm/kvm_host.h              |  2 -
 arch/x86/kvm/x86.c                           | 15 +++---
 include/linux/kvm_dirty_ring.h               |  9 ++--
 include/linux/kvm_host.h                     |  1 +
 include/uapi/linux/kvm.h                     |  1 +
 tools/testing/selftests/kvm/dirty_log_test.c | 53 ++++++++++++++------
 tools/testing/selftests/kvm/lib/kvm_util.c   |  2 +-
 virt/kvm/dirty_ring.c                        | 19 ++++++-
 virt/kvm/kvm_main.c                          | 24 ++++-----
 13 files changed, 92 insertions(+), 58 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v5 1/7] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
  2022-10-05  0:41 ` Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 2/7] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

Add KVM_REQ_RING_SOFT_FULL, which is raised when a VCPU's dirty ring
becomes soft-full in kvm_dirty_ring_push(). The VCPU is forced to exit
to userspace when the request is pending and its dirty ring is still
soft-full at guest entry.

The request is checked and handled in the newly introduced helper
kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
becomes a private function.
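
From userspace's point of view nothing changes: the condition still
surfaces as a KVM_EXIT_DIRTY_RING_FULL exit from KVM_RUN. A rough
sketch of the expected handling, where run is the mmap()ed kvm_run
structure and harvest_all_dirty_rings() is only an illustrative helper:

  if (ioctl(vcpu_fd, KVM_RUN, 0) == 0 &&
      run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) {
          /* Drain every vCPU's ring, then recycle the harvested
           * entries so the next KVM_RUN can enter the guest again.
           */
          harvest_all_dirty_rings();
          ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
  }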

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c             | 15 ++++++---------
 include/linux/kvm_dirty_ring.h |  8 ++------
 include/linux/kvm_host.h       |  1 +
 virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0c47b41c264..0dd0d32073e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	/* Forbid vmenter if vcpu dirty ring is soft-full */
-	if (unlikely(vcpu->kvm->dirty_ring_size &&
-		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
-		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
-		trace_kvm_dirty_ring_exit(vcpu);
-		r = 0;
-		goto out;
-	}
-
 	if (kvm_request_pending(vcpu)) {
+		/* Forbid vmenter if vcpu dirty ring is soft-full */
+		if (kvm_dirty_ring_check_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
+
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 906f899813dc..66508afa0b40 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -64,11 +64,6 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
-static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
-{
-	return true;
-}
-
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
@@ -86,11 +81,12 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
  */
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
 
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
+
 /* for use in vm_operations_struct */
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
 
 void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
 
 #endif /* CONFIG_HAVE_KVM_DIRTY_RING */
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..53fa3134fee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQ_RING_SOFT_FULL    4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d6fabf238032..f68d75026bc0 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,7 +26,7 @@ static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
 }
 
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
+static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 {
 	return kvm_dirty_ring_used(ring) >= ring->soft_limit;
 }
@@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 {
+	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
@@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 	kvm_dirty_gfn_set_dirtied(entry);
 	ring->dirty_index++;
 	trace_kvm_dirty_ring_push(ring, slot, offset);
+
+	if (kvm_dirty_ring_soft_full(ring))
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+}
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
+		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+		trace_kvm_dirty_ring_exit(vcpu);
+		return true;
+	}
+
+	return false;
 }
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 1/7] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL
  2022-10-05  0:41 ` [PATCH v5 1/7] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Add KVM_REQ_RING_SOFT_FULL, which is raised when a VCPU's dirty ring
becomes soft-full in kvm_dirty_ring_push(). The VCPU is forced to exit
to userspace when the request is pending and its dirty ring is still
soft-full at guest entry.

The request is checked and handled in the newly introduced helper
kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
becomes a private function.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c             | 15 ++++++---------
 include/linux/kvm_dirty_ring.h |  8 ++------
 include/linux/kvm_host.h       |  1 +
 virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0c47b41c264..0dd0d32073e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	/* Forbid vmenter if vcpu dirty ring is soft-full */
-	if (unlikely(vcpu->kvm->dirty_ring_size &&
-		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
-		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
-		trace_kvm_dirty_ring_exit(vcpu);
-		r = 0;
-		goto out;
-	}
-
 	if (kvm_request_pending(vcpu)) {
+		/* Forbid vmenter if vcpu dirty ring is soft-full */
+		if (kvm_dirty_ring_check_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
+
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 906f899813dc..66508afa0b40 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -64,11 +64,6 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
-static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
-{
-	return true;
-}
-
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
@@ -86,11 +81,12 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
  */
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
 
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
+
 /* for use in vm_operations_struct */
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
 
 void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
 
 #endif /* CONFIG_HAVE_KVM_DIRTY_RING */
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..53fa3134fee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQ_RING_SOFT_FULL    4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d6fabf238032..f68d75026bc0 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,7 +26,7 @@ static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
 }
 
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
+static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 {
 	return kvm_dirty_ring_used(ring) >= ring->soft_limit;
 }
@@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 {
+	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
@@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 	kvm_dirty_gfn_set_dirtied(entry);
 	ring->dirty_index++;
 	trace_kvm_dirty_ring_push(ring, slot, offset);
+
+	if (kvm_dirty_ring_soft_full(ring))
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+}
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
+		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+		trace_kvm_dirty_ring_exit(vcpu);
+		return true;
+	}
+
+	return false;
 }
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 2/7] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
  2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 1/7] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking Gavin Shan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

Not all architectures need to override kvm_cpu_dirty_log_size(); ARM64,
for example, does not. Move its declaration to kvm_dirty_ring.h to avoid
the following compile warning on ARM64 when the feature is enabled.

  arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12:        \
  warning: no previous prototype for 'kvm_cpu_dirty_log_size' \
  [-Wmissing-prototypes]                                      \
  int __weak kvm_cpu_dirty_log_size(void)
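
The warning is the usual -Wmissing-prototypes complaint about a global
definition without a visible declaration. Condensed, the weak-default
pattern involved looks like the sketch below; the x86 body is only a
stand-in here, not quoted from the tree.

  /* include/linux/kvm_dirty_ring.h -- shared prototype */
  int kvm_cpu_dirty_log_size(void);

  /* virt/kvm/dirty_ring.c -- default used by architectures like arm64 */
  int __weak kvm_cpu_dirty_log_size(void)
  {
          return 0;
  }

  /* arch/x86 -- strong definition overriding the weak default */
  int kvm_cpu_dirty_log_size(void)
  {
          return cpu_dirty_log_size;      /* stand-in body */
  }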

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 --
 include/linux/kvm_dirty_ring.h  | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa381ab69a19..f11b6a9388b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2083,8 +2083,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #define GET_SMSTATE(type, buf, offset)		\
 	(*(type *)((buf) + (offset) - 0x7e00))
 
-int kvm_cpu_dirty_log_size(void);
-
 int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 
 #define KVM_CLOCK_VALID_FLAGS						\
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 66508afa0b40..fe5982b46424 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -66,6 +66,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
+int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 2/7] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  2022-10-05  0:41 ` [PATCH v5 2/7] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Not all architectures need to override kvm_cpu_dirty_log_size(); ARM64,
for example, does not. Move its declaration to kvm_dirty_ring.h to avoid
the following compile warning on ARM64 when the feature is enabled.

  arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12:        \
  warning: no previous prototype for 'kvm_cpu_dirty_log_size' \
  [-Wmissing-prototypes]                                      \
  int __weak kvm_cpu_dirty_log_size(void)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 --
 include/linux/kvm_dirty_ring.h  | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa381ab69a19..f11b6a9388b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2083,8 +2083,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #define GET_SMSTATE(type, buf, offset)		\
 	(*(type *)((buf) + (offset) - 0x7e00))
 
-int kvm_cpu_dirty_log_size(void);
-
 int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 
 #define KVM_CLOCK_VALID_FLAGS						\
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 66508afa0b40..fe5982b46424 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -66,6 +66,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
+int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
                   ` (2 preceding siblings ...)
  2022-10-05  0:41 ` [PATCH v5 2/7] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  2022-10-06 20:28   ` Peter Xu
  2022-10-05  0:41 ` [PATCH v5 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

In some cases there is no running vcpu, and hence no usable per-vcpu
dirty ring, when pages become dirty. One example is saving arm64's
vgic/its tables during migration. As a result, we lose track of those
dirty pages.

Fix the issue by reusing the dirty bitmap to track those pages. The
bitmap is accessed through the KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
ioctls. KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP advertises the new
capability.
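
From userspace's perspective a live-migration loop keeps harvesting the
per-vcpu rings as before and, when the capability is reported, also
queries the bitmap for pages dirtied without a running vcpu. A rough
sketch, where slot_id, bitmap and collect_ring_entries() are only
illustrative placeholders:

  if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP)) {
          struct kvm_dirty_log log = {
                  .slot         = slot_id,
                  .dirty_bitmap = bitmap,
          };

          collect_ring_entries();                 /* per-vcpu rings, as before */
          ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);  /* e.g. vgic/its table pages */
  }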

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 17 ++++++++---------
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/kvm_main.c            | 24 +++++++++---------------
 3 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 32427ea160df..4f5e09042f8a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8019,8 +8019,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x40000001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
 
-8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
-----------------------------------------------------------
+8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
+--------------------------------------------------------------
 
 :Architectures: x86
 :Parameters: args[0] - size of the dirty log ring
@@ -8104,13 +8104,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8119,6 +8112,12 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+NOTE: There is no running vcpu and available vcpu dirty ring when pages
+become dirty in some cases. One example is to save arm64's vgic/its
+tables during migration. The dirty bitmap is still used to track those
+dirty pages, indicated by KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP. The dirty
+bitmap is accessed by KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..a1f10bc5fe8b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..505e3840cf0c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	*memslot = NULL;
 	*is_dirty = 0;
 
@@ -2125,10 +2121,6 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
@@ -2237,10 +2229,6 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
@@ -3305,7 +3293,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
 		return;
 #endif
 
@@ -3313,7 +3301,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
@@ -4485,6 +4473,12 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+	case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING
+		return 1;
+#else
+		return 0;
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-05  0:41 ` [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  2022-10-06 20:28   ` Peter Xu
  1 sibling, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

In some cases there is no running vcpu, and hence no usable per-vcpu
dirty ring, when pages become dirty. One example is saving arm64's
vgic/its tables during migration. As a result, we lose track of those
dirty pages.

Fix the issue by reusing the dirty bitmap to track those pages. The
bitmap is accessed through the KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
ioctls. KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP advertises the new
capability.

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 17 ++++++++---------
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/kvm_main.c            | 24 +++++++++---------------
 3 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 32427ea160df..4f5e09042f8a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8019,8 +8019,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x40000001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
 
-8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
-----------------------------------------------------------
+8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
+--------------------------------------------------------------
 
 :Architectures: x86
 :Parameters: args[0] - size of the dirty log ring
@@ -8104,13 +8104,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8119,6 +8112,12 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+NOTE: There is no running vcpu and available vcpu dirty ring when pages
+become dirty in some cases. One example is to save arm64's vgic/its
+tables during migration. The dirty bitmap is still used to track those
+dirty pages, indicated by KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP. The dirty
+bitmap is accessed by KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..a1f10bc5fe8b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..505e3840cf0c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	*memslot = NULL;
 	*is_dirty = 0;
 
@@ -2125,10 +2121,6 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
@@ -2237,10 +2229,6 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
@@ -3305,7 +3293,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
 		return;
 #endif
 
@@ -3313,7 +3301,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
@@ -4485,6 +4473,12 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+	case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING
+		return 1;
+#else
+		return 0;
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
                   ` (3 preceding siblings ...)
  2022-10-05  0:41 ` [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL and providing the ring buffer's
physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst    | 2 +-
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/Kconfig            | 1 +
 arch/arm64/kvm/arm.c              | 3 +++
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4f5e09042f8a..8c6acbb416ee 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8022,7 +8022,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
 --------------------------------------------------------------
 
-:Architectures: x86
+:Architectures: x86, arm64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 316917b98707..a7a857f1784d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 815cc118c675..095c714b559e 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -32,6 +32,7 @@ menuconfig KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 917086be5c6b..53c963a159bc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -747,6 +747,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_dirty_ring_check_request(vcpu))
+			return 0;
 	}
 
 	return 1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-05  0:41 ` [PATCH v5 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL and providing the ring buffer's
physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst    | 2 +-
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/Kconfig            | 1 +
 arch/arm64/kvm/arm.c              | 3 +++
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4f5e09042f8a..8c6acbb416ee 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8022,7 +8022,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
 --------------------------------------------------------------
 
-:Architectures: x86
+:Architectures: x86, arm64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 316917b98707..a7a857f1784d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 815cc118c675..095c714b559e 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -32,6 +32,7 @@ menuconfig KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 917086be5c6b..53c963a159bc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -747,6 +747,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_dirty_ring_check_request(vcpu))
+			return 0;
 	}
 
 	return 1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
                   ` (4 preceding siblings ...)
  2022-10-05  0:41 ` [PATCH v5 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
  7 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. That works fine when the host and guest
page sizes are the same, but it fails on arm64 when they differ, as the
error messages below indicate.

  # ./dirty_log_test -M dirty-ring -m 7
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  ==== Test Assertion Failure ====
  lib/kvm_util.c:1477: addr == MAP_FAILED
  pid=9000 tid=9000 errno=0 - Success
  1  0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
  2  0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
  3  0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
  4  (inlined by) run_test at dirty_log_test.c:778
  5  (inlined by) run_test at dirty_log_test.c:691
  6  0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
  7  0x0000000000401ccf: main at dirty_log_test.c:921
  8  0x0000ffffb06ec79b: ?? ??:0
  9  0x0000ffffb06ec86b: ?? ??:0
  10 0x0000000000401def: _start at ??:?
  Dirty ring mapped private

Fix the issue by using the host's page size to map the ring buffer.
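
For clarity, the mapping boils down to the lines below (simplified from
vcpu_map_dirty_ring(); error checks dropped). The mmap() offset has to
be expressed in host pages, since that is the unit the kernel uses when
matching the vCPU fd's KVM_DIRTY_LOG_PAGE_OFFSET region, regardless of
the page size the guest under test is using.

  uint32_t page_size = getpagesize();            /* host page size */
  uint32_t size      = vcpu->vm->dirty_ring_size;
  void *addr;

  addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd,
              page_size * KVM_DIRTY_LOG_PAGE_OFFSET);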

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 411a4c0bc81c..c339c8b694e0 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1467,7 +1467,7 @@ struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vcpu *vcpu)
 
 void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
 {
-	uint32_t page_size = vcpu->vm->page_size;
+	uint32_t page_size = getpagesize();
 	uint32_t size = vcpu->vm->dirty_ring_size;
 
 	TEST_ASSERT(size > 0, "Should enable dirty ring first");
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  2022-10-05  0:41 ` [PATCH v5 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. That works fine when the host and guest
page sizes are the same, but it fails on arm64 when they differ, as the
error messages below indicate.

  # ./dirty_log_test -M dirty-ring -m 7
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  ==== Test Assertion Failure ====
  lib/kvm_util.c:1477: addr == MAP_FAILED
  pid=9000 tid=9000 errno=0 - Success
  1  0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
  2  0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
  3  0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
  4  (inlined by) run_test at dirty_log_test.c:778
  5  (inlined by) run_test at dirty_log_test.c:691
  6  0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
  7  0x0000000000401ccf: main at dirty_log_test.c:921
  8  0x0000ffffb06ec79b: ?? ??:0
  9  0x0000ffffb06ec86b: ?? ??:0
  10 0x0000000000401def: _start at ??:?
  Dirty ring mapped private

Fix the issue by using the host's page size to map the ring buffer.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 411a4c0bc81c..c339c8b694e0 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1467,7 +1467,7 @@ struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vcpu *vcpu)
 
 void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
 {
-	uint32_t page_size = vcpu->vm->page_size;
+	uint32_t page_size = getpagesize();
 	uint32_t size = vcpu->vm->dirty_ring_size;
 
 	TEST_ASSERT(size > 0, "Should enable dirty ring first");
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 6/7] KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
                   ` (5 preceding siblings ...)
  2022-10-05  0:41 ` [PATCH v5 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  2022-10-05  0:41 ` [PATCH v5 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
  7 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

There are two pieces of state that need to be cleared before the next
mode is executed. Otherwise, we hit failures like the ones the following
messages indicate.

- The variable 'dirty_ring_vcpu_ring_full', shared by the main and vcpu
  threads, indicates whether the vcpu exited due to a full ring buffer.
  Its value can be carried over from the previous mode (VM_MODE_P40V48_4K)
  to the current one (VM_MODE_P40V48_64K) when VM_MODE_P40V48_16K isn't
  supported.

- The current ring buffer index needs to be reset before the next mode
  (VM_MODE_P40V48_64K) is executed. Otherwise, a stale value is carried
  over from the previous mode (VM_MODE_P40V48_4K).

  # ./dirty_log_test -M dirty-ring
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbfffc000
    :
  Dirtied 995328 pages
  Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  vcpu continues now.
  Notifying vcpu to continue
  Iteration 1 collected 0 pages
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  ==== Test Assertion Failure ====
  dirty_log_test.c:369: cleared == count
  pid=10541 tid=10541 errno=22 - Invalid argument
     1	0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
     2	0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
     3	 (inlined by) run_test at dirty_log_test.c:795
     4	 (inlined by) run_test at dirty_log_test.c:705
     5	0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
     6	0x0000000000401ccf: main at dirty_log_test.c:938
     7	0x0000ffff9ecd279b: ?? ??:0
     8	0x0000ffff9ecd286b: ?? ??:0
     9	0x0000000000401def: _start at ??:?
  Reset dirty pages (0) mismatch with collected (35566)

Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before the next mode is executed.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 27 ++++++++++++--------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index b5234d6efbe1..8758c10ec850 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -226,13 +226,15 @@ static void clear_log_create_vm_done(struct kvm_vm *vm)
 }
 
 static void dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 }
 
 static void clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 	kvm_vm_clear_dirty_log(vcpu->vm, slot, bitmap, 0, num_pages);
@@ -329,10 +331,9 @@ static void dirty_ring_continue_vcpu(void)
 }
 
 static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					   void *bitmap, uint32_t num_pages)
+					   void *bitmap, uint32_t num_pages,
+					   uint32_t *ring_buf_idx)
 {
-	/* We only have one vcpu */
-	static uint32_t fetch_index = 0;
 	uint32_t count = 0, cleared;
 	bool continued_vcpu = false;
 
@@ -349,7 +350,8 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
 	/* Only have one vcpu */
 	count = dirty_ring_collect_one(vcpu_map_dirty_ring(vcpu),
-				       slot, bitmap, num_pages, &fetch_index);
+				       slot, bitmap, num_pages,
+				       ring_buf_idx);
 
 	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
@@ -406,7 +408,8 @@ struct log_mode {
 	void (*create_vm_done)(struct kvm_vm *vm);
 	/* Hook to collect the dirty pages into the bitmap provided */
 	void (*collect_dirty_pages) (struct kvm_vcpu *vcpu, int slot,
-				     void *bitmap, uint32_t num_pages);
+				     void *bitmap, uint32_t num_pages,
+				     uint32_t *ring_buf_idx);
 	/* Hook to call when after each vcpu run */
 	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
 	void (*before_vcpu_join) (void);
@@ -471,13 +474,14 @@ static void log_mode_create_vm_done(struct kvm_vm *vm)
 }
 
 static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					 void *bitmap, uint32_t num_pages)
+					 void *bitmap, uint32_t num_pages,
+					 uint32_t *ring_buf_idx)
 {
 	struct log_mode *mode = &log_modes[host_log_mode];
 
 	TEST_ASSERT(mode->collect_dirty_pages != NULL,
 		    "collect_dirty_pages() is required for any log mode!");
-	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages);
+	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx);
 }
 
 static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
@@ -696,6 +700,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	unsigned long *bmap;
+	uint32_t ring_buf_idx = 0;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -771,6 +776,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	host_dirty_count = 0;
 	host_clear_count = 0;
 	host_track_next_count = 0;
+	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
 	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
@@ -778,7 +784,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		/* Give the vcpu thread some time to dirty some pages */
 		usleep(p->interval * 1000);
 		log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX,
-					     bmap, host_num_pages);
+					     bmap, host_num_pages,
+					     &ring_buf_idx);
 
 		/*
 		 * See vcpu_sync_stop_requested definition for details on why
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 6/7] KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  2022-10-05  0:41 ` [PATCH v5 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

There are two pieces of state that need to be cleared before the next
mode is executed. Otherwise, we hit failures like the ones the following
messages indicate.

- The variable 'dirty_ring_vcpu_ring_full', shared by the main and vcpu
  threads, indicates whether the vcpu exited due to a full ring buffer.
  Its value can be carried over from the previous mode (VM_MODE_P40V48_4K)
  to the current one (VM_MODE_P40V48_64K) when VM_MODE_P40V48_16K isn't
  supported.

- The current ring buffer index needs to be reset before the next mode
  (VM_MODE_P40V48_64K) is executed. Otherwise, a stale value is carried
  over from the previous mode (VM_MODE_P40V48_4K).

  # ./dirty_log_test -M dirty-ring
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbfffc000
    :
  Dirtied 995328 pages
  Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  vcpu continues now.
  Notifying vcpu to continue
  Iteration 1 collected 0 pages
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  ==== Test Assertion Failure ====
  dirty_log_test.c:369: cleared == count
  pid=10541 tid=10541 errno=22 - Invalid argument
     1	0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
     2	0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
     3	 (inlined by) run_test at dirty_log_test.c:795
     4	 (inlined by) run_test at dirty_log_test.c:705
     5	0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
     6	0x0000000000401ccf: main at dirty_log_test.c:938
     7	0x0000ffff9ecd279b: ?? ??:0
     8	0x0000ffff9ecd286b: ?? ??:0
     9	0x0000000000401def: _start at ??:?
  Reset dirty pages (0) mismatch with collected (35566)

Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before the next mode is executed.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 27 ++++++++++++--------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index b5234d6efbe1..8758c10ec850 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -226,13 +226,15 @@ static void clear_log_create_vm_done(struct kvm_vm *vm)
 }
 
 static void dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 }
 
 static void clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 	kvm_vm_clear_dirty_log(vcpu->vm, slot, bitmap, 0, num_pages);
@@ -329,10 +331,9 @@ static void dirty_ring_continue_vcpu(void)
 }
 
 static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					   void *bitmap, uint32_t num_pages)
+					   void *bitmap, uint32_t num_pages,
+					   uint32_t *ring_buf_idx)
 {
-	/* We only have one vcpu */
-	static uint32_t fetch_index = 0;
 	uint32_t count = 0, cleared;
 	bool continued_vcpu = false;
 
@@ -349,7 +350,8 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
 	/* Only have one vcpu */
 	count = dirty_ring_collect_one(vcpu_map_dirty_ring(vcpu),
-				       slot, bitmap, num_pages, &fetch_index);
+				       slot, bitmap, num_pages,
+				       ring_buf_idx);
 
 	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
@@ -406,7 +408,8 @@ struct log_mode {
 	void (*create_vm_done)(struct kvm_vm *vm);
 	/* Hook to collect the dirty pages into the bitmap provided */
 	void (*collect_dirty_pages) (struct kvm_vcpu *vcpu, int slot,
-				     void *bitmap, uint32_t num_pages);
+				     void *bitmap, uint32_t num_pages,
+				     uint32_t *ring_buf_idx);
 	/* Hook to call when after each vcpu run */
 	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
 	void (*before_vcpu_join) (void);
@@ -471,13 +474,14 @@ static void log_mode_create_vm_done(struct kvm_vm *vm)
 }
 
 static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					 void *bitmap, uint32_t num_pages)
+					 void *bitmap, uint32_t num_pages,
+					 uint32_t *ring_buf_idx)
 {
 	struct log_mode *mode = &log_modes[host_log_mode];
 
 	TEST_ASSERT(mode->collect_dirty_pages != NULL,
 		    "collect_dirty_pages() is required for any log mode!");
-	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages);
+	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx);
 }
 
 static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
@@ -696,6 +700,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	unsigned long *bmap;
+	uint32_t ring_buf_idx = 0;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -771,6 +776,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	host_dirty_count = 0;
 	host_clear_count = 0;
 	host_track_next_count = 0;
+	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
 	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
@@ -778,7 +784,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		/* Give the vcpu thread some time to dirty some pages */
 		usleep(p->interval * 1000);
 		log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX,
-					     bmap, host_num_pages);
+					     bmap, host_num_pages,
+					     &ring_buf_idx);
 
 		/*
 		 * See vcpu_sync_stop_requested definition for details on why
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 7/7] KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
                   ` (6 preceding siblings ...)
  2022-10-05  0:41 ` [PATCH v5 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
@ 2022-10-05  0:41 ` Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
  7 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, catalin.marinas, andrew.jones, will, shan.gavin, bgardon,
	dmatlack, pbonzini, zhenyzha, shuah, kvmarm

In the dirty ring case, we rely on the vcpu exiting due to the full
dirty ring state. On an ARM64 system, there are only 4096 host pages
when the host page size is 64KB. In this case, the vcpu never exits
due to the full dirty ring state. A similar case is a 4KB page size
on the host and a 64KB page size in the guest. The vcpu keeps
dirtying the same set of host pages, but the dirty page information
isn't collected in the main thread. This leads to an infinite loop,
as the following log shows.

  # ./dirty_log_test -M dirty-ring -c 65536 -m 5
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbffe0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  Iteration 1 collected 576 pages
  <No more output afterwards>

Fix the issue by automatically choosing the best dirty ring size, to
ensure the vcpu exits due to the full dirty ring state. The option
'-c' becomes a hint for the dirty ring count, instead of its exact
value.
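
As a rough worked example (assuming 64KB pages, and ignoring the
vm_adjust_num_guest_pages() adjustment), the sizing logic added below
gives:

  pages = (1ul << (30 - 16)) + 3;            /* 16387 test pages      */
  limit = 1 << (31 - __builtin_clz(pages));  /* 16384                 */
  test_dirty_ring_count = min(limit, 65536); /* '-c 65536' -> 16384   */

so the ring holds fewer entries than there are test pages and the vcpu
is guaranteed to hit the ring-full exit.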

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 26 +++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 8758c10ec850..a87e5f78ebf1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -24,6 +24,9 @@
 #include "guest_modes.h"
 #include "processor.h"
 
+#define DIRTY_MEM_BITS 30 /* 1G */
+#define PAGE_SHIFT_4K  12
+
 /* The memory slot index to track dirty pages */
 #define TEST_MEM_SLOT_INDEX		1
 
@@ -273,6 +276,24 @@ static bool dirty_ring_supported(void)
 
 static void dirty_ring_create_vm_done(struct kvm_vm *vm)
 {
+	uint64_t pages;
+	uint32_t limit;
+
+	/*
+	 * We rely on vcpu exit due to full dirty ring state. Adjust
+	 * the ring buffer size to ensure we're able to reach the
+	 * full dirty ring state.
+	 */
+	pages = (1ul << (DIRTY_MEM_BITS - vm->page_shift)) + 3;
+	pages = vm_adjust_num_guest_pages(vm->mode, pages);
+	if (vm->page_size < getpagesize())
+		pages = vm_num_host_pages(vm->mode, pages);
+
+	limit = 1 << (31 - __builtin_clz(pages));
+	test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
+	test_dirty_ring_count = min(limit, test_dirty_ring_count);
+	pr_info("dirty ring count: 0x%x\n", test_dirty_ring_count);
+
 	/*
 	 * Switch to dirty ring mode after VM creation but before any
 	 * of the vcpu creation.
@@ -685,9 +706,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, struct kvm_vcpu **vcpu,
 	return vm;
 }
 
-#define DIRTY_MEM_BITS 30 /* 1G */
-#define PAGE_SHIFT_4K  12
-
 struct test_params {
 	unsigned long iterations;
 	unsigned long interval;
@@ -830,7 +848,7 @@ static void help(char *name)
 	printf("usage: %s [-h] [-i iterations] [-I interval] "
 	       "[-p offset] [-m mode]\n", name);
 	puts("");
-	printf(" -c: specify dirty ring size, in number of entries\n");
+	printf(" -c: hint to dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
-- 
2.23.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v5 7/7] KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  2022-10-05  0:41 ` [PATCH v5 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
@ 2022-10-05  0:41   ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-05  0:41 UTC (permalink / raw)
  To: kvmarm
  Cc: kvmarm, kvm, peterx, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

In the dirty ring case, we rely on the vcpu exiting due to the full
dirty ring state. On an ARM64 system, there are only 4096 host pages
when the host page size is 64KB. In this case, the vcpu never exits
due to the full dirty ring state. A similar case is a 4KB page size
on the host and a 64KB page size in the guest. The vcpu keeps
dirtying the same set of host pages, but the dirty page information
isn't collected in the main thread. This leads to an infinite loop,
as the following log shows.

  # ./dirty_log_test -M dirty-ring -c 65536 -m 5
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbffe0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  Iteration 1 collected 576 pages
  <No more output afterwards>

Fix the issue by automatically choosing the best dirty ring size, to
ensure the vcpu exits due to the full dirty ring state. The option
'-c' becomes a hint for the dirty ring count, instead of its exact
value.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 26 +++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 8758c10ec850..a87e5f78ebf1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -24,6 +24,9 @@
 #include "guest_modes.h"
 #include "processor.h"
 
+#define DIRTY_MEM_BITS 30 /* 1G */
+#define PAGE_SHIFT_4K  12
+
 /* The memory slot index to track dirty pages */
 #define TEST_MEM_SLOT_INDEX		1
 
@@ -273,6 +276,24 @@ static bool dirty_ring_supported(void)
 
 static void dirty_ring_create_vm_done(struct kvm_vm *vm)
 {
+	uint64_t pages;
+	uint32_t limit;
+
+	/*
+	 * We rely on vcpu exit due to full dirty ring state. Adjust
+	 * the ring buffer size to ensure we're able to reach the
+	 * full dirty ring state.
+	 */
+	pages = (1ul << (DIRTY_MEM_BITS - vm->page_shift)) + 3;
+	pages = vm_adjust_num_guest_pages(vm->mode, pages);
+	if (vm->page_size < getpagesize())
+		pages = vm_num_host_pages(vm->mode, pages);
+
+	limit = 1 << (31 - __builtin_clz(pages));
+	test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
+	test_dirty_ring_count = min(limit, test_dirty_ring_count);
+	pr_info("dirty ring count: 0x%x\n", test_dirty_ring_count);
+
 	/*
 	 * Switch to dirty ring mode after VM creation but before any
 	 * of the vcpu creation.
@@ -685,9 +706,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, struct kvm_vcpu **vcpu,
 	return vm;
 }
 
-#define DIRTY_MEM_BITS 30 /* 1G */
-#define PAGE_SHIFT_4K  12
-
 struct test_params {
 	unsigned long iterations;
 	unsigned long interval;
@@ -830,7 +848,7 @@ static void help(char *name)
 	printf("usage: %s [-h] [-i iterations] [-I interval] "
 	       "[-p offset] [-m mode]\n", name);
 	puts("");
-	printf(" -c: specify dirty ring size, in number of entries\n");
+	printf(" -c: hint to dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-05  0:41 ` [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking Gavin Shan
  2022-10-05  0:41   ` Gavin Shan
@ 2022-10-06 20:28   ` Peter Xu
  2022-10-06 20:28     ` Peter Xu
  2022-10-06 23:38     ` Gavin Shan
  1 sibling, 2 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-06 20:28 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> -----------------------------------------------------------
> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
> +--------------------------------------------------------------

Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
being enabled first, instead of making the three caps at the same level?

E.g. we can skip creating bitmap for DIRTY_RING[_ACQ_REL] && !_ALLOW_BITMAP
(x86).

> @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>  	unsigned long n;
>  	unsigned long any = 0;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> -		return -ENXIO;

Then we can also have one dirty_ring_exclusive(), with something like:

bool dirty_ring_exclusive(struct kvm *kvm)
{
        return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
}

Does it make sense?  Thanks,

-- 
Peter Xu

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-06 20:28   ` Peter Xu
@ 2022-10-06 20:28     ` Peter Xu
  2022-10-06 23:38     ` Gavin Shan
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-06 20:28 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> -----------------------------------------------------------
> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
> +--------------------------------------------------------------

Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
being enabled first, instead of making the three caps at the same level?

E.g. we can skip creating bitmap for DIRTY_RING[_ACQ_REL] && !_ALLOW_BITMAP
(x86).

> @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>  	unsigned long n;
>  	unsigned long any = 0;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> -		return -ENXIO;

Then we can also have one dirty_ring_exclusive(), with something like:

bool dirty_ring_exclusive(struct kvm *kvm)
{
        return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
}

Does it make sense?  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-06 20:28   ` Peter Xu
  2022-10-06 20:28     ` Peter Xu
@ 2022-10-06 23:38     ` Gavin Shan
  2022-10-06 23:38       ` Gavin Shan
  2022-10-07 14:31       ` Peter Xu
  1 sibling, 2 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-06 23:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

Hi Peter,

On 10/7/22 4:28 AM, Peter Xu wrote:
> On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
>> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>> -----------------------------------------------------------
>> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
>> +--------------------------------------------------------------
> 
> Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
> being enabled first, instead of making the three caps at the same level?
> 
> E.g. we can skip creating bitmap for DIRTY_RING[_ACQ_REL] && !_ALLOW_BITMAP
> (x86).
> 

Both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are available
to x86. So KVM_CAP_DIRTY_LOG_RING_ACQ_REL can be enabled on x86 in theory.
However, the idea to disallow the bitmap for KVM_CAP_DIRTY_LOG_RING (x86)
makes sense to me. I think you may be suggesting something like below.

- bool struct kvm::dirty_ring_allow_bitmap

- In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL

   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
   {
     :
     mutex_lock(&kvm->lock);

     if (kvm->created_vcpus) {
        /* We don't allow to change this value after vcpu created */
        r = -EINVAL;
     } else {
        kvm->dirty_ring_size = size;
        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
        r = 0;
     }

     mutex_unlock(&kvm->lock);
     return r;
   }
   
- In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.

   static long kvm_vm_ioctl_check_extension_generic(...)
   {
     :
     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
   }

- The suggested dirty_ring_exclusive() is used.

>> @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>>   	unsigned long n;
>>   	unsigned long any = 0;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> -		return -ENXIO;
> 
> Then we can also have one dirty_ring_exclusive(), with something like:
> 
> bool dirty_ring_exclusive(struct kvm *kvm)
> {
>          return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
> }
> 
> Does it make sense?  Thanks,
> 

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-06 23:38     ` Gavin Shan
@ 2022-10-06 23:38       ` Gavin Shan
  2022-10-07 14:31       ` Peter Xu
  1 sibling, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-06 23:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

Hi Peter,

On 10/7/22 4:28 AM, Peter Xu wrote:
> On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
>> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>> -----------------------------------------------------------
>> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
>> +--------------------------------------------------------------
> 
> Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
> being enabled first, instead of making the three caps at the same level?
> 
> E.g. we can skip creating bitmap for DIRTY_RING[_ACQ_REL] && !_ALLOW_BITMAP
> (x86).
> 

Both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are available
to x86. So KVM_CAP_DIRTY_LOG_RING_ACQ_REL can be enabled on x86 in theory.
However, the idea to disallow the bitmap for KVM_CAP_DIRTY_LOG_RING (x86)
makes sense to me. I think you may be suggesting something like below.

- bool struct kvm::dirty_ring_allow_bitmap

- In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL

   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
   {
     :
     mutex_lock(&kvm->lock);

     if (kvm->created_vcpus) {
        /* We don't allow to change this value after vcpu created */
        r = -EINVAL;
     } else {
        kvm->dirty_ring_size = size;
        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
        r = 0;
     }

     mutex_unlock(&kvm->lock);
     return r;
   }
   
- In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.

   static long kvm_vm_ioctl_check_extension_generic(...)
   {
     :
     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
   }

- The suggested dirty_ring_exclusive() is used.

>> @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>>   	unsigned long n;
>>   	unsigned long any = 0;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> -		return -ENXIO;
> 
> Then we can also have one dirty_ring_exclusive(), with something like:
> 
> bool dirty_ring_exclusive(struct kvm *kvm)
> {
>          return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
> }
> 
> Does it make sense?  Thanks,
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-06 23:38     ` Gavin Shan
  2022-10-06 23:38       ` Gavin Shan
@ 2022-10-07 14:31       ` Peter Xu
  2022-10-07 14:31         ` Peter Xu
  2022-10-10 23:18         ` Oliver Upton
  1 sibling, 2 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-07 14:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Fri, Oct 07, 2022 at 07:38:19AM +0800, Gavin Shan wrote:
> Hi Peter,

Hi, Gavin,

> 
> On 10/7/22 4:28 AM, Peter Xu wrote:
> > On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
> > > -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > > -----------------------------------------------------------
> > > +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
> > > +--------------------------------------------------------------
> > 
> > Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
> > being enabled first, instead of making the three caps at the same level?
> > 
> > E.g. we can skip creating bitmap for DIRTY_RING[_ACQ_REL] && !_ALLOW_BITMAP
> > (x86).
> > 
> 
> Both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LONG_RING_ACQ_REL are available
> to x86. So KVM_CAP_DIRTY_LONG_RING_ACQ_REL can be enabled on x86 in theory.
> However, the idea to disallow bitmap for KVM_CAP_DIRTY_LOG_RING (x86) makes
> sense to me. I think you may be suggesting something like below.
> 
> - bool struct kvm::dirty_ring_allow_bitmap
> 
> - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
>   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL

What I wanted to do is to decouple ACQ_REL from ALLOW_BITMAP, so mostly
as what you suggested, except..

> 
>   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
>   {
>     :
>     mutex_lock(&kvm->lock);
> 
>     if (kvm->created_vcpus) {
>        /* We don't allow to change this value after vcpu created */
>        r = -EINVAL;
>     } else {
>        kvm->dirty_ring_size = size;

.. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
instead..

>        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
>        r = 0;
>     }
> 
>     mutex_unlock(&kvm->lock);
>     return r;
>   }
> - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
>   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> 
>   static long kvm_vm_ioctl_check_extension_generic(...)
>   {
>     :
>     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
>         return kvm->dirty_ring_allow_bitmap ? 1 : 0;

... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():

      case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
           if (kvm->dirty_ring_size)
                return -EINVAL;
           kvm->dirty_ring_allow_bitmap = true;
           return 0;

A side effect of checking dirty_ring_size is that we'll also be sure no
vcpu has been created.  Maybe we should also check that no memslot has
been created, to make sure the bitmaps haven't been allocated.

Then if userspace wants to use the bitmap together with the ring, it
needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
before it enables KVM_CAP_DIRTY_LOG_RING.
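
A minimal userspace sketch of that ordering (the variable names are only
illustrative; the ring size still goes in args[0] as it does today):

      struct kvm_enable_cap cap = { 0 };

      /* 1. Opt in to the bitmap first, before the ring is set up. */
      cap.cap = KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP;
      if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
              err(1, "enable ALLOW_BITMAP");

      /* 2. Then enable the ring itself, with the size in args[0]. */
      cap.cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
      cap.args[0] = ring_size;
      if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
              err(1, "enable DIRTY_LOG_RING_ACQ_REL");

      /* 3. Only now create memslots and vcpus. */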

One tricky part of ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we
allow the !vcpu case, we'll need to make sure it won't accidentally try to
set the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer
is NULL and set_bit_le() will directly crash the kernel.

We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
!ALLOW_BITMAP) and then returning, but since userspace can now easily
trigger this (e.g. on ARM, a malicious userspace app can have DIRTY_RING &&
!ALLOW_BITMAP, then simply issue the gic ioctl to trigger the host
warning), I think the better approach is to kill the process in that
case.  Not sure whether there's anything better we can do.
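
For instance, the !vcpu path could end up looking roughly like the sketch
below (only an illustration of the idea, not the actual patch; 'slot' and
'rel_gfn' are computed from the memslot as they are today):

      void mark_page_dirty_in_slot(struct kvm *kvm,
                                   const struct kvm_memory_slot *memslot,
                                   gfn_t gfn)
      {
              struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
              ...
              if (kvm->dirty_ring_size && vcpu) {
                      /* vcpu context: push into the per-vcpu ring */
                      kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);
              } else if (memslot->dirty_bitmap) {
                      /* no vcpu (or no ring): fall back to the bitmap */
                      set_bit_le(rel_gfn, memslot->dirty_bitmap);
              } else {
                      /*
                       * !vcpu dirtying without ALLOW_BITMAP: the bitmap
                       * is NULL, so refuse instead of dereferencing it.
                       */
                      WARN_ON_ONCE(1);
              }
      }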

>   }
> 
> - The suggested dirty_ring_exclusive() is used.
> 
> > > @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
> > >   	unsigned long n;
> > >   	unsigned long any = 0;
> > > -	/* Dirty ring tracking is exclusive to dirty log tracking */
> > > -	if (kvm->dirty_ring_size)
> > > -		return -ENXIO;
> > 
> > Then we can also have one dirty_ring_exclusive(), with something like:
> > 
> > bool dirty_ring_exclusive(struct kvm *kvm)
> > {
> >          return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
> > }
> > 
> > Does it make sense?  Thanks,
> > 
> 
> Thanks,
> Gavin
> 

-- 
Peter Xu

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-07 14:31       ` Peter Xu
@ 2022-10-07 14:31         ` Peter Xu
  2022-10-10 23:18         ` Oliver Upton
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-07 14:31 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, oliver.upton, seanjc,
	shan.gavin

On Fri, Oct 07, 2022 at 07:38:19AM +0800, Gavin Shan wrote:
> Hi Peter,

Hi, Gavin,

> 
> On 10/7/22 4:28 AM, Peter Xu wrote:
> > On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
> > > -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > > -----------------------------------------------------------
> > > +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
> > > +--------------------------------------------------------------
> > 
> > Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
> > being enabled first, instead of making the three caps at the same level?
> > 
> > E.g. we can skip creating bitmap for DIRTY_RING[_ACQ_REL] && !_ALLOW_BITMAP
> > (x86).
> > 
> 
> Both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LONG_RING_ACQ_REL are available
> to x86. So KVM_CAP_DIRTY_LONG_RING_ACQ_REL can be enabled on x86 in theory.
> However, the idea to disallow bitmap for KVM_CAP_DIRTY_LOG_RING (x86) makes
> sense to me. I think you may be suggesting something like below.
> 
> - bool struct kvm::dirty_ring_allow_bitmap
> 
> - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
>   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL

What I wanted to do is to decouple ACQ_REL from ALLOW_BITMAP, so mostly
as what you suggested, except..

> 
>   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
>   {
>     :
>     mutex_lock(&kvm->lock);
> 
>     if (kvm->created_vcpus) {
>        /* We don't allow to change this value after vcpu created */
>        r = -EINVAL;
>     } else {
>        kvm->dirty_ring_size = size;

.. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
instead..

>        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
>        r = 0;
>     }
> 
>     mutex_unlock(&kvm->lock);
>     return r;
>   }
> - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
>   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> 
>   static long kvm_vm_ioctl_check_extension_generic(...)
>   {
>     :
>     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
>         return kvm->dirty_ring_allow_bitmap ? 1 : 0;

... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():

      case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
           if (kvm->dirty_ring_size)
                return -EINVAL;
           kvm->dirty_ring_allow_bitmap = true;
           return 0;

A side effect of checking dirty_ring_size is that we'll also be sure no
vcpu has been created.  Maybe we should also check that no memslot has
been created, to make sure the bitmaps haven't been allocated.

Then if userspace wants to use the bitmap together with the ring, it
needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
before it enables KVM_CAP_DIRTY_LOG_RING.

One tricky part of ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we
allow the !vcpu case, we'll need to make sure it won't accidentally try to
set the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer
is NULL and set_bit_le() will directly crash the kernel.

We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
!ALLOW_BITMAP) and then returning, but since userspace can now easily
trigger this (e.g. on ARM, a malicious userspace app can have DIRTY_RING &&
!ALLOW_BITMAP, then simply issue the gic ioctl to trigger the host
warning), I think the better approach is to kill the process in that
case.  Not sure whether there's anything better we can do.

>   }
> 
> - The suggested dirty_ring_exclusive() is used.
> 
> > > @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
> > >   	unsigned long n;
> > >   	unsigned long any = 0;
> > > -	/* Dirty ring tracking is exclusive to dirty log tracking */
> > > -	if (kvm->dirty_ring_size)
> > > -		return -ENXIO;
> > 
> > Then we can also have one dirty_ring_exclusive(), with something like:
> > 
> > bool dirty_ring_exclusive(struct kvm *kvm)
> > {
> >          return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
> > }
> > 
> > Does it make sense?  Thanks,
> > 
> 
> Thanks,
> Gavin
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-07 14:31       ` Peter Xu
  2022-10-07 14:31         ` Peter Xu
@ 2022-10-10 23:18         ` Oliver Upton
  2022-10-10 23:18           ` Oliver Upton
                             ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-10 23:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:

[...]

> > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> >   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL
> 
> What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> as what you suggested, except..

+1

> > 
> >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> >   {
> >     :
> >     mutex_lock(&kvm->lock);
> > 
> >     if (kvm->created_vcpus) {
> >        /* We don't allow to change this value after vcpu created */
> >        r = -EINVAL;
> >     } else {
> >        kvm->dirty_ring_size = size;
> 
> .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> instead..
> 
> >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> >        r = 0;
> >     }
> > 
> >     mutex_unlock(&kvm->lock);
> >     return r;
> >   }
> > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> >   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > 
> >   static long kvm_vm_ioctl_check_extension_generic(...)
> >   {
> >     :
> >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> 
> ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> 
>       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
>            if (kvm->dirty_ring_size)
>                 return -EINVAL;
>            kvm->dirty_ring_allow_bitmap = true;
>            return 0;
> 
> A side effect of checking dirty_ring_size is then we'll be sure to have no
> vcpu created too.  Maybe we should also check no memslot created to make
> sure the bitmaps are not created.

I'm not sure I follow... What prevents userspace from creating a vCPU
between enabling the two caps?

> Then if the userspace wants to use the bitmap altogether with the ring, it
> needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> before it enables KVM_CAP_DIRTY_LOG_RING.
> 
> One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> set_bit_le() will directly crash the kernel.
> 
> We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> warning), I think the better approach is we can kill the process in that
> case.  Not sure whether there's anything better we can do.

I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
arm64 given the fact that we'll dirty memory outside of a vCPU context.

Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
target. With that the old WARN() could be preserved, as you suggest. On
top of that there would no longer be a need to test for memslot creation
when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.
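
(Concretely, that could be just a check in kvm_vm_ioctl_enable_dirty_log_ring()
- only a sketch of the idea, reusing the 'dirty_ring_allow_bitmap' flag
discussed earlier in the thread:

      /* arm64 dirties memory without a vcpu, so require the bitmap first */
      if (!kvm->dirty_ring_allow_bitmap)
              return -EINVAL;

before the ring size is accepted.)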

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:18         ` Oliver Upton
@ 2022-10-10 23:18           ` Oliver Upton
  2022-10-10 23:43           ` Oliver Upton
  2022-10-10 23:49           ` Peter Xu
  2 siblings, 0 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-10 23:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:

[...]

> > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> >   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL
> 
> What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> as what you suggested, except..

+1

> > 
> >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> >   {
> >     :
> >     mutex_lock(&kvm->lock);
> > 
> >     if (kvm->created_vcpus) {
> >        /* We don't allow to change this value after vcpu created */
> >        r = -EINVAL;
> >     } else {
> >        kvm->dirty_ring_size = size;
> 
> .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> instead..
> 
> >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> >        r = 0;
> >     }
> > 
> >     mutex_unlock(&kvm->lock);
> >     return r;
> >   }
> > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> >   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > 
> >   static long kvm_vm_ioctl_check_extension_generic(...)
> >   {
> >     :
> >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> 
> ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> 
>       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
>            if (kvm->dirty_ring_size)
>                 return -EINVAL;
>            kvm->dirty_ring_allow_bitmap = true;
>            return 0;
> 
> A side effect of checking dirty_ring_size is then we'll be sure to have no
> vcpu created too.  Maybe we should also check no memslot created to make
> sure the bitmaps are not created.

I'm not sure I follow... What prevents userspace from creating a vCPU
between enabling the two caps?

> Then if the userspace wants to use the bitmap altogether with the ring, it
> needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> before it enables KVM_CAP_DIRTY_LOG_RING.
> 
> One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> set_bit_le() will directly crash the kernel.
> 
> We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> warning), I think the better approach is we can kill the process in that
> case.  Not sure whether there's anything better we can do.

I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
arm64 given the fact that we'll dirty memory outside of a vCPU context.

Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
target. With that the old WARN() could be preserved, as you suggest. On
top of that there would no longer be a need to test for memslot creation
when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:18         ` Oliver Upton
  2022-10-10 23:18           ` Oliver Upton
@ 2022-10-10 23:43           ` Oliver Upton
  2022-10-10 23:43             ` Oliver Upton
  2022-10-10 23:49           ` Peter Xu
  2 siblings, 1 reply; 48+ messages in thread
From: Oliver Upton @ 2022-10-10 23:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: shuah, kvm, catalin.marinas, andrew.jones, kvmarm, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm

On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> 
> [...]
> 
> > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > >   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL
> > 
> > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > as what you suggested, except..
> 
> +1
> 
> > > 
> > >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > >   {
> > >     :
> > >     mutex_lock(&kvm->lock);
> > > 
> > >     if (kvm->created_vcpus) {
> > >        /* We don't allow to change this value after vcpu created */
> > >        r = -EINVAL;
> > >     } else {
> > >        kvm->dirty_ring_size = size;
> > 
> > .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> > instead..
> > 
> > >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > >        r = 0;
> > >     }
> > > 
> > >     mutex_unlock(&kvm->lock);
> > >     return r;
> > >   }
> > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > >   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > 
> > >   static long kvm_vm_ioctl_check_extension_generic(...)
> > >   {
> > >     :
> > >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > 
> > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > 
> >       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >            if (kvm->dirty_ring_size)
> >                 return -EINVAL;
> >            kvm->dirty_ring_allow_bitmap = true;
> >            return 0;
> > 
> > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > vcpu created too.  Maybe we should also check no memslot created to make
> > sure the bitmaps are not created.
> 
> I'm not sure I follow... What prevents userspace from creating a vCPU
> between enabling the two caps?
> 
> > Then if the userspace wants to use the bitmap altogether with the ring, it
> > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > before it enables KVM_CAP_DIRTY_LOG_RING.
> > 
> > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> > for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> > set_bit_le() will directly crash the kernel.
> > 
> > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> > warning), I think the better approach is we can kill the process in that
> > case.  Not sure whether there's anything better we can do.
> 
> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> arm64 given the fact that we'll dirty memory outside of a vCPU context.
> 
> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> target. With that the old WARN() could be preserved, as you suggest. On
> top of that there would no longer be a need to test for memslot creation
> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

Just to be explicit...

I don't believe ALLOW_BITMAP needs to be generally advertised on
architectures that select DIRTY_RING. Instead, architectures (just arm64
right now) should select ALLOW_BITMAP if they need to dirty memory
outside of a vCPU.

When ALLOW_BITMAP is selected, KVM_CAP_DIRTY_LOG_RING[_ACQ_REL] has the
additional restriction that KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP has been
enabled first.
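
Roughly, with a hypothetical name for the new Kconfig symbol:

  # virt/kvm/Kconfig
  config HAVE_KVM_DIRTY_RING_ALLOW_BITMAP
         bool

  # arch/arm64/kvm/Kconfig
  config KVM
         ...
         select HAVE_KVM_DIRTY_RING_ACQ_REL
         select HAVE_KVM_DIRTY_RING_ALLOW_BITMAP

and the generic code would only impose the "ALLOW_BITMAP must be enabled
first" restriction on architectures that select the new symbol.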

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:43           ` Oliver Upton
@ 2022-10-10 23:43             ` Oliver Upton
  0 siblings, 0 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-10 23:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> 
> [...]
> 
> > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > >   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL
> > 
> > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > as what you suggested, except..
> 
> +1
> 
> > > 
> > >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > >   {
> > >     :
> > >     mutex_lock(&kvm->lock);
> > > 
> > >     if (kvm->created_vcpus) {
> > >        /* We don't allow to change this value after vcpu created */
> > >        r = -EINVAL;
> > >     } else {
> > >        kvm->dirty_ring_size = size;
> > 
> > .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> > instead..
> > 
> > >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > >        r = 0;
> > >     }
> > > 
> > >     mutex_unlock(&kvm->lock);
> > >     return r;
> > >   }
> > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > >   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > 
> > >   static long kvm_vm_ioctl_check_extension_generic(...)
> > >   {
> > >     :
> > >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > 
> > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > 
> >       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >            if (kvm->dirty_ring_size)
> >                 return -EINVAL;
> >            kvm->dirty_ring_allow_bitmap = true;
> >            return 0;
> > 
> > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > vcpu created too.  Maybe we should also check no memslot created to make
> > sure the bitmaps are not created.
> 
> I'm not sure I follow... What prevents userspace from creating a vCPU
> between enabling the two caps?
> 
> > Then if the userspace wants to use the bitmap altogether with the ring, it
> > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > before it enables KVM_CAP_DIRTY_LOG_RING.
> > 
> > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> > for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> > set_bit_le() will directly crash the kernel.
> > 
> > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> > warning), I think the better approach is we can kill the process in that
> > case.  Not sure whether there's anything better we can do.
> 
> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> arm64 given the fact that we'll dirty memory outside of a vCPU context.
> 
> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> target. With that the old WARN() could be preserved, as you suggest. On
> top of that there would no longer be a need to test for memslot creation
> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

Just to be explicit...

I don't believe ALLOW_BITMAP needs to be generally advertised on
architectures that select DIRTY_RING. Instead, architectures (just arm64
right now) should select ALLOW_BITMAP if they need to dirty memory
outside of a vCPU.

When ALLOW_BITMAP is selected, KVM_CAP_DIRTY_LOG_RING[_ACQ_REL] has the
additional restriction that KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP has been
enabled first.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:18         ` Oliver Upton
  2022-10-10 23:18           ` Oliver Upton
  2022-10-10 23:43           ` Oliver Upton
@ 2022-10-10 23:49           ` Peter Xu
  2022-10-10 23:49             ` Peter Xu
                               ` (2 more replies)
  2 siblings, 3 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-10 23:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> 
> [...]
> 
> > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > >   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL
> > 
> > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > as what you suggested, except..
> 
> +1
> 
> > > 
> > >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > >   {
> > >     :
> > >     mutex_lock(&kvm->lock);
> > > 
> > >     if (kvm->created_vcpus) {
> > >        /* We don't allow to change this value after vcpu created */
> > >        r = -EINVAL;
> > >     } else {
> > >        kvm->dirty_ring_size = size;
> > 
> > .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> > instead..
> > 
> > >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > >        r = 0;
> > >     }
> > > 
> > >     mutex_unlock(&kvm->lock);
> > >     return r;
> > >   }
> > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > >   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > 
> > >   static long kvm_vm_ioctl_check_extension_generic(...)
> > >   {
> > >     :
> > >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > 
> > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > 
> >       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >            if (kvm->dirty_ring_size)
> >                 return -EINVAL;
> >            kvm->dirty_ring_allow_bitmap = true;
> >            return 0;
> > 
> > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > vcpu created too.  Maybe we should also check no memslot created to make
> > sure the bitmaps are not created.
> 
> I'm not sure I follow... What prevents userspace from creating a vCPU
> between enabling the two caps?

Enabling the dirty ring requires that no vcpu has been created, so as to
make sure all vcpus will have the ring structures allocated once the ring
is enabled for the VM.  This is done in kvm_vm_ioctl_enable_dirty_log_ring():

	if (kvm->created_vcpus) {
		/* We don't allow to change this value after vcpu created */
		r = -EINVAL;
	} else {
		kvm->dirty_ring_size = size;
		r = 0;
	}

Then, if KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP checks dirty_ring_size
first, we make sure both ALLOW_BITMAP and DIRTY_RING have to be
configured before any vcpu creation.

> 
> > Then if the userspace wants to use the bitmap altogether with the ring, it
> > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > before it enables KVM_CAP_DIRTY_LOG_RING.
> > 
> > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> > for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> > set_bit_le() will directly crash the kernel.
> > 
> > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> > warning), I think the better approach is we can kill the process in that
> > case.  Not sure whether there's anything better we can do.
> 
> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> arm64 given the fact that we'll dirty memory outside of a vCPU context.

Yes, it's not, but after Gavin's current series it'll be possible; IOW a
malicious app can leverage this to trigger a host warning, which is IMHO
not wanted.

> 
> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> target. With that the old WARN() could be preserved, as you suggest.

It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
otherwise.  It's not only about the memory that will be wasted (that's
guest mem size / 32k), but also that the sync() process on x86 will be all
zeros and totally meaningless - note that the sync() of the bitmap will be
part of the VM downtime in this case (we need to sync() after turning the
VM off), so it will make the x86 downtime larger without any benefit.

> On
> top of that there would no longer be a need to test for memslot creation
> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

-- 
Peter Xu

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:49           ` Peter Xu
@ 2022-10-10 23:49             ` Peter Xu
  2022-10-10 23:58             ` Gavin Shan
  2022-10-10 23:58             ` Oliver Upton
  2 siblings, 0 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-10 23:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> 
> [...]
> 
> > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > >   true when the capability is KVM_CAP_DIRTY_LONG_RING_ACQ_REL
> > 
> > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > as what you suggested, except..
> 
> +1
> 
> > > 
> > >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > >   {
> > >     :
> > >     mutex_lock(&kvm->lock);
> > > 
> > >     if (kvm->created_vcpus) {
> > >        /* We don't allow to change this value after vcpu created */
> > >        r = -EINVAL;
> > >     } else {
> > >        kvm->dirty_ring_size = size;
> > 
> > .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> > instead..
> > 
> > >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > >        r = 0;
> > >     }
> > > 
> > >     mutex_unlock(&kvm->lock);
> > >     return r;
> > >   }
> > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > >   is always flase until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > 
> > >   static long kvm_vm_ioctl_check_extension_generic(...)
> > >   {
> > >     :
> > >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > 
> > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > 
> >       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >            if (kvm->dirty_ring_size)
> >                 return -EINVAL;
> >            kvm->dirty_ring_allow_bitmap = true;
> >            return 0;
> > 
> > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > vcpu created too.  Maybe we should also check no memslot created to make
> > sure the bitmaps are not created.
> 
> I'm not sure I follow... What prevents userspace from creating a vCPU
> between enabling the two caps?

Enabling of dirty ring requires no vcpu created, so as to make sure all the
vcpus will have the ring structures allocated as long as ring enabled for
the vm.  Done in kvm_vm_ioctl_enable_dirty_log_ring():

	if (kvm->created_vcpus) {
		/* We don't allow to change this value after vcpu created */
		r = -EINVAL;
	} else {
		kvm->dirty_ring_size = size;
		r = 0;
	}

Then if we have KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP checking
dirty_ring_size first then we make sure we need to configure both
ALLOW_BITMAP and DIRTY_RING before any vcpu creation.
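
So from the userspace side the expected ordering would be something like
below (rough sketch only, error checking omitted; vm_fd/ring_size_in_bytes
are obvious placeholders, and ALLOW_BITMAP is the new cap proposed here):

	struct kvm_enable_cap cap = { 0 };

	/* 1) must come first, while dirty_ring_size is still zero */
	cap.cap = KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP;
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* 2) then the ring itself; rejected if any vcpu already exists */
	cap.cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
	cap.args[0] = ring_size_in_bytes;
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* 3) only now create vcpus (and memslots, if we check those too) */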

> 
> > Then if the userspace wants to use the bitmap altogether with the ring, it
> > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > before it enables KVM_CAP_DIRTY_LOG_RING.
> > 
> > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> > for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> > set_bit_le() will directly crash the kernel.
> > 
> > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> > warning), I think the better approach is we can kill the process in that
> > case.  Not sure whether there's anything better we can do.
> 
> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> arm64 given the fact that we'll dirty memory outside of a vCPU context.

Yes it's not, but after Gavin's current series it'll be possible, IOW a
malicious app can leverage this to trigger host warning, which is IMHO not
wanted.

> 
> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> target. With that the old WARN() could be preserved, as you suggest.

It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
otherwise.  It's not only about the memory that will be wasted (that's
guest mem size / 32k), but also the sync() process for x86 will be all
zeros and totally meaningless - note that the sync() of bitmap will be part
of VM downtime in this case (we need to sync() after turning VM off), so it
will make x86 downtime larger but without any benefit.

> On
> top of that there would no longer be a need to test for memslot creation
> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:49           ` Peter Xu
  2022-10-10 23:49             ` Peter Xu
@ 2022-10-10 23:58             ` Gavin Shan
  2022-10-10 23:58               ` Gavin Shan
  2022-10-10 23:58             ` Oliver Upton
  2 siblings, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-10 23:58 UTC (permalink / raw)
  To: Peter Xu, Oliver Upton
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

Hi Peter and Oliver,

On 10/11/22 7:49 AM, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
>> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
>>
>> [...]
>>
>>>> - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
>>>>    true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>>
>>> What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
>>> as what you suggested, except..
>>
>> +1
>>

Agreed.

[...]

>>
>>> Then if the userspace wants to use the bitmap altogether with the ring, it
>>> needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
>>> before it enables KVM_CAP_DIRTY_LOG_RING.
>>>
>>> One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
>>> !vcpu case we'll need to make sure it won't accidentally try to set bitmap
>>> for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
>>> set_bit_le() will directly crash the kernel.
>>>
>>> We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
>>> !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
>>> this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
>>> !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
>>> warning), I think the better approach is we can kill the process in that
>>> case.  Not sure whether there's anything better we can do.
>>
>> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
>> arm64 given the fact that we'll dirty memory outside of a vCPU context.
> 
> Yes it's not, but after Gavin's current series it'll be possible, IOW a
> malicious app can leverage this to trigger host warning, which is IMHO not
> wanted.
> 
>>
>> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
>> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
>> target. With that the old WARN() could be preserved, as you suggest.
> 
> It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
> otherwise.  It's not only about the memory that will be wasted (that's
> guest mem size / 32k), but also the sync() process for x86 will be all
> zeros and totally meaningless - note that the sync() of bitmap will be part
> of VM downtime in this case (we need to sync() after turning VM off), so it
> will make x86 downtime larger but without any benefit.
> 

Besides, old QEMU won't work if ALLOW_BITMAP is required to enable DIRTY_RING,
if I'm correct.

>> On
>> top of that there would no longer be a need to test for memslot creation
>> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.
> 

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:58             ` Gavin Shan
@ 2022-10-10 23:58               ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-10 23:58 UTC (permalink / raw)
  To: Peter Xu, Oliver Upton
  Cc: kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

Hi Peter and Oliver,

On 10/11/22 7:49 AM, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
>> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
>>
>> [...]
>>
>>>> - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
>>>>    true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>>
>>> What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
>>> as what you suggested, except..
>>
>> +1
>>

Agreed.

[...]

>>
>>> Then if the userspace wants to use the bitmap altogether with the ring, it
>>> needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
>>> before it enables KVM_CAP_DIRTY_LOG_RING.
>>>
>>> One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
>>> !vcpu case we'll need to make sure it won't accidentally try to set bitmap
>>> for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
>>> set_bit_le() will directly crash the kernel.
>>>
>>> We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
>>> !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
>>> this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
>>> !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
>>> warning), I think the better approach is we can kill the process in that
>>> case.  Not sure whether there's anything better we can do.
>>
>> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
>> arm64 given the fact that we'll dirty memory outside of a vCPU context.
> 
> Yes it's not, but after Gavin's current series it'll be possible, IOW a
> malicious app can leverage this to trigger host warning, which is IMHO not
> wanted.
> 
>>
>> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
>> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
>> target. With that the old WARN() could be preserved, as you suggest.
> 
> It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
> otherwise.  It's not only about the memory that will be wasted (that's
> guest mem size / 32k), but also the sync() process for x86 will be all
> zeros and totally meaningless - note that the sync() of bitmap will be part
> of VM downtime in this case (we need to sync() after turning VM off), so it
> will make x86 downtime larger but without any benefit.
> 

Besides, old QEMU won't work if ALLOW_BITMAP is required to enable DIRTY_RING,
if I'm correct.

>> On
>> top of that there would no longer be a need to test for memslot creation
>> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:49           ` Peter Xu
  2022-10-10 23:49             ` Peter Xu
  2022-10-10 23:58             ` Gavin Shan
@ 2022-10-10 23:58             ` Oliver Upton
  2022-10-10 23:58               ` Oliver Upton
  2022-10-11  0:20               ` Peter Xu
  2 siblings, 2 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-10 23:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Mon, Oct 10, 2022 at 07:49:15PM -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> > On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> > 
> > [...]
> > 
> > > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > > >   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > > 
> > > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > > as what you suggested, except..
> > 
> > +1
> > 
> > > > 
> > > >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > > >   {
> > > >     :
> > > >     mutex_lock(&kvm->lock);
> > > > 
> > > >     if (kvm->created_vcpus) {
> > > >        /* We don't allow to change this value after vcpu created */
> > > >        r = -EINVAL;
> > > >     } else {
> > > >        kvm->dirty_ring_size = size;
> > > 
> > > .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> > > instead..
> > > 
> > > >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > > >        r = 0;
> > > >     }
> > > > 
> > > >     mutex_unlock(&kvm->lock);
> > > >     return r;
> > > >   }
> > > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > > >   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > > 
> > > >   static long kvm_vm_ioctl_check_extension_generic(...)
> > > >   {
> > > >     :
> > > >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > > >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > > 
> > > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > > 
> > >       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >            if (kvm->dirty_ring_size)
> > >                 return -EINVAL;
> > >            kvm->dirty_ring_allow_bitmap = true;
> > >            return 0;
> > > 
> > > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > > vcpu created too.  Maybe we should also check no memslot created to make
> > > sure the bitmaps are not created.
> > 
> > I'm not sure I follow... What prevents userspace from creating a vCPU
> > between enabling the two caps?
> 
> Enabling of dirty ring requires no vcpu created, so as to make sure all the
> vcpus will have the ring structures allocated as long as ring enabled for
> the vm.  Done in kvm_vm_ioctl_enable_dirty_log_ring():
> 
> 	if (kvm->created_vcpus) {
> 		/* We don't allow to change this value after vcpu created */
> 		r = -EINVAL;
> 	} else {
> 		kvm->dirty_ring_size = size;
> 		r = 0;
> 	}
> 
> Then if we have KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP checking
> dirty_ring_size first then we make sure we need to configure both
> ALLOW_BITMAP and DIRTY_RING before any vcpu creation.

Ah, right. Sorry, I had the 'if' condition inverted in my head.

> > 
> > > Then if the userspace wants to use the bitmap altogether with the ring, it
> > > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > > before it enables KVM_CAP_DIRTY_LOG_RING.
> > > 
> > > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > > !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> > > for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> > > set_bit_le() will directly crash the kernel.
> > > 
> > > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > > this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> > > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> > > warning), I think the better approach is we can kill the process in that
> > > case.  Not sure whether there's anything better we can do.
> > 
> > I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> > arm64 given the fact that we'll dirty memory outside of a vCPU context.
> 
> Yes it's not, but after Gavin's current series it'll be possible, IOW a
> malicious app can leverage this to trigger host warning, which is IMHO not
> wanted.
> 
> > 
> > Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> > userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> > target. With that the old WARN() could be preserved, as you suggest.
> 
> It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
> otherwise.  It's not only about the memory that will be wasted (that's
> guest mem size / 32k), but also the sync() process for x86 will be all
> zeros and totally meaningless - note that the sync() of bitmap will be part
> of VM downtime in this case (we need to sync() after turning VM off), so it
> will make x86 downtime larger but without any benefit.

Ah, my follow-up [1] missed by just a few minutes :)

I think this further drives the point home -- there's zero need for the
bitmap with dirty ring on x86, so why even support it? The proposal of
ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
needs to dirty memory outside of a vCPU context can opt-in to the
behavior.

[1]: https://lore.kernel.org/kvm/Y0SuJee3oWL2QCqM@google.com/

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:58             ` Oliver Upton
@ 2022-10-10 23:58               ` Oliver Upton
  2022-10-11  0:20               ` Peter Xu
  1 sibling, 0 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-10 23:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

On Mon, Oct 10, 2022 at 07:49:15PM -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> > On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> > 
> > [...]
> > 
> > > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > > >   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > > 
> > > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > > as what you suggested, except..
> > 
> > +1
> > 
> > > > 
> > > >   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > > >   {
> > > >     :
> > > >     mutex_lock(&kvm->lock);
> > > > 
> > > >     if (kvm->created_vcpus) {
> > > >        /* We don't allow to change this value after vcpu created */
> > > >        r = -EINVAL;
> > > >     } else {
> > > >        kvm->dirty_ring_size = size;
> > > 
> > > .. here I'd not set dirty_ring_allow_bitmap at all so I'd drop below line,
> > > instead..
> > > 
> > > >        kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > > >        r = 0;
> > > >     }
> > > > 
> > > >     mutex_unlock(&kvm->lock);
> > > >     return r;
> > > >   }
> > > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > > >   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > > 
> > > >   static long kvm_vm_ioctl_check_extension_generic(...)
> > > >   {
> > > >     :
> > > >     case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > > >         return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > > 
> > > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > > 
> > >       case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >            if (kvm->dirty_ring_size)
> > >                 return -EINVAL;
> > >            kvm->dirty_ring_allow_bitmap = true;
> > >            return 0;
> > > 
> > > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > > vcpu created too.  Maybe we should also check no memslot created to make
> > > sure the bitmaps are not created.
> > 
> > I'm not sure I follow... What prevents userspace from creating a vCPU
> > between enabling the two caps?
> 
> Enabling of dirty ring requires no vcpu created, so as to make sure all the
> vcpus will have the ring structures allocated as long as ring enabled for
> the vm.  Done in kvm_vm_ioctl_enable_dirty_log_ring():
> 
> 	if (kvm->created_vcpus) {
> 		/* We don't allow to change this value after vcpu created */
> 		r = -EINVAL;
> 	} else {
> 		kvm->dirty_ring_size = size;
> 		r = 0;
> 	}
> 
> Then if we have KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP checking
> dirty_ring_size first then we make sure we need to configure both
> ALLOW_BITMAP and DIRTY_RING before any vcpu creation.

Ah, right. Sorry, I had the 'if' condition inverted in my head.

> > 
> > > Then if the userspace wants to use the bitmap altogether with the ring, it
> > > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > > before it enables KVM_CAP_DIRTY_LOG_RING.
> > > 
> > > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > > !vcpu case we'll need to make sure it won't accidentally try to set bitmap
> > > for !ALLOW_BITMAP, because in that case the bitmap pointer is NULL so
> > > set_bit_le() will directly crash the kernel.
> > > 
> > > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > > this (e.g. on ARM, a malicious userapp can have DIRTY_RING &&
> > > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger host
> > > warning), I think the better approach is we can kill the process in that
> > > case.  Not sure whether there's anything better we can do.
> > 
> > I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> > arm64 given the fact that we'll dirty memory outside of a vCPU context.
> 
> Yes it's not, but after Gavin's current series it'll be possible, IOW a
> malicious app can leverage this to trigger host warning, which is IMHO not
> wanted.
> 
> > 
> > Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> > userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> > target. With that the old WARN() could be preserved, as you suggest.
> 
> It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
> otherwise.  It's not only about the memory that will be wasted (that's
> guest mem size / 32k), but also the sync() process for x86 will be all
> zeros and totally meaningless - note that the sync() of bitmap will be part
> of VM downtime in this case (we need to sync() after turning VM off), so it
> will make x86 downtime larger but without any benefit.

Ah, my follow-up [1] missed by just a few minutes :)

I think this further drives the point home -- there's zero need for the
bitmap with dirty ring on x86, so why even support it? The proposal of
ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
needs to dirty memory outside of a vCPU context can opt-in to the
behavior.

[1]: https://lore.kernel.org/kvm/Y0SuJee3oWL2QCqM@google.com/

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-10 23:58             ` Oliver Upton
  2022-10-10 23:58               ` Oliver Upton
@ 2022-10-11  0:20               ` Peter Xu
  2022-10-11  0:20                 ` Peter Xu
  2022-10-11  1:12                 ` Oliver Upton
  1 sibling, 2 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-11  0:20 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
> I think this further drives the point home -- there's zero need for the
> bitmap with dirty ring on x86, so why even support it? The proposal of
> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
> needs to dirty memory outside of a vCPU context can opt-in to the
> behavior.

Yeah that sounds working too, but it'll be slightly hackish as then the
user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
With the new cap the user app can implement the whole ring with generic
code.
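
E.g. the VMM side could then stay fully generic, something like below
(sketch only; kvm_check_extension()/kvm_vm_enable_cap() are QEMU-style
helper names used just for illustration, and the cached flag is made up):

	if (kvm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP)) {
		kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP, 0);
		/* remember to also scan the bitmap once when migration completes */
		s->need_bitmap_sync = true;
	}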

Also more flexible to expose it as generic cap? E.g., one day x86 can
enable this too for whatever reason (even though I don't think so..).

-- 
Peter Xu

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  0:20               ` Peter Xu
@ 2022-10-11  0:20                 ` Peter Xu
  2022-10-11  1:12                 ` Oliver Upton
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-11  0:20 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
> I think this further drives the point home -- there's zero need for the
> bitmap with dirty ring on x86, so why even support it? The proposal of
> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
> needs to dirty memory outside of a vCPU context can opt-in to the
> behavior.

Yeah that sounds working too, but it'll be slightly hackish as then the
user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
With the new cap the user app can implement the whole ring with generic
code.

Also more flexible to expose it as generic cap? E.g., one day x86 can
enable this too for whatever reason (even though I don't think so..).

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  0:20               ` Peter Xu
  2022-10-11  0:20                 ` Peter Xu
@ 2022-10-11  1:12                 ` Oliver Upton
  2022-10-11  1:12                   ` Oliver Upton
                                     ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-11  1:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
> > I think this further drives the point home -- there's zero need for the
> > bitmap with dirty ring on x86, so why even support it? The proposal of
> > ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
> > needs to dirty memory outside of a vCPU context can opt-in to the
> > behavior.
> 
> Yeah that sounds working too, but it'll be slightly hackish as then the
> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
> With the new cap the user app can implement the whole ring with generic
> code.

Isn't the current route of exposing ALLOW_BITMAP on other arches for no
reason headed in exactly that direction? Userspace would need to know if
it _really_ needs the dirty bitmap in addition to the dirty ring, which
could take the form of architecture ifdeffery.

OTOH, if the cap is only exposed when it is absolutely necessary, an
arch-generic live migration implementation could enable the cap whenever
it is advertized and scan the bitmap accordingly.

The VMM must know something about the architecture it is running on, as
it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
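
(For reference, that's just a KVM_SET_DEVICE_ATTR on the ITS device fd,
roughly:

	struct kvm_device_attr attr = {
		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
		.attr	= KVM_DEV_ARM_ITS_SAVE_TABLES,
	};
	ioctl(its_dev_fd, KVM_SET_DEVICE_ATTR, &attr);

so userspace already has an arm64-aware step in its migration flow.)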

> Also more flexible to expose it as generic cap? E.g., one day x86 can
> enable this too for whatever reason (even though I don't think so..).

I had imagined something like this patch where the arch opts-in to some
generic construct if it *requires* the use of both the ring and bitmap
(very rough sketch).

--
Thanks,
Oliver

[-- Attachment #2: 0001-KVM-Add-support-for-using-dirty-ring-in-conjuction-w.patch --]
[-- Type: text/x-diff, Size: 4502 bytes --]

From ec2fd1ec35b040ac5446f10b895b72d45e0d2845 Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton@linux.dev>
Date: Tue, 11 Oct 2022 00:57:00 +0000
Subject: [PATCH] KVM: Add support for using dirty ring in conjunction with
 bitmap

Some architectures (such as arm64) need to dirty memory outside of the
context of a vCPU. Of course, this simply doesn't fit with the UAPI of
KVM's per-vCPU dirty ring.

Introduce a new flavor of dirty ring that requires the use of both vCPU
dirty rings and a dirty bitmap. The expectation is that for non-vCPU
sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before
migrating the VM to the target.

Use an additional capability to advertize this behavior and require
explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
not allow userspace to enable dirty ring if it hasn't also enabled the
ring && bitmap capability, as a VM is likely DOA without the pages
marked in the bitmap.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 include/linux/kvm_host.h |  1 +
 include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig         |  8 ++++++++
 virt/kvm/kvm_main.c      | 16 ++++++++++++++--
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..76a6b1687312 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..c87b5882d7ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..e3b45a5489b6 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
        bool
        select HAVE_KVM_DIRTY_RING
 
+# Only architectures that need to dirty memory outside of a vCPU
+# context should select this, advertising to userspace the
+# requirement to use a dirty bitmap in addition to the vCPU dirty
+# ring.
+config HAVE_KVM_DIRTY_RING_WITH_BITMAP
+       bool
+       depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
        bool
        select EVENTFD
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..5a99ab973706 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3304,7 +3304,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
-#ifdef CONFIG_HAVE_KVM_DIRTY_RING
+#if defined(CONFIG_HAVE_KVM_DIRTY_RING) && !defined(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP)
 	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
 		return;
 #endif
@@ -3313,7 +3313,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (vcpu && kvm->dirty_ring_size)
 			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
@@ -4488,6 +4488,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+#endif
 		return 1;
 	default:
 		break;
@@ -4499,6 +4502,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
 {
 	int r;
 
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	if (!kvm->dirty_ring_with_bitmap)
+		return -EINVAL;
+#endif
+
 	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
 		return -EINVAL;
 
@@ -4588,6 +4596,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	case KVM_CAP_DIRTY_LOG_RING:
 	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+		kvm->dirty_ring_with_bitmap = cap->args[0];
+
+		return 0;
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.38.0.rc1.362.ged0d419d3c-goog


[-- Attachment #3: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  1:12                 ` Oliver Upton
@ 2022-10-11  1:12                   ` Oliver Upton
  2022-10-11  3:56                   ` Gavin Shan
  2022-10-14 16:55                   ` Peter Xu
  2 siblings, 0 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-11  1:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
> > I think this further drives the point home -- there's zero need for the
> > bitmap with dirty ring on x86, so why even support it? The proposal of
> > ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
> > needs to dirty memory outside of a vCPU context can opt-in to the
> > behavior.
> 
> Yeah that sounds working too, but it'll be slightly hackish as then the
> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
> With the new cap the user app can implement the whole ring with generic
> code.

Isn't the current route of exposing ALLOW_BITMAP on other arches for no
reason headed in exactly that direction? Userspace would need to know if
it _really_ needs the dirty bitmap in addition to the dirty ring, which
could take the form of architecture ifdeffery.

OTOH, if the cap is only exposed when it is absolutely necessary, an
arch-generic live migration implementation could enable the cap whenever
it is advertized and scan the bitmap accordingly.

The VMM must know something about the architecture it is running on, as
it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...

> Also more flexible to expose it as generic cap? E.g., one day x86 can
> enable this too for whatever reason (even though I don't think so..).

I had imagined something like this patch where the arch opts-in to some
generic construct if it *requires* the use of both the ring and bitmap
(very rough sketch).

--
Thanks,
Oliver

[-- Attachment #2: 0001-KVM-Add-support-for-using-dirty-ring-in-conjuction-w.patch --]
[-- Type: text/x-diff, Size: 4502 bytes --]

From ec2fd1ec35b040ac5446f10b895b72d45e0d2845 Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton@linux.dev>
Date: Tue, 11 Oct 2022 00:57:00 +0000
Subject: [PATCH] KVM: Add support for using dirty ring in conjunction with
 bitmap

Some architectures (such as arm64) need to dirty memory outside of the
context of a vCPU. Of course, this simply doesn't fit with the UAPI of
KVM's per-vCPU dirty ring.

Introduce a new flavor of dirty ring that requires the use of both vCPU
dirty rings and a dirty bitmap. The expectation is that for non-vCPU
sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before
migrating the VM to the target.

Use an additional capability to advertize this behavior and require
explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
not allow userspace to enable dirty ring if it hasn't also enabled the
ring && bitmap capability, as a VM is likely DOA without the pages
marked in the bitmap.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 include/linux/kvm_host.h |  1 +
 include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig         |  8 ++++++++
 virt/kvm/kvm_main.c      | 16 ++++++++++++++--
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..76a6b1687312 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..c87b5882d7ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..e3b45a5489b6 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
        bool
        select HAVE_KVM_DIRTY_RING
 
+# Only architectures that need to dirty memory outside of a vCPU
+# context should select this, advertising to userspace the
+# requirement to use a dirty bitmap in addition to the vCPU dirty
+# ring.
+config HAVE_KVM_DIRTY_RING_WITH_BITMAP
+       bool
+       depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
        bool
        select EVENTFD
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..5a99ab973706 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3304,7 +3304,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
-#ifdef CONFIG_HAVE_KVM_DIRTY_RING
+#if defined(CONFIG_HAVE_KVM_DIRTY_RING) && !defined(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP)
 	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
 		return;
 #endif
@@ -3313,7 +3313,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (vcpu && kvm->dirty_ring_size)
 			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
@@ -4488,6 +4488,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+#endif
 		return 1;
 	default:
 		break;
@@ -4499,6 +4502,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
 {
 	int r;
 
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	if (!kvm->dirty_ring_with_bitmap)
+		return -EINVAL;
+#endif
+
 	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
 		return -EINVAL;
 
@@ -4588,6 +4596,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	case KVM_CAP_DIRTY_LOG_RING:
 	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+		kvm->dirty_ring_with_bitmap = cap->args[0];
+
+		return 0;
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  1:12                 ` Oliver Upton
  2022-10-11  1:12                   ` Oliver Upton
@ 2022-10-11  3:56                   ` Gavin Shan
  2022-10-11  3:56                     ` Gavin Shan
  2022-10-11  6:31                     ` Gavin Shan
  2022-10-14 16:55                   ` Peter Xu
  2 siblings, 2 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-11  3:56 UTC (permalink / raw)
  To: Oliver Upton, Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On 10/11/22 9:12 AM, Oliver Upton wrote:
> On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
>> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
>>> I think this further drives the point home -- there's zero need for the
>>> bitmap with dirty ring on x86, so why even support it? The proposal of
>>> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
>>> needs to dirty memory outside of a vCPU context can opt-in to the
>>> behavior.
>>
>> Yeah that sounds working too, but it'll be slightly hackish as then the
>> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
>> With the new cap the user app can implement the whole ring with generic
>> code.
> 
> Isn't the current route of exposing ALLOW_BITMAP on other arches for no
> reason headed in exactly that direction? Userspace would need to know if
> it _really_ needs the dirty bitmap in addition to the dirty ring, which
> could take the form of architecture ifdeffery.
> 
> OTOH, if the cap is only exposed when it is absolutely necessary, an
> arch-generic live migration implementation could enable the cap whenever
> it is advertized and scan the bitmap accordingly.
> 
> The VMM must know something about the architecture it is running on, as
> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
> 

It looks good to me to use CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP to
opt in to KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The most important point is
to ensure 'kvm->dirty_ring_with_bitmap == true' when dirty ring capability
is enabled. In this way, we can fail early when KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
isn't enabled on attempt to enable dirty ring capability.

If both of you agree, I will integrate the suggested code changes in
next respin, with necessary tweak.

- In kvm_vm_ioctl_enable_cap_generic(), 'kvm->dirty_ring_with_bitmap' is
   updated to 'true' unconditionally.

   static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                                              struct kvm_enable_cap *cap)
   {
       :
       case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
            kvm->dirty_ring_with_bitmap = true;
            return 0;
   }

- In mark_page_dirty_in_slot(), we need comprehensive checks like below.

   void mark_page_dirty_in_slot(struct kvm *kvm,
                                const struct kvm_memory_slot *memslot,
                                gfn_t gfn)
   {
       struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

#ifdef CONFIG_HAVE_KVM_DIRTY_RING
       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
           return;

#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
       if (WARN_ON_ONCE(!vcpu))
           return;
#endif
#endif

   }

- Use kvm_dirty_ring_exclusive(), which was suggested by Peter before.
   The function is used in various spots to allow the dirty bitmap to be
   created and accessed.

   bool kvm_dirty_ring_exclusive(struct kvm *kvm)
   {
       return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
   }
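
   One of those spots would roughly be the memslot preparation path, e.g.
   (sketch only):

   /* in kvm_prepare_memory_region(): keep the bitmap unless the ring is
    * used exclusively
    */
   if ((new->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
       !kvm_dirty_ring_exclusive(kvm)) {
       r = kvm_alloc_dirty_bitmap(new);
       if (r)
           return r;
   }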


>> Also more flexible to expose it as generic cap? E.g., one day x86 can
>> enable this too for whatever reason (even though I don't think so..).
> 
> I had imagined something like this patch where the arch opts-in to some
> generic construct if it *requires* the use of both the ring and bitmap
> (very rough sketch).
> 

Thanks,
Gavin



_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  3:56                   ` Gavin Shan
@ 2022-10-11  3:56                     ` Gavin Shan
  2022-10-11  6:31                     ` Gavin Shan
  1 sibling, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-11  3:56 UTC (permalink / raw)
  To: Oliver Upton, Peter Xu
  Cc: kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

On 10/11/22 9:12 AM, Oliver Upton wrote:
> On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
>> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
>>> I think this further drives the point home -- there's zero need for the
>>> bitmap with dirty ring on x86, so why even support it? The proposal of
>>> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
>>> needs to dirty memory outside of a vCPU context can opt-in to the
>>> behavior.
>>
>> Yeah that sounds working too, but it'll be slightly hackish as then the
>> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
>> With the new cap the user app can implement the whole ring with generic
>> code.
> 
> Isn't the current route of exposing ALLOW_BITMAP on other arches for no
> reason headed in exactly that direction? Userspace would need to know if
> it _really_ needs the dirty bitmap in addition to the dirty ring, which
> could take the form of architecture ifdeffery.
> 
> OTOH, if the cap is only exposed when it is absolutely necessary, an
> arch-generic live migration implementation could enable the cap whenever
> it is advertized and scan the bitmap accordingly.
> 
> The VMM must know something about the architecture it is running on, as
> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
> 

It looks good to me to use CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP to
opt in to KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The most important point is
to ensure 'kvm->dirty_ring_with_bitmap == true' when dirty ring capability
is enabled. In this way, we can fail early when KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
isn't enabled on attempt to enable dirty ring capability.

If both of you agree, I will integrate the suggested code changes in
next respin, with necessary tweak.

- In kvm_vm_ioctl_enable_cap_generic(), 'kvm->dirty_ring_with_bitmap' is
   updated to 'true' unconditionally.

   static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                                              struct kvm_enable_cap *cap)
   {
       :
       case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
            kvm->dirty_ring_with_bitmap = true;
            return 0;
   }

- In mark_page_dirty_in_slot(), we need comprehensive checks like below.

   void mark_page_dirty_in_slot(struct kvm *kvm,
                                const struct kvm_memory_slot *memslot,
                                gfn_t gfn)
   {
       struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

#ifdef CONFIG_HAVE_KVM_DIRTY_RING
       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
           return;

#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
       if (WARN_ON_ONCE(!vcpu))
           return;
#endif
#endif

   }

- Use kvm_dirty_ring_exclusive(), which was suggested by Peter before.
   The function is used in various spots to allow the dirty bitmap to be
   created and accessed.

   bool kvm_dirty_ring_exclusive(struct kvm *kvm)
   {
       return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
   }


>> Also more flexible to expose it as generic cap? E.g., one day x86 can
>> enable this too for whatever reason (even though I don't think so..).
> 
> I had imagined something like this patch where the arch opts-in to some
> generic construct if it *requires* the use of both the ring and bitmap
> (very rough sketch).
> 

Thanks,
Gavin




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  3:56                   ` Gavin Shan
  2022-10-11  3:56                     ` Gavin Shan
@ 2022-10-11  6:31                     ` Gavin Shan
  2022-10-11  6:31                       ` Gavin Shan
  1 sibling, 1 reply; 48+ messages in thread
From: Gavin Shan @ 2022-10-11  6:31 UTC (permalink / raw)
  To: Oliver Upton, Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

Hi Peter/Oliver,

On 10/11/22 11:56 AM, Gavin Shan wrote:
> On 10/11/22 9:12 AM, Oliver Upton wrote:
>> On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
>>> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
>>>> I think this further drives the point home -- there's zero need for the
>>>> bitmap with dirty ring on x86, so why even support it? The proposal of
>>>> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
>>>> needs to dirty memory outside of a vCPU context can opt-in to the
>>>> behavior.
>>>
>>> Yeah that sounds working too, but it'll be slightly hackish as then the
>>> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
>>> With the new cap the user app can implement the whole ring with generic
>>> code.
>>
>> Isn't the current route of exposing ALLOW_BITMAP on other arches for no
>> reason headed in exactly that direction? Userspace would need to know if
>> it _really_ needs the dirty bitmap in addition to the dirty ring, which
>> could take the form of architecture ifdeffery.
>>
>> OTOH, if the cap is only exposed when it is absolutely necessary, an
>> arch-generic live migration implementation could enable the cap whenever
>> it is advertized and scan the bitmap accordingly.
>>
>> The VMM must know something about the architecture it is running on, as
>> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
>>
> 
> It looks good to me to use CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP to
> opt in to KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The most important point is
> to ensure 'kvm->dirty_ring_with_bitmap == true' when dirty ring capability
> is enabled. In this way, we can fail early when KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> isn't enabled on attempt to enable dirty ring capability.
> 
> If both of you agree, I will integrate the suggested code changes in
> next respin, with necessary tweak.
> 
> - In kvm_vm_ioctl_enable_cap_generic(), 'kvm->dirty_ring_with_bitmap' is
>    updated to 'true' unconditionally.
> 
>    static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>                                               struct kvm_enable_cap *cap)
>    {
>        :
>        case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>             kvm->dirty_ring_with_bitmap = true;
>             return 0;
>    }
> 
> - In mark_page_dirty_in_slot(), we need comprehensive checks like below.
> 
>    void mark_page_dirty_in_slot(struct kvm *kvm,
>                                 const struct kvm_memory_slot *memslot,
>                                 gfn_t gfn)
>    {
> #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>        if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>            return;
> 
> #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>        if (WARN_ON_ONCE(!vcpu))
>            return;
> #endif
> #endif
> 
>    }
> 
> - Use kvm_dirty_ring_exclusive(), which was suggested by Peter before.
>    The function is used in various spots to allow the dirty bitmap to be
>    created and accessed.
> 
>    bool kvm_dirty_ring_exclusive(struct kvm *kvm)
>    {
>        return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
>    }
> 
> 

I've included Oliver's suggested changes into v6, which was just posted:

https://lore.kernel.org/kvmarm/3123a04f-a674-782b-9e9b-0baf3db49ebc@redhat.com/

Please find your time to review v6 directly, thanks!

>>> Also more flexible to expose it as generic cap? E.g., one day x86 can
>>> enable this too for whatever reason (even though I don't think so..).
>>
>> I had imagined something like this patch where the arch opts-in to some
>> generic construct if it *requires* the use of both the ring and bitmap
>> (very rough sketch).
>>

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  6:31                     ` Gavin Shan
@ 2022-10-11  6:31                       ` Gavin Shan
  0 siblings, 0 replies; 48+ messages in thread
From: Gavin Shan @ 2022-10-11  6:31 UTC (permalink / raw)
  To: Oliver Upton, Peter Xu
  Cc: kvmarm, kvmarm, kvm, catalin.marinas, bgardon, shuah,
	andrew.jones, will, dmatlack, pbonzini, zhenyzha, james.morse,
	suzuki.poulose, alexandru.elisei, seanjc, shan.gavin

Hi Peter/Oliver,

On 10/11/22 11:56 AM, Gavin Shan wrote:
> On 10/11/22 9:12 AM, Oliver Upton wrote:
>> On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
>>> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
>>>> I think this further drives the point home -- there's zero need for the
>>>> bitmap with dirty ring on x86, so why even support it? The proposal of
>>>> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
>>>> needs to dirty memory outside of a vCPU context can opt-in to the
>>>> behavior.
>>>
>>> Yeah that sounds working too, but it'll be slightly hackish as then the
>>> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap.
>>> With the new cap the user app can implement the whole ring with generic
>>> code.
>>
>> Isn't the current route of exposing ALLOW_BITMAP on other arches for no
>> reason headed in exactly that direction? Userspace would need to know if
>> it _really_ needs the dirty bitmap in addition to the dirty ring, which
>> could take the form of architecture ifdeffery.
>>
>> OTOH, if the cap is only exposed when it is absolutely necessary, an
>> arch-generic live migration implementation could enable the cap whenever
>> it is advertized and scan the bitmap accordingly.
>>
>> The VMM must know something about the architecture it is running on, as
>> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
>>
> 
> It looks good to me to use CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP to
> opt in to KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The most important point is
> to ensure 'kvm->dirty_ring_with_bitmap == true' when dirty ring capability
> is enabled. In this way, we can fail early when KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> isn't enabled on attempt to enable dirty ring capability.
> 
> If both of you agree, I will integrate the suggested code changes in
> next respin, with necessary tweak.
> 
> - In kvm_vm_ioctl_enable_cap_generic(), 'kvm->dirty_ring_with_bitmap' is
>    updated to 'true' unconditionally.
> 
>    static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>                                               struct kvm_enable_cap *cap)
>    {
>        :
>        case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>             kvm->dirty_ring_with_bitmap = true;
>             return 0;
>    }
> 
> - In mark_page_dirty_in_slot(), we need comprehensive checks like below.
> 
>    void mark_page_dirty_in_slot(struct kvm *kvm,
>                                 const struct kvm_memory_slot *memslot,
>                                 gfn_t gfn)
>    {
> #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>        if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>            return;
> 
> #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>        if (WARN_ON_ONCE(!vcpu))
>            return;
> #endif
> #endif
> 
>    }
> 
> - Use kvm_dirty_ring_exclusive(), which was suggested by Peter before.
>    The function is used in various spots to allow the dirty bitmap to be
>    created and accessed.
> 
>    bool kvm_dirty_ring_exclusive(struct kvm *kvm)
>    {
>        return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
>    }
> 
> 

I've included Oliver's suggested changes into v6, which was just posted:

https://lore.kernel.org/kvmarm/3123a04f-a674-782b-9e9b-0baf3db49ebc@redhat.com/

Please find your time to review v6 directly, thanks!

>>> Also more flexible to expose it as generic cap? E.g., one day x86 can
>>> enable this too for whatever reason (even though I don't think so..).
>>
>> I had imagined something like this patch where the arch opts-in to some
>> generic construct if it *requires* the use of both the ring and bitmap
>> (very rough sketch).
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-11  1:12                 ` Oliver Upton
  2022-10-11  1:12                   ` Oliver Upton
  2022-10-11  3:56                   ` Gavin Shan
@ 2022-10-14 16:55                   ` Peter Xu
  2022-10-14 16:55                     ` Peter Xu
  2022-10-18  7:38                     ` Oliver Upton
  2 siblings, 2 replies; 48+ messages in thread
From: Peter Xu @ 2022-10-14 16:55 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Tue, Oct 11, 2022 at 01:12:43AM +0000, Oliver Upton wrote:
> The VMM must know something about the architecture it is running on, as
> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...

IIUC this is still a kernel impl detail to flush data into guest pages
within this ioctl, or am I wrong?

For example, I'm assuming it's safe to change KVM_DEV_ARM_ITS_SAVE_TABLES
impl one day to not flush data to guest memories, then the kernel should
also disable the ALLOW_BITMAP cap in the same patch, so that any old qemu
binary that supports arm64 dirty ring will naturally skip all the bitmap
ops and become the same as what it does with x86 when running on that new
kernel.  With the implicit approach suggested, we would need to modify QEMU.

Changing impl of KVM_DEV_ARM_ITS_SAVE_TABLES is probably not a good
example.. but just want to show what I meant.  Fundamentally it sounds
cleaner if it's the kernel that tells the user "okay you collected the
ring, but that's not enough; you need to collect the bitmap too", rather
than assuming the user app will always know what kvm did in detail.  No
strong opinion though, as I could also have misunderstood how arm works.
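
To make that concrete, a rough VMM-side sketch of the "kernel tells the
user" model (helper names are assumed; this is not QEMU code):

    #include <stdbool.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    void collect_dirty_rings(int vm_fd);    /* hypothetical: harvest rings, KVM_RESET_DIRTY_RINGS */

    /* ask the kernel whether the bitmap must be collected on top of the rings */
    static bool dirty_ring_needs_bitmap(int vm_fd)
    {
        return ioctl(vm_fd, KVM_CHECK_EXTENSION,
                     KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) > 0;
    }

    static void collect_dirty_pages(int vm_fd, int slot, void *bitmap)
    {
        collect_dirty_rings(vm_fd);

        if (dirty_ring_needs_bitmap(vm_fd)) {
            struct kvm_dirty_log log = {
                .slot         = slot,
                .dirty_bitmap = bitmap,
            };

            /* e.g. pages dirtied by KVM_DEV_ARM_ITS_SAVE_TABLES on arm64 */
            ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
        }
    }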

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-14 16:55                   ` Peter Xu
  2022-10-14 16:55                     ` Peter Xu
@ 2022-10-18  7:38                     ` Oliver Upton
  2022-10-18  7:38                       ` Oliver Upton
                                         ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Oliver Upton @ 2022-10-18  7:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Fri, Oct 14, 2022 at 12:55:35PM -0400, Peter Xu wrote:
> On Tue, Oct 11, 2022 at 01:12:43AM +0000, Oliver Upton wrote:
> > The VMM must know something about the architecture it is running on, as
> > it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
> 
> IIUC this is still a kernel impl detail to flush data into guest pages
> within this ioctl, or am I wrong?

Somewhat...

The guest is assigning memory from the IPA space to back the ITS tables,
but KVM maintains its own internal representation. It just so happens
that we've conditioned userspace to be aware that ITS emulation is
incoherent w.r.t. the guest memory that backs the tables.
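
For reference, the userspace side of that flush is just a device attribute
on the ITS device fd; a sketch (fd/helper names assumed):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* its_dev_fd: fd of the KVM_DEV_TYPE_ARM_VGIC_ITS device created earlier */
    static int save_its_tables(int its_dev_fd)
    {
        struct kvm_device_attr attr = {
            .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
            .attr  = KVM_DEV_ARM_ITS_SAVE_TABLES,
        };

        /* KVM writes the ITS tables into guest memory here, without a
         * running vcpu, so userspace has to pick those pages up from the
         * dirty bitmap rather than from any vcpu's dirty ring. */
        return ioctl(its_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
    }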

> For example, I'm assuming it's safe to change KVM_DEV_ARM_ITS_SAVE_TABLES
> impl one day to not flush data to guest memories, then the kernel should
> also disable the ALLOW_BITMAP cap in the same patch, so that any old qemu
> binary that supports arm64 dirty ring will naturally skip all the bitmap
> ops and becoming the same as what it does with x86 when running on that new
> kernel.  With implicit approach suggested, we need to modify QEMU.
> 
> Changing impl of KVM_DEV_ARM_ITS_SAVE_TABLES is probably not a good
> example.. but just want to show what I meant.  Fundamentally it sounds
> cleaner if it's the kernel that tells the user "okay you collected the
> ring, but that's not enough; you need to collect the bitmap too", rather
> than assuming the user app will always know what kvm did in details.  No
> strong opinion though, as I could also have misunderstood how arm works.

I think the SAVE_TABLES ioctl is likely here to stay given the odd quirk
that it really is guest memory, so we'll probably need the bitmap on
arm64 for a long time. Even if we were to kill it, userspace would need
to take a change anyway to switch to a new ITS migration mechanism.

If we ever get to the point that we can relax this restriction, I think a
flag on the BITMAP_WITH_TABLE cap that says "I don't actually set any
bits in the bitmap" would do. We shouldn't hide the cap entirely, as
that would be ABI breakage for VMMs that expect bitmap+ring.
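
A sketch of that flag idea (the flag name below is purely hypothetical and
does not exist in the uapi):

    /* hypothetical flag returned by KVM_CHECK_EXTENSION for the cap */
    #define KVM_DIRTY_LOG_RING_WITH_BITMAP_F_UNUSED    (1 << 1)

    /* kernel side, in kvm_vm_ioctl_check_extension_generic(): keep
     * advertising the cap for ABI compatibility, but tell the VMM that
     * no bits will actually be set in the bitmap */
    case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
        return 1 | KVM_DIRTY_LOG_RING_WITH_BITMAP_F_UNUSED;

    /* VMM side: a non-zero return still means "cap supported"; the flag
     * just lets it skip the KVM_GET_DIRTY_LOG pass */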

Thoughts?

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-18  7:38                     ` Oliver Upton
  2022-10-18  7:38                       ` Oliver Upton
@ 2022-10-18  7:40                       ` Oliver Upton
  2022-10-18  7:40                         ` Oliver Upton
  2022-10-18 15:50                       ` Peter Xu
  2 siblings, 1 reply; 48+ messages in thread
From: Oliver Upton @ 2022-10-18  7:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: shuah, kvm, catalin.marinas, andrew.jones, kvmarm, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm

On Tue, Oct 18, 2022 at 10:38:10AM +0300, Oliver Upton wrote:
> On Fri, Oct 14, 2022 at 12:55:35PM -0400, Peter Xu wrote:
> > On Tue, Oct 11, 2022 at 01:12:43AM +0000, Oliver Upton wrote:
> > > The VMM must know something about the architecture it is running on, as
> > > it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
> > 
> > IIUC this is still a kernel impl detail to flush data into guest pages
> > within this ioctl, or am I wrong?
> 
> Somewhat...
> 
> The guest is assigning memory from the IPA space to back the ITS tables,
> but KVM maintains its own internal representation. It just so happens
> that we've conditioned userspace to be aware that ITS emulation is
> incoherent w.r.t. the guest memory that backs the tables.
> 
> > For example, I'm assuming it's safe to change KVM_DEV_ARM_ITS_SAVE_TABLES
> > impl one day to not flush data to guest memories, then the kernel should
> > also disable the ALLOW_BITMAP cap in the same patch, so that any old qemu
> > binary that supports arm64 dirty ring will naturally skip all the bitmap
> > ops and becoming the same as what it does with x86 when running on that new
> > kernel.  With implicit approach suggested, we need to modify QEMU.
> > 
> > Changing impl of KVM_DEV_ARM_ITS_SAVE_TABLES is probably not a good
> > example.. but just want to show what I meant.  Fundamentally it sounds
> > cleaner if it's the kernel that tells the user "okay you collected the
> > ring, but that's not enough; you need to collect the bitmap too", rather
> > than assuming the user app will always know what kvm did in details.  No
> > strong opinion though, as I could also have misunderstood how arm works.
> 
> I think the SAVE_TABLES ioctl is likely here to stay given the odd quirk
> that it really is guest memory, so we'll probably need the bitmap on
> arm64 for a long time. Even if we were to kill it, userspace would need
> to take a change anyway to switch to a new ITS migration mechanism.
> 
> If we ever get to the point that we can relax this restriction i think a
> flag on the BITMAP_WITH_TABLE cap that says "I don't actually set any

BITMAP_WITH_RING

> bits in the bitmap" would do. We shouldn't hide the cap entirely, as
> that would be ABI breakage for VMMs that expect bitmap+ring.
> 
> Thoughts?
> 
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking
  2022-10-18  7:38                     ` Oliver Upton
  2022-10-18  7:38                       ` Oliver Upton
  2022-10-18  7:40                       ` Oliver Upton
@ 2022-10-18 15:50                       ` Peter Xu
  2022-10-18 15:50                         ` Peter Xu
  2 siblings, 1 reply; 48+ messages in thread
From: Peter Xu @ 2022-10-18 15:50 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm

On Tue, Oct 18, 2022 at 10:38:10AM +0300, Oliver Upton wrote:
> If we ever get to the point that we can relax this restriction i think a
> flag on the BITMAP_WITH_TABLE cap that says "I don't actually set any
> bits in the bitmap" would do. We shouldn't hide the cap entirely, as
> that would be ABI breakage for VMMs that expect bitmap+ring.

I'd rather drop the cap directly if it's just a boolean that tells us
"whether we need the bitmap to back the rings". Btw, when I said "dropping
it" I meant "don't return 1 for the cap anymore" - we definitely need to
keep the cap macro stable as part of the KVM API.
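
In other words (sketch only, assuming the cap keeps its number in the uapi
headers), "dropping" it would simply change the runtime answer:

    /* kvm_vm_ioctl_check_extension_generic() */
    case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
        return 0;   /* bitmap no longer needed to back the rings */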

But I think I misunderstood the proposal previously, sorry. I assumed we
were discussing an internal HAVE_DIRTY_RING_WITH_BITMAP only.  I noticed
this only after I had a closer look at Gavin's patch.  Having a cap exposed
always sounds good to me.

I'll comment on Gavin's patch directly, thanks.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2022-10-18 15:51 UTC | newest]

Thread overview: 48+ messages
2022-10-05  0:41 [PATCH v5 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-05  0:41 ` Gavin Shan
2022-10-05  0:41 ` [PATCH v5 1/7] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
2022-10-05  0:41   ` Gavin Shan
2022-10-05  0:41 ` [PATCH v5 2/7] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-05  0:41   ` Gavin Shan
2022-10-05  0:41 ` [PATCH v5 3/7] KVM: x86: Allow to use bitmap in ring-based dirty page tracking Gavin Shan
2022-10-05  0:41   ` Gavin Shan
2022-10-06 20:28   ` Peter Xu
2022-10-06 20:28     ` Peter Xu
2022-10-06 23:38     ` Gavin Shan
2022-10-06 23:38       ` Gavin Shan
2022-10-07 14:31       ` Peter Xu
2022-10-07 14:31         ` Peter Xu
2022-10-10 23:18         ` Oliver Upton
2022-10-10 23:18           ` Oliver Upton
2022-10-10 23:43           ` Oliver Upton
2022-10-10 23:43             ` Oliver Upton
2022-10-10 23:49           ` Peter Xu
2022-10-10 23:49             ` Peter Xu
2022-10-10 23:58             ` Gavin Shan
2022-10-10 23:58               ` Gavin Shan
2022-10-10 23:58             ` Oliver Upton
2022-10-10 23:58               ` Oliver Upton
2022-10-11  0:20               ` Peter Xu
2022-10-11  0:20                 ` Peter Xu
2022-10-11  1:12                 ` Oliver Upton
2022-10-11  1:12                   ` Oliver Upton
2022-10-11  3:56                   ` Gavin Shan
2022-10-11  3:56                     ` Gavin Shan
2022-10-11  6:31                     ` Gavin Shan
2022-10-11  6:31                       ` Gavin Shan
2022-10-14 16:55                   ` Peter Xu
2022-10-14 16:55                     ` Peter Xu
2022-10-18  7:38                     ` Oliver Upton
2022-10-18  7:38                       ` Oliver Upton
2022-10-18  7:40                       ` Oliver Upton
2022-10-18  7:40                         ` Oliver Upton
2022-10-18 15:50                       ` Peter Xu
2022-10-18 15:50                         ` Peter Xu
2022-10-05  0:41 ` [PATCH v5 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-05  0:41   ` Gavin Shan
2022-10-05  0:41 ` [PATCH v5 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-05  0:41   ` Gavin Shan
2022-10-05  0:41 ` [PATCH v5 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-10-05  0:41   ` Gavin Shan
2022-10-05  0:41 ` [PATCH v5 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-10-05  0:41   ` Gavin Shan
