* [PATCH v6 0/5] KVM: Dirty quota-based throttling
@ 2022-09-15 10:10 Shivam Kumar
  2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Shivam Kumar @ 2022-09-15 10:10 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david; +Cc: kvm, Shivam Kumar

This is v6 of the dirty quota series, with the following changes over v4 (v5 had
some styling issues in the blurb added to the KVM documentation, which the
'checkpatch' script unfortunately didn't report):

1. x86-specific changes separated into a new commit.
2. KVM requests used to handle dirty quota exit conditions.


v1:
https://lore.kernel.org/kvm/20211114145721.209219-1-shivam.kumar1@nutanix.com/
v2: https://lore.kernel.org/kvm/Ydx2EW6U3fpJoJF0@google.com/T/
v3: https://lore.kernel.org/kvm/YkT1kzWidaRFdQQh@google.com/T/
v4:
https://lore.kernel.org/all/20220521202937.184189-1-shivam.kumar1@nutanix.com/
v5: https://lore.kernel.org/all/202209130532.2BJwW65L-lkp@intel.com/T/

Thanks,
Shivam

Shivam Kumar (5):
  KVM: Implement dirty quota-based throttling of vcpus
  KVM: x86: Dirty quota-based throttling of vcpus
  KVM: arm64: Dirty quota-based throttling of vcpus
  KVM: s390x: Dirty quota-based throttling of vcpus
  KVM: selftests: Add selftests for dirty quota throttling

 Documentation/virt/kvm/api.rst                | 35 ++++++++++++++++++
 arch/arm64/kvm/arm.c                          |  9 +++++
 arch/s390/kvm/kvm-s390.c                      |  9 +++++
 arch/x86/kvm/mmu/spte.c                       |  4 +--
 arch/x86/kvm/vmx/vmx.c                        |  3 ++
 arch/x86/kvm/x86.c                            |  9 +++++
 include/linux/kvm_host.h                      | 20 ++++++++++-
 include/linux/kvm_types.h                     |  1 +
 include/uapi/linux/kvm.h                      | 12 +++++++
 tools/testing/selftests/kvm/dirty_log_test.c  | 33 +++++++++++++++--
 .../selftests/kvm/include/kvm_util_base.h     |  4 +++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 36 +++++++++++++++++++
 virt/kvm/kvm_main.c                           | 26 ++++++++++++--
 13 files changed, 193 insertions(+), 8 deletions(-)

-- 
2.22.3



* [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-09-15 10:10 [PATCH v6 0/5] KVM: Dirty quota-based throttling Shivam Kumar
@ 2022-09-15 10:10 ` Shivam Kumar
  2022-09-15 13:21   ` Christian Borntraeger
                     ` (2 more replies)
  2022-09-15 10:10 ` [PATCH v6 2/5] KVM: x86: Dirty " Shivam Kumar
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 28+ messages in thread
From: Shivam Kumar @ 2022-09-15 10:10 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Define variables to track and throttle memory dirtying for every vcpu.

dirty_count:    Number of pages the vcpu has dirtied since its creation,
                while dirty logging is enabled.
dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
                more, it needs to request more quota by exiting to
                userspace.

Implement the flow for throttling based on dirty quota.

i) Increment dirty_count for the vcpu whenever it dirties a page.
ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
count equals/exceeds dirty quota) to request more dirty quota.
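
For illustration only (not part of this patch), userspace consumption of the
new exit could look roughly like the sketch below. vmm_next_quota() and
vmm_throttle() are hypothetical VMM policy hooks, and the kvm_run fields are
the ones added by this series:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  extern __u64 vmm_next_quota(void);  /* policy: pages granted per refill */
  extern void vmm_throttle(void);     /* policy: optionally sleep the vCPU */

  /* Run the vCPU once and refill the quota on a dirty quota exit. */
  static int vcpu_run_once(int vcpu_fd, struct kvm_run *run)
  {
          int ret = ioctl(vcpu_fd, KVM_RUN, 0);

          if (ret < 0)
                  return ret;

          if (run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED) {
                  /* Throttle (if desired), then grant more quota. */
                  vmm_throttle();
                  run->dirty_quota = run->dirty_quota_exit.count +
                                     vmm_next_quota();
                  return 0;
          }

          /* ... handle other exit reasons as usual ... */
          return 0;
  }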

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h       | 20 ++++++++++++++++++-
 include/linux/kvm_types.h      |  1 +
 include/uapi/linux/kvm.h       | 12 ++++++++++++
 virt/kvm/kvm_main.c            | 26 ++++++++++++++++++++++---
 5 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd7c32126ce..97030a6a35b4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6614,6 +6614,26 @@ array field represents return values. The userspace should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
+		struct {
+			__u64 count;
+			__u64 quota;
+		} dirty_quota_exit;
+
+If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU has
+exhausted its dirty quota. The 'dirty_quota_exit' member of the kvm_run structure
+makes the following information available to userspace:
+    count: the current number of pages dirtied by the VCPU, which can be
+    skewed if the vCPU dirties pages of different sizes.
+    quota: the observed dirty quota just before the exit to userspace.
+
+Userspace can design a strategy to distribute the overall dirtying budget of the
+VM among the vcpus. Based on that strategy and the current state of dirty quota
+throttling, userspace can decide to either update (increase) the quota or to put
+the VCPU to sleep for some time.
+
 ::
 
     /* KVM_EXIT_NOTIFY */
@@ -6668,6 +6688,21 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
 ::
 
+	/*
+	 * Number of pages the vCPU is allowed to have dirtied over its entire
+	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
+	 * is reached/exceeded.
+	 */
+	__u64 dirty_quota;
+
+Please note that enforcing the quota is best effort, as the guest may dirty
+multiple pages before KVM can recheck the quota.  However, unless KVM is using
+a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
+KVM will detect quota exhaustion within a handful of dirtied pages.  If a
+hardware ring buffer is used, the overrun is bounded by the size of the buffer
+(512 entries for PML).
+
+::
   };
 
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..9acb28635d94 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -151,12 +151,13 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQUEST_NO_ACTION      BIT(10)
 /*
  * Architecture-independent vcpu->requests bit members
- * Bits 4-7 are reserved for more arch-independent bits.
+ * Bits 5-7 are reserved for more arch-independent bits.
  */
 #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQ_DIRTY_QUOTA_EXIT  4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
@@ -380,6 +381,8 @@ struct kvm_vcpu {
 	 */
 	struct kvm_memory_slot *last_used_slot;
 	u64 last_used_slot_gen;
+
+	u64 dirty_quota;
 };
 
 /*
@@ -542,6 +545,21 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 	return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
 }
 
+static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+	u64 dirty_quota = READ_ONCE(run->dirty_quota);
+	u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
+
+	if (!dirty_quota || (pages_dirtied < dirty_quota))
+		return 1;
+
+	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+	run->dirty_quota_exit.count = pages_dirtied;
+	run->dirty_quota_exit.quota = dirty_quota;
+	return 0;
+}
+
 /*
  * Some of the bitops functions do not support too long bitmaps.
  * This number must be determined not to exceed such limits.
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 3ca3db020e0e..263a588f3cd3 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -118,6 +118,7 @@ struct kvm_vcpu_stat_generic {
 	u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
 	u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
 	u64 blocking;
+	u64 pages_dirtied;
 };
 
 #define KVM_STATS_NAME_SIZE	48
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eed0315a77a6..4c4a65b0f0a5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -272,6 +272,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -510,6 +511,11 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
+		struct {
+			__u64 count;
+			__u64 quota;
+		} dirty_quota_exit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -531,6 +537,12 @@ struct kvm_run {
 		struct kvm_sync_regs regs;
 		char padding[SYNC_REGS_SIZE_BYTES];
 	} s;
+	/*
+	 * Number of pages the vCPU is allowed to have dirtied over its entire
+	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the
+	 * quota is reached/exceeded.
+	 */
+	__u64 dirty_quota;
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 584a5bab3af3..f315af50037d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3298,18 +3298,36 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
+static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
+{
+	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
+
+	if (!dirty_quota || (vcpu->stat.generic.pages_dirtied < dirty_quota))
+		return;
+
+	/*
+	 * Snapshot the quota to report it to userspace.  The dirty count will be
+	 * captured when the request is processed.
+	 */
+	vcpu->dirty_quota = dirty_quota;
+	kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
+}
+
 void mark_page_dirty_in_slot(struct kvm *kvm,
 			     const struct kvm_memory_slot *memslot,
 		 	     gfn_t gfn)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
-#ifdef CONFIG_HAVE_KVM_DIRTY_RING
 	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
 		return;
-#endif
 
-	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
+	if (!memslot)
+		return;
+
+	WARN_ON_ONCE(!vcpu->stat.generic.pages_dirtied++);
+
+	if (kvm_slot_dirty_track_enabled(memslot)) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
@@ -3318,6 +3336,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 					    slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
+
+		kvm_vcpu_is_dirty_quota_exhausted(vcpu);
 	}
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
-- 
2.22.3



* [PATCH v6 2/5] KVM: x86: Dirty quota-based throttling of vcpus
  2022-09-15 10:10 [PATCH v6 0/5] KVM: Dirty quota-based throttling Shivam Kumar
  2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
@ 2022-09-15 10:10 ` Shivam Kumar
  2022-10-07 19:30   ` Sean Christopherson
  2022-09-15 10:10 ` [PATCH v6 3/5] KVM: arm64: " Shivam Kumar
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-09-15 10:10 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
equals/exceeds dirty quota) to request more dirty quota.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/x86/kvm/mmu/spte.c | 4 ++--
 arch/x86/kvm/vmx/vmx.c  | 3 +++
 arch/x86/kvm/x86.c      | 9 +++++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 2e08b2a45361..c0ed35abbf2d 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
 		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
 
-	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
+	if (spte & PT_WRITABLE_MASK) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
-		WARN_ON(level > PG_LEVEL_4K);
+		WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
 		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c9b49a09e6b5..140a5511ed45 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5749,6 +5749,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 		 */
 		if (__xfer_to_guest_mode_work_pending())
 			return 1;
+
+		if (!kvm_vcpu_check_dirty_quota(vcpu))
+			return 0;
 	}
 
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43a6a7efc6ec..8447e70b26f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10379,6 +10379,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = 0;
 			goto out;
 		}
+		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+			struct kvm_run *run = vcpu->run;
+
+			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
+			run->dirty_quota_exit.quota = vcpu->dirty_quota;
+			r = 0;
+			goto out;
+		}
 
 		/*
 		 * KVM_REQ_HV_STIMER has to be processed after
-- 
2.22.3



* [PATCH v6 3/5] KVM: arm64: Dirty quota-based throttling of vcpus
  2022-09-15 10:10 [PATCH v6 0/5] KVM: Dirty quota-based throttling Shivam Kumar
  2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2022-09-15 10:10 ` [PATCH v6 2/5] KVM: x86: Dirty " Shivam Kumar
@ 2022-09-15 10:10 ` Shivam Kumar
  2022-09-15 10:10 ` [PATCH v6 4/5] KVM: s390x: " Shivam Kumar
  2022-09-15 10:10 ` [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
  4 siblings, 0 replies; 28+ messages in thread
From: Shivam Kumar @ 2022-09-15 10:10 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
equals/exceeds dirty quota) to request more dirty quota.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/arm64/kvm/arm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2ff0ef62abad..ff17d14aa5b0 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -747,6 +747,15 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+			struct kvm_run *run = vcpu->run;
+
+			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
+			run->dirty_quota_exit.quota = vcpu->dirty_quota;
+			return 0;
+		}
 	}
 
 	return 1;
-- 
2.22.3



* [PATCH v6 4/5] KVM: s390x: Dirty quota-based throttling of vcpus
  2022-09-15 10:10 [PATCH v6 0/5] KVM: Dirty quota-based throttling Shivam Kumar
                   ` (2 preceding siblings ...)
  2022-09-15 10:10 ` [PATCH v6 3/5] KVM: arm64: " Shivam Kumar
@ 2022-09-15 10:10 ` Shivam Kumar
  2022-09-15 13:24   ` Christian Borntraeger
  2022-09-15 10:10 ` [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
  4 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-09-15 10:10 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
equals/exceeds dirty quota) to request more dirty quota.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/s390/kvm/kvm-s390.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..2fe2933a7064 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4343,6 +4343,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 		goto retry;
 	}
 
+	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+		struct kvm_run *run = vcpu->run;
+
+		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
+		run->dirty_quota_exit.quota = vcpu->dirty_quota;
+		return 1;
+	}
+
 	/* nothing to do, just clear the request */
 	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	/* we left the vsie handler, nothing to do, just clear the request */
-- 
2.22.3



* [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling
  2022-09-15 10:10 [PATCH v6 0/5] KVM: Dirty quota-based throttling Shivam Kumar
                   ` (3 preceding siblings ...)
  2022-09-15 10:10 ` [PATCH v6 4/5] KVM: s390x: " Shivam Kumar
@ 2022-09-15 10:10 ` Shivam Kumar
  2022-10-07 18:29   ` Sean Christopherson
  4 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-09-15 10:10 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Add selftests for dirty quota throttling with an optional -q parameter
to configure the value by which the dirty quota should be incremented
after each dirty quota exit. With very small intervals, a smaller dirty
quota increment ensures that the dirty quota exit path is exercised. A
zero value disables dirty quota throttling, so that dirty logging
without dirty quota throttling can also be tested.
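
For example, an illustrative invocation such as "./dirty_log_test -q 8 -i 32 -I 10"
(the values are arbitrary) sets the initial dirty quota to 8 pages and, on every
dirty quota exit, raises the quota to the current dirtied-page count plus 8.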

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c  | 33 +++++++++++++++--
 .../selftests/kvm/include/kvm_util_base.h     |  4 +++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 36 +++++++++++++++++++
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 9c883c94d478..81c46eff7e1d 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -63,6 +63,8 @@
 
 #define SIG_IPI SIGUSR1
 
+#define TEST_DIRTY_QUOTA_INCREMENT		8
+
 /*
  * Guest/Host shared variables. Ensure addr_gva2hva() and/or
  * sync_global_to/from_guest() are used when accessing from
@@ -189,6 +191,7 @@ static enum log_mode_t host_log_mode_option = LOG_MODE_ALL;
 static enum log_mode_t host_log_mode;
 static pthread_t vcpu_thread;
 static uint32_t test_dirty_ring_count = TEST_DIRTY_RING_COUNT;
+static uint64_t test_dirty_quota_increment = TEST_DIRTY_QUOTA_INCREMENT;
 
 static void vcpu_kick(void)
 {
@@ -208,6 +211,13 @@ static void sem_wait_until(sem_t *sem)
 	while (ret == -1 && errno == EINTR);
 }
 
+static void set_dirty_quota(struct kvm_vm *vm, uint64_t dirty_quota)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+
+	vcpu_set_dirty_quota(run, dirty_quota);
+}
+
 static bool clear_log_supported(void)
 {
 	return kvm_has_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
@@ -255,7 +265,11 @@ static void default_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	TEST_ASSERT(ret == 0 || (ret == -1 && err == EINTR),
 		    "vcpu run failed: errno=%d", err);
 
-	TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
+	if (test_dirty_quota_increment &&
+		run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED)
+		vcpu_handle_dirty_quota_exit(run, test_dirty_quota_increment);
+	else
+		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
 		    "Invalid guest sync status: exit_reason=%s\n",
 		    exit_reason_str(run->exit_reason));
 
@@ -372,6 +386,9 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	if (get_ucall(vcpu, NULL) == UCALL_SYNC) {
 		/* We should allow this to continue */
 		;
+	} else if (test_dirty_quota_increment &&
+		run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED) {
+		vcpu_handle_dirty_quota_exit(run, test_dirty_quota_increment);
 	} else if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL ||
 		   (ret == -1 && err == EINTR)) {
 		/* Update the flag first before pause */
@@ -762,6 +779,10 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	sync_global_to_guest(vm, guest_test_virt_mem);
 	sync_global_to_guest(vm, guest_num_pages);
 
+	/* Initialise dirty quota */
+	if (test_dirty_quota_increment)
+		set_dirty_quota(vm, test_dirty_quota_increment);
+
 	/* Start the iterations */
 	iteration = 1;
 	sync_global_to_guest(vm, iteration);
@@ -803,6 +824,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	/* Tell the vcpu thread to quit */
 	host_quit = true;
 	log_mode_before_vcpu_join();
+	/* Terminate dirty quota throttling */
+	if (test_dirty_quota_increment)
+		set_dirty_quota(vm, 0);
 	pthread_join(vcpu_thread, NULL);
 
 	pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
@@ -824,6 +848,8 @@ static void help(char *name)
 	printf(" -c: specify dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
+	printf(" -q: specify incremental dirty quota (default: %"PRIu32")\n",
+	       TEST_DIRTY_QUOTA_INCREMENT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
 	       TEST_HOST_LOOP_N);
 	printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n",
@@ -852,11 +878,14 @@ int main(int argc, char *argv[])
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) {
+	while ((opt = getopt(argc, argv, "c:q:hi:I:p:m:M:")) != -1) {
 		switch (opt) {
 		case 'c':
 			test_dirty_ring_count = strtol(optarg, NULL, 10);
 			break;
+		case 'q':
+			test_dirty_quota_increment = strtol(optarg, NULL, 10);
+			break;
 		case 'i':
 			p.iterations = strtol(optarg, NULL, 10);
 			break;
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 24fde97f6121..68be66b9ac67 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -834,4 +834,8 @@ static inline int __vm_disable_nx_huge_pages(struct kvm_vm *vm)
 	return __vm_enable_cap(vm, KVM_CAP_VM_DISABLE_NX_HUGE_PAGES, 0);
 }
 
+void vcpu_set_dirty_quota(struct kvm_run *run, uint64_t dirty_quota);
+void vcpu_handle_dirty_quota_exit(struct kvm_run *run,
+			uint64_t test_dirty_quota_increment);
+
 #endif /* SELFTEST_KVM_UTIL_BASE_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 9889fe0d8919..4c7bd9807d0b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -18,6 +18,7 @@
 #include <linux/kernel.h>
 
 #define KVM_UTIL_MIN_PFN	2
+#define PML_BUFFER_SIZE	512
 
 static int vcpu_mmap_sz(void);
 
@@ -1703,6 +1704,7 @@ static struct exit_reason {
 	{KVM_EXIT_X86_RDMSR, "RDMSR"},
 	{KVM_EXIT_X86_WRMSR, "WRMSR"},
 	{KVM_EXIT_XEN, "XEN"},
+	{KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, "DIRTY_QUOTA_EXHAUSTED"},
 #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
 	{KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
 #endif
@@ -1979,3 +1981,37 @@ void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
 		break;
 	}
 }
+
+void vcpu_set_dirty_quota(struct kvm_run *run, uint64_t dirty_quota)
+{
+	run->dirty_quota = dirty_quota;
+
+	if (dirty_quota)
+		pr_info("Dirty quota throttling enabled with initial quota %"PRIu64"\n",
+			dirty_quota);
+	else
+		pr_info("Dirty quota throttling disabled\n");
+}
+
+void vcpu_handle_dirty_quota_exit(struct kvm_run *run,
+			uint64_t test_dirty_quota_increment)
+{
+	uint64_t quota = run->dirty_quota_exit.quota;
+	uint64_t count = run->dirty_quota_exit.count;
+
+	/*
+	 * Due to PML, number of pages dirtied by the vcpu can exceed its dirty
+	 * quota by PML buffer size.
+	 */
+	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages "
+		"dirtied: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
+
+	TEST_ASSERT(count >= quota, "Dirty quota exit happened with quota yet to "
+		"be exhausted: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
+
+	if (count > quota)
+		pr_info("Dirty quota exit with unequal quota and count: "
+			"count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
+
+	run->dirty_quota = count + test_dirty_quota_increment;
+}
-- 
2.22.3



* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
@ 2022-09-15 13:21   ` Christian Borntraeger
  2022-09-15 14:34     ` Christian Borntraeger
  2022-10-07 19:08   ` Sean Christopherson
  2022-10-10  5:41   ` Shivam Kumar
  2 siblings, 1 reply; 28+ messages in thread
From: Christian Borntraeger @ 2022-09-15 13:21 UTC (permalink / raw)
  To: Shivam Kumar, pbonzini, seanjc, maz, james.morse, david
  Cc: kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



Am 15.09.22 um 12:10 schrieb Shivam Kumar:
> Define variables to track and throttle memory dirtying for every vcpu.
> 
> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>                  while dirty logging is enabled.
> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>                  more, it needs to request more quota by exiting to
>                  userspace.
> 
> Implement the flow for throttling based on dirty quota.
> 
> i) Increment dirty_count for the vcpu whenever it dirties a page.
> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
> count equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
[...]

I am wondering if this will work on s390. On s390 we only call
mark_page_dirty_in_slot for the kvm_read/write functions but not
for those done by the guest on fault. We do account those lazily in
kvm_arch_sync_dirty_log (like x86 in the past).

> +
>   void mark_page_dirty_in_slot(struct kvm *kvm,
>   			     const struct kvm_memory_slot *memslot,
>   		 	     gfn_t gfn)
>   {
>   	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>   
> -#ifdef CONFIG_HAVE_KVM_DIRTY_RING
>   	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>   		return;
> -#endif

This will trigger on s390 for the interrupt payload written from vhost
if I recall correctly. Please do not enable this warning.

>   
> -	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> +	if (!memslot)
> +		return;
> +
> +	WARN_ON_ONCE(!vcpu->stat.generic.pages_dirtied++);
> +
> +	if (kvm_slot_dirty_track_enabled(memslot)) {
>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>   		u32 slot = (memslot->as_id << 16) | memslot->id;
>   
> @@ -3318,6 +3336,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>   					    slot, rel_gfn);
>   		else
>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> +
> +		kvm_vcpu_is_dirty_quota_exhausted(vcpu);
>   	}
>   }
>   EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);


* Re: [PATCH v6 4/5] KVM: s390x: Dirty quota-based throttling of vcpus
  2022-09-15 10:10 ` [PATCH v6 4/5] KVM: s390x: " Shivam Kumar
@ 2022-09-15 13:24   ` Christian Borntraeger
  0 siblings, 0 replies; 28+ messages in thread
From: Christian Borntraeger @ 2022-09-15 13:24 UTC (permalink / raw)
  To: Shivam Kumar, pbonzini, seanjc, maz, james.morse, david
  Cc: kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



Am 15.09.22 um 12:10 schrieb Shivam Kumar:
> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count
> equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>   arch/s390/kvm/kvm-s390.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index edfd4bbd0cba..2fe2933a7064 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4343,6 +4343,15 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>   		goto retry;
>   	}
>   
> +	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
> +		struct kvm_run *run = vcpu->run;
> +
> +		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> +		run->dirty_quota_exit.quota = vcpu->dirty_quota;
> +		return 1;
> +	}

Please use
return -EREMOTE;

We use this in s390 code to indicate "we need exit to userspace" and
kvm_arch_vcpu_ioctl_run will change -EREMOTE to 0.
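
I.e. the hunk above would become something like (sketch of the requested change,
otherwise unchanged from the patch):

	if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
		struct kvm_run *run = vcpu->run;

		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
		run->dirty_quota_exit.quota = vcpu->dirty_quota;
		return -EREMOTE;
	}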

> +
>   	/* nothing to do, just clear the request */
>   	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
>   	/* we left the vsie handler, nothing to do, just clear the request */


* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-09-15 13:21   ` Christian Borntraeger
@ 2022-09-15 14:34     ` Christian Borntraeger
  2022-10-07 18:18       ` Sean Christopherson
  0 siblings, 1 reply; 28+ messages in thread
From: Christian Borntraeger @ 2022-09-15 14:34 UTC (permalink / raw)
  To: Shivam Kumar, pbonzini, seanjc, maz, james.morse, david,
	Claudio Imbrenda, Janosch Frank
  Cc: kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat,
	Heiko Carstens, Sven Schnelle, Vasily Gorbik, Alexander Gordeev

Am 15.09.22 um 15:21 schrieb Christian Borntraeger:
> 
> 
> Am 15.09.22 um 12:10 schrieb Shivam Kumar:
>> Define variables to track and throttle memory dirtying for every vcpu.
>>
>> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>>                  while dirty logging is enabled.
>> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>>                  more, it needs to request more quota by exiting to
>>                  userspace.
>>
>> Implement the flow for throttling based on dirty quota.
>>
>> i) Increment dirty_count for the vcpu whenever it dirties a page.
>> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
>> count equals/exceeds dirty quota) to request more dirty quota.
>>
>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> [...]
> 
> I am wondering if this will work on s390. On s390  we only call
> mark_page_dirty_in_slot for the kvm_read/write functions but not
> for those done by the guest on fault. We do account those lazily in
> kvm_arch_sync_dirty_log (like x86 in the past).
> 

I think we need to rework the page fault handling on s390 to actually make
use of this. This has to happen anyway somewhen (as indicated by the guest
enter/exit rework from Mark). Right now we handle KVM page faults directly
in the normal system fault handler. It seems we need to make a side turn
into KVM for page faults on guests in the long run.

Until this is done, dirty logging is really not per-CPU on s390, so we
need to defer this feature for now. CC'ing the other s390 maintainers.
The other use case for a page fault handler rework was Mark Rutland's
"kvm: fix latent guest entry/exit bugs" series, which did not work on s390.


* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-09-15 14:34     ` Christian Borntraeger
@ 2022-10-07 18:18       ` Sean Christopherson
  2022-10-09 18:36         ` Shivam Kumar
  0 siblings, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2022-10-07 18:18 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Shivam Kumar, pbonzini, maz, james.morse, david,
	Claudio Imbrenda, Janosch Frank, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat, Heiko Carstens, Sven Schnelle,
	Vasily Gorbik, Alexander Gordeev

On Thu, Sep 15, 2022, Christian Borntraeger wrote:
> Am 15.09.22 um 15:21 schrieb Christian Borntraeger:
> > I am wondering if this will work on s390. On s390  we only call
> > mark_page_dirty_in_slot for the kvm_read/write functions but not
> > for those done by the guest on fault. We do account those lazily in
> > kvm_arch_sync_dirty_log (like x86 in the past).
> > 
> 
> I think we need to rework the page fault handling on s390 to actually make
> use of this. This has to happen anyway somewhen (as indicated by the guest
> enter/exit rework from Mark). Right now we handle KVM page faults directly
> in the normal system fault handler. It seems we need to make a side turn
> into KVM for page faults on guests in the long run.

s390's page table management came up in conversation at KVM Forum in the context
of a common MMU (or MMUs)[*].  If/when we get an MMU that supports multiple
architectures, that would be a perfect opportunity for s390 to switch.  I don't
know that it'll be feasible to make a common MMU support s390 from the get-go,
but at the very least we can keep y'all in the loop and not make it unnecessarily
difficult to support s390.

[*] https://kvmforum2022.sched.com/event/15jJk


* Re: [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling
  2022-09-15 10:10 ` [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
@ 2022-10-07 18:29   ` Sean Christopherson
  2022-10-09 19:26     ` Shivam Kumar
  0 siblings, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2022-10-07 18:29 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Thu, Sep 15, 2022, Shivam Kumar wrote:
>  #endif /* SELFTEST_KVM_UTIL_BASE_H */
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 9889fe0d8919..4c7bd9807d0b 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -18,6 +18,7 @@
>  #include <linux/kernel.h>
>  
>  #define KVM_UTIL_MIN_PFN	2
> +#define PML_BUFFER_SIZE	512

...

> +	/*
> +	 * Due to PML, number of pages dirtied by the vcpu can exceed its dirty
> +	 * quota by PML buffer size.
> +	 */

Buffering for PML is very Intel centric, and even then it's not guaranteed.  In
x86, PML can and should be detected programmatically:

bool kvm_is_pml_enabled(void)
{
	return is_intel_cpu() && get_kvm_intel_param_bool("pml");
}


Not sure if it's worth a generic arch hook to get the CPU buffer size, e.g. the
test could do:


	/*
	 * Allow ??? pages of overrun, KVM is allowed to dirty multiple pages
	 * before exiting to userspace, e.g. when emulating an instruction that
	 * performs multiple memory accesses.
	 */
	uint64_t buffer = ???;

	/*
	 * When Intel's Page-Modification Logging (PML) is enabled, the CPU may
	 * dirty up to 512 pages (number of entries in the PML buffer) without
	 * exiting, thus KVM may effectively dirty that many pages before
	 * enforcing the dirty quota.
	 */
#ifdef __x86_64__
	if (kvm_is_pml_enabled())
		buffer = PML_BUFFER_SIZE;
#endif
	

> +	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages "

Clarify _why_ the count is invalid.

> +		"dirtied: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);

Don't split strings unless it's truly necessary.  This is one of those cases where
running over the 80 char soft limit is preferable to wrapping.  And no need to use
PRIu64, selftests are 64-bit only.  E.g.

	TEST_ASSERT(count <= (quota + buffer),
		    "KVM dirtied too many pages: count = %lx, quota = %lx, buffer = %lx",
		    count, quota, buffer);


> +
> +	TEST_ASSERT(count >= quota, "Dirty quota exit happened with quota yet to "
> +		"be exhausted: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);

Similar comments here.

> +
> +	if (count > quota)
> +		pr_info("Dirty quota exit with unequal quota and count: "
> +			"count=%"PRIu64", quota=%"PRIu64"\n", count, quota);

Not sure I'd bother with this.  Realistically, is anyone ever going to be able to
do anything useful with this information?  E.g. is this just going to be a "PML is
enabled!" message?

> +
> +	run->dirty_quota = count + test_dirty_quota_increment;
> +}
> -- 
> 2.22.3
> 


* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2022-09-15 13:21   ` Christian Borntraeger
@ 2022-10-07 19:08   ` Sean Christopherson
  2022-10-07 19:20     ` Sean Christopherson
  2022-10-09 19:30     ` Shivam Kumar
  2022-10-10  5:41   ` Shivam Kumar
  2 siblings, 2 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-07 19:08 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Thu, Sep 15, 2022, Shivam Kumar wrote:
> @@ -542,6 +545,21 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
>  	return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
>  }
>  
> +static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_run *run = vcpu->run;
> +	u64 dirty_quota = READ_ONCE(run->dirty_quota);
> +	u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
> +
> +	if (!dirty_quota || (pages_dirtied < dirty_quota))
> +		return 1;
> +
> +	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +	run->dirty_quota_exit.count = pages_dirtied;
> +	run->dirty_quota_exit.quota = dirty_quota;
> +	return 0;

Dead code.

> +}

...

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 584a5bab3af3..f315af50037d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3298,18 +3298,36 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>  }
>  EXPORT_SYMBOL_GPL(kvm_clear_guest);
>  
> +static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)

Ouch, sorry.  I suspect you got this name from me[*].  That was a goof on my end,
I'm 99% certain I copy-pasted stale code, i.e. didn't intend to suggest a
rename.

Let's keep kvm_vcpu_check_dirty_quota(), IMO that's still the least awful name.

[*] https://lore.kernel.org/all/Yo+82LjHSOdyxKzT@google.com

> +{
> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> +
> +	if (!dirty_quota || (vcpu->stat.generic.pages_dirtied < dirty_quota))
> +		return;
> +
> +	/*
> +	 * Snapshot the quota to report it to userspace.  The dirty count will be
> +	 * captured when the request is processed.
> +	 */
> +	vcpu->dirty_quota = dirty_quota;
> +	kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);

Making the request needs to be guarded with an arch opt-in.  Pending requests
prevent KVM from entering the guest, and so making a request that an arch isn't
aware of will effectively hang the vCPU.  Obviously userspace would be shooting
itself in the foot by setting run->dirty_quota in this case, but KVM shouldn't
hand userspace a loaded gun and help them aim.

My suggestion from v1[*] about not forcing architectures to opt-in was in the
context of a request-less implementation where dirty_quota was a nop until the
arch took action.

And regardless of arch opt-in, I think this needs a capability so that userspace
can detect KVM support.

I don't see any reason to wrap the request or vcpu_run field, e.g. something like
this should suffice:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ea5847d22aff..93362441215b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3298,8 +3298,9 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
+static void kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
 {
+#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
        u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
 
        if (!dirty_quota || (vcpu->stat.generic.pages_dirtied < dirty_quota))
@@ -3311,6 +3312,7 @@ static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
         */
        vcpu->dirty_quota = dirty_quota;
        kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
+#endif
 }
 
 void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -4507,6 +4509,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
        case KVM_CAP_BINARY_STATS_FD:
        case KVM_CAP_SYSTEM_EVENT_DATA:
                return 1;
+       case KVM_CAP_DIRTY_QUOTA:
+               return !!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA);
        default:
                break;
        }
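
For completeness, userspace can then gate use of run->dirty_quota on the
capability via the usual KVM_CHECK_EXTENSION probe, e.g. (sketch only; the
KVM_CAP_DIRTY_QUOTA number isn't assigned in this series, kvm_fd is the
/dev/kvm fd and initial_quota is whatever the VMM picks):

	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_QUOTA) > 0)
		run->dirty_quota = initial_quota;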


[*] https://lore.kernel.org/all/YZaUENi0ZyQi%2F9M0@google.com


* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-07 19:08   ` Sean Christopherson
@ 2022-10-07 19:20     ` Sean Christopherson
  2022-10-09 18:49       ` Shivam Kumar
  2022-10-09 19:30     ` Shivam Kumar
  1 sibling, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2022-10-07 19:20 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Fri, Oct 07, 2022, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Shivam Kumar wrote:
> Let's keep kvm_vcpu_check_dirty_quota(), IMO that's still the least awful name.
> 
> [*] https://lore.kernel.org/all/Yo+82LjHSOdyxKzT@google.com

Actually, I take that back.  The code snippet itself is also flawed.  If userspace
increases the quota (or disables it entirely) between KVM snapshotting the quota
and making the request, then there's no need for KVM to exit to userspace.

So I think this can be:

static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);

	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
#else
	return false;
#endif
}

and the usage becomes:

		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);

More thoughts in the x86 patch.


* Re: [PATCH v6 2/5] KVM: x86: Dirty quota-based throttling of vcpus
  2022-09-15 10:10 ` [PATCH v6 2/5] KVM: x86: Dirty " Shivam Kumar
@ 2022-10-07 19:30   ` Sean Christopherson
  2022-10-09 19:05     ` Shivam Kumar
  2022-10-09 19:17     ` Shivam Kumar
  0 siblings, 2 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-07 19:30 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Thu, Sep 15, 2022, Shivam Kumar wrote:
> index c9b49a09e6b5..140a5511ed45 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5749,6 +5749,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  		 */
>  		if (__xfer_to_guest_mode_work_pending())
>  			return 1;
> +
> +		if (!kvm_vcpu_check_dirty_quota(vcpu))

A dedicated helper is no longer necessary, this can _test_ the event, a la
KVM_REQ_EVENT above:

		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
			return 1;

> +			return 0;
>  	}
>  
>  	return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 43a6a7efc6ec..8447e70b26f5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10379,6 +10379,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			r = 0;
>  			goto out;
>  		}
> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
> +			struct kvm_run *run = vcpu->run;
> +

> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;

As mentioned in patch 1, the code snippet I suggested is bad.  With a request,
there's no need to snapshot the quota prior to making the request.  If userspace
changes the quota, then using the new quota is perfectly ok since KVM is still
providing sane data.  If userspace lowers the quota, an exit will still occur as
expected.  If userspace disables or increases the quota to the point where no exit
is necessary, then userspace can't expect an exit and won't even be aware that KVM
temporarily detected an exhausted quota.

And all of this belongs in a common KVM helper.  Name isn't perfect, but it aligns
with kvm_check_request().

#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
{
	struct kvm_run *run = vcpu->run;

	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
	run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
	run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);

	/*
	 * Re-check the quota and exit if and only if the vCPU still exceeds its
	 * quota.  If userspace increases (or disables entirely) the quota, then
	 * no exit is required as KVM is still honoring its ABI, e.g. userspace
	 * won't even be aware that KVM temporarily detected an exhausted quota.
	 */
	return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota; 
}
#endif

And then arch usage is simply something like:

		if (kvm_check_dirty_quota_request(vcpu)) {
			r = 0;
			goto out;
		}


* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-07 18:18       ` Sean Christopherson
@ 2022-10-09 18:36         ` Shivam Kumar
  2022-10-10  6:12           ` Christian Borntraeger
  0 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-09 18:36 UTC (permalink / raw)
  To: Sean Christopherson, Christian Borntraeger
  Cc: pbonzini, maz, james.morse, david, Claudio Imbrenda,
	Janosch Frank, kvm, Shaju Abraham, Manish Mishra,
	Anurag Madnawat, Heiko Carstens, Sven Schnelle, Vasily Gorbik,
	Alexander Gordeev



On 07/10/22 11:48 pm, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Christian Borntraeger wrote:
>> Am 15.09.22 um 15:21 schrieb Christian Borntraeger:
>>> I am wondering if this will work on s390. On s390  we only call
>>> mark_page_dirty_in_slot for the kvm_read/write functions but not
>>> for those done by the guest on fault. We do account those lazily in
>>> kvm_arch_sync_dirty_log (like x86 in the past).
>>>
>>
>> I think we need to rework the page fault handling on s390 to actually make
>> use of this. This has to happen anyway somewhen (as indicated by the guest
>> enter/exit rework from Mark). Right now we handle KVM page faults directly
>> in the normal system fault handler. It seems we need to make a side turn
>> into KVM for page faults on guests in the long run.
> 
> s390's page table management came up in conversation at KVM Forum in the context
> of a common MMU (or MMUs)[*].  If/when we get an MMU that supports multiple
> architectures, that would be a perfect opportunity for s390 to switch.  I don't
> know that it'll be feasible to make a common MMU support s390 from the get-go,
> but at the very least we can keep y'all in the loop and not make it unnecessarily
> difficult to support s390.
> 
> [*] https://kvmforum2022.sched.com/event/15jJk
Thank you, Sean, for the reference to the KVM Forum talk. Should I skip
the support for s390 (patch no. 4) in the next patchset?


* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-07 19:20     ` Sean Christopherson
@ 2022-10-09 18:49       ` Shivam Kumar
  2022-10-10 16:09         ` Sean Christopherson
  0 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-09 18:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 08/10/22 12:50 am, Sean Christopherson wrote:
> On Fri, Oct 07, 2022, Sean Christopherson wrote:
>> On Thu, Sep 15, 2022, Shivam Kumar wrote:
>> Let's keep kvm_vcpu_check_dirty_quota(), IMO that's still the least awful name.
>>
>> [*] https://lore.kernel.org/all/Yo+82LjHSOdyxKzT@google.com
> 
> Actually, I take that back.  The code snippet itself is also flawed.  If userspace
> increases the quota (or disables it entirely) between KVM snapshotting the quota
> and making the request, then there's no need for KVM to exit to userspace.
> 
> So I think this can be:
> 
> static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
> {
> #ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> 	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> 
> 	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
> #else
> 	return false;
> #endif
> }
> 
> and the usage becomes:
> 
> 		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
> 			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> 
> More thoughts in the x86 patch.

Snapshotting is not a requirement for now anyway. We have plans to
lazily update the quota, i.e. only when the vcpu needs to do more dirtying.
This helps us prevent overthrottling of the VM in skewed cases where
some vcpus are mostly reading and the others are mostly writing.


So, this looks good from every angle. The name
kvm_vcpu_is_dirty_quota_exhausted, though verbose, works for me.


* Re: [PATCH v6 2/5] KVM: x86: Dirty quota-based throttling of vcpus
  2022-10-07 19:30   ` Sean Christopherson
@ 2022-10-09 19:05     ` Shivam Kumar
  2022-10-18 17:43       ` Shivam Kumar
  2022-10-09 19:17     ` Shivam Kumar
  1 sibling, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-09 19:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 08/10/22 1:00 am, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Shivam Kumar wrote:
>> index c9b49a09e6b5..140a5511ed45 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -5749,6 +5749,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>   		 */
>>   		if (__xfer_to_guest_mode_work_pending())
>>   			return 1;
>> +
>> +		if (!kvm_vcpu_check_dirty_quota(vcpu))
> 
> A dedicated helper is no longer necessary, this can _test_ the event, a la
> KVM_REQ_EVENT above:
> 
> 		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
> 			return 1;
> 
>> +			return 0;
>>   	}
>>   
>>   	return 1;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 43a6a7efc6ec..8447e70b26f5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10379,6 +10379,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   			r = 0;
>>   			goto out;
>>   		}
>> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>> +			struct kvm_run *run = vcpu->run;
>> +
> 
>> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;
> 
> As mentioned in patch 1, the code snippet I suggested is bad.  With a request,
> there's no need to snapshot the quota prior to making the request.  If userspace
> changes the quota, then using the new quota is perfectly ok since KVM is still
> providing sane data.  If userspace lowers the quota, an exit will still occur as
> expected.  If userspace disables or increases the quota to the point where no exit
> is necessary, then userspace can't expect an exit and won't even be aware that KVM
> temporarily detected an exhausted quota.
> 
> And all of this belongs in a common KVM helper.  Name isn't perfect, but it aligns
> with kvm_check_request().
> 
> #ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_run *run = vcpu->run;
> 
> 	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> 	run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> 	run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
> 
> 	/*
> 	 * Re-check the quota and exit if and only if the vCPU still exceeds its
> 	 * quota.  If userspace increases (or disables entirely) the quota, then
> 	 * no exit is required as KVM is still honoring its ABI, e.g. userspace
> 	 * won't even be aware that KVM temporarily detected an exhausted quota.
> 	 */
> 	return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
> }
> #endif
> 
> And then arch usage is simply something like:
> 
> 		if (kvm_check_dirty_quota_request(vcpu)) {
> 			r = 0;
> 			goto out;
> 		}
If we are not even checking for request KVM_REQ_DIRTY_QUOTA_EXIT, what's 
the use of kvm_make_request in patch 1?


* Re: [PATCH v6 2/5] KVM: x86: Dirty quota-based throttling of vcpus
  2022-10-07 19:30   ` Sean Christopherson
  2022-10-09 19:05     ` Shivam Kumar
@ 2022-10-09 19:17     ` Shivam Kumar
  1 sibling, 0 replies; 28+ messages in thread
From: Shivam Kumar @ 2022-10-09 19:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 08/10/22 1:00 am, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Shivam Kumar wrote:
>> index c9b49a09e6b5..140a5511ed45 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -5749,6 +5749,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>   		 */
>>   		if (__xfer_to_guest_mode_work_pending())
>>   			return 1;
>> +
>> +		if (!kvm_vcpu_check_dirty_quota(vcpu))
> 
> A dedicated helper is no longer necessary, this can _test_ the event, a la
> KVM_REQ_EVENT above:
> 
> 		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
> 			return 1;
Sure, thanks.
> 
>> +			return 0;
>>   	}
>>   
>>   	return 1;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 43a6a7efc6ec..8447e70b26f5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10379,6 +10379,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   			r = 0;
>>   			goto out;
>>   		}
>> +		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>> +			struct kvm_run *run = vcpu->run;
>> +
> 
>> +			run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +			run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>> +			run->dirty_quota_exit.quota = vcpu->dirty_quota;
> 
> As mentioned in patch 1, the code snippet I suggested is bad.  With a request,
> there's no need to snapshot the quota prior to making the request.  If userspace
> changes the quota, then using the new quota is perfectly ok since KVM is still
> providing sane data.  If userspace lowers the quota, an exit will still occur as
> expected.  If userspace disables or increases the quota to the point where no exit
> is necessary, then userspace can't expect an exit and won't even be aware that KVM
> temporarily detected an exhausted quota.
> 
> And all of this belongs in a common KVM helper.  Name isn't perfect, but it aligns
> with kvm_check_request().
> 
> #ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_run *run = vcpu->run;
> 
> 	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> 	run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> 	run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
> 
> 	/*
> 	 * Re-check the quota and exit if and only if the vCPU still exceeds its
> 	 * quota.  If userspace increases (or disables entirely) the quota, then
> 	 * no exit is required as KVM is still honoring its ABI, e.g. userspace
> 	 * won't even be aware that KVM temporarily detected an exhausted quota.
> 	 */
> 	return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
> }
> #endif
> 
> And then arch usage is simply something like:
> 
> 		if (kvm_check_dirty_quota_request(vcpu)) {
> 			r = 0;
> 			goto out;
> 		}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling
  2022-10-07 18:29   ` Sean Christopherson
@ 2022-10-09 19:26     ` Shivam Kumar
  2022-10-10 15:47       ` Sean Christopherson
  0 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-09 19:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 07/10/22 11:59 pm, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Shivam Kumar wrote:
>>   #endif /* SELFTEST_KVM_UTIL_BASE_H */
>> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
>> index 9889fe0d8919..4c7bd9807d0b 100644
>> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
>> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
>> @@ -18,6 +18,7 @@
>>   #include <linux/kernel.h>
>>   
>>   #define KVM_UTIL_MIN_PFN	2
>> +#define PML_BUFFER_SIZE	512
> 
> ...
> 
>> +	/*
>> +	 * Due to PML, number of pages dirtied by the vcpu can exceed its dirty
>> +	 * quota by PML buffer size.
>> +	 */
> 
> Buffering for PML is very Intel centric, and even then it's not guaranteed.  In
> x86, PML can and should be detected programmatically:
> 
> bool kvm_is_pml_enabled(void)
> {
> 	return is_intel_cpu() && get_kvm_intel_param_bool("pml");
> }
Yes, I was looking for something like this. Thanks.
> 
> Not sure if it's worth a generic arch hook to get the CPU buffer size, e.g. the
> test could do:
> 
I can't think of any use case for a custom buffer as of now. We are 
working on a proposal for dynamically adjusting the PML buffer size in 
order to throttle effectively with dirty quota, which would be a 
potential use case.
> 
> 	/*
> 	 * Allow ??? pages of overrun, KVM is allowed to dirty multiple pages
> 	 * before exiting to userspace, e.g. when emulating an instruction that
> 	 * performs multiple memory accesses.
> 	 */
> 	uint64_t buffer = ???;
> 
> 	/*
> 	 * When Intel's Page-Modification Logging (PML) is enabled, the CPU may
> 	 * dirty up to 512 pages (number of entries in the PML buffer) without
> 	 * exiting, thus KVM may effectively dirty that many pages before
> 	 * enforcing the dirty quota.
> 	 */
> #ifdef __x86_64__
> 	if (kvm_is_pml_enabled())
> 		buffer = PML_BUFFER_SIZE;
> #endif
> 	
> 
>> +	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages
> 
> Clarify _why_ the count is invalid.
In the worst case, the vcpu will be able to dirty 512 more pages than 
its dirty quota due to PML.
> 
>> +		dirtied: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
> 
> Don't split strings unless it's truly necessary.  This is one of those cases where
> running over the 80 char soft limit is preferable to wrapping.  And no need to use
> PRIu64, selftests are 64-bit only.  E.g.
Got it, thanks.
> 
> 	TEST_ASSERT(count <= (quota + buffer),
> 		    "KVM dirtied too many pages: count = %lx, quota = %lx, buffer = %lx",
> 		    count, quota, buffer);
> 
> 
>> +
>> +	TEST_ASSERT(count >= quota, "Dirty quota exit happened with quota yet to
>> +		be exhausted: count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
> 
> Similar comments here.
> 
>> +
>> +	if (count > quota)
>> +		pr_info("Dirty quota exit with unequal quota and count:
>> +			count=%"PRIu64", quota=%"PRIu64"\n", count, quota);
> 
> Not sure I'd bother with this.  Realistically, is anyone ever going to be able to
> do anything useful with this information?  E.g. is this just going to be a "PML is
> enabled!" message?
I thought this information could help detect anomalies when PML is 
disabled. If PML is disabled, I expect the exit to happen exactly when 
the count equals the quota, and the count should never exceed the quota.

Now that I think of it, I should move this statement inside an 'if' 
block and make it an assertion too:
	if (!kvm_is_pml_enabled())
		TEST_ASSERT(count == quota, "Dirty quota exit with unequal quota and count.");

> 
>> +
>> +	run->dirty_quota = count + test_dirty_quota_increment;
>> +}
>> -- 
>> 2.22.3
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-07 19:08   ` Sean Christopherson
  2022-10-07 19:20     ` Sean Christopherson
@ 2022-10-09 19:30     ` Shivam Kumar
  1 sibling, 0 replies; 28+ messages in thread
From: Shivam Kumar @ 2022-10-09 19:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat



On 08/10/22 12:38 am, Sean Christopherson wrote:
> On Thu, Sep 15, 2022, Shivam Kumar wrote:
>> @@ -542,6 +545,21 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
>>   	return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
>>   }
>>   
>> +static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_run *run = vcpu->run;
>> +	u64 dirty_quota = READ_ONCE(run->dirty_quota);
>> +	u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
>> +
>> +	if (!dirty_quota || (pages_dirtied < dirty_quota))
>> +		return 1;
>> +
>> +	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +	run->dirty_quota_exit.count = pages_dirtied;
>> +	run->dirty_quota_exit.quota = dirty_quota;
>> +	return 0;
> 
> Dead code.
> 
>> +}
> 
> ...
> 
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 584a5bab3af3..f315af50037d 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3298,18 +3298,36 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_clear_guest);
>>   
>> +static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
> 
> Ouch, sorry.  I suspect you got this name from me[*].  That was a goof on my end,
> I'm 99% certain I copy-pasted stale code, i.e. didn't intended to suggest a
> rename.
> 
> Let's keep kvm_vcpu_check_dirty_quota(), IMO that's still the least awful name.
> 
> [*] https://lore.kernel.org/all/Yo+82LjHSOdyxKzT@google.com/
> 
>> +{
>> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
>> +
>> +	if (!dirty_quota || (vcpu->stat.generic.pages_dirtied < dirty_quota))
>> +		return;
>> +
>> +	/*
>> +	 * Snapshot the quota to report it to userspace.  The dirty count will be
>> +	 * captured when the request is processed.
>> +	 */
>> +	vcpu->dirty_quota = dirty_quota;
>> +	kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> 
> Making the request needs to be guarded with an arch opt-in.  Pending requests
> prevent KVM from entering the guest, and so making a request that an arch isn't
> aware of will effectively hang the vCPU.  Obviously userspace would be shooting
> itself in the foot by setting run->dirty_quota in this case, but KVM shouldn't
> hand userspace a loaded gun and help them aim.
> 
> My suggestion from v1[*] about not forcing architectures to opt-in was in the
> context of a request-less implementation where dirty_quota was a nop until the
> arch took action.
> 
> And regardless of arch opt-in, I think this needs a capability so that userspace
> can detect KVM support.
> 
> I don't see any reason to wrap the request or vcpu_run field, e.g. something like
> this should suffice:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ea5847d22aff..93362441215b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3298,8 +3298,9 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>   }
>   EXPORT_SYMBOL_GPL(kvm_clear_guest);
>   
> -static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
> +static void kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
>   {
> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
>          u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
>   
>          if (!dirty_quota || (vcpu->stat.generic.pages_dirtied < dirty_quota))
> @@ -3311,6 +3312,7 @@ static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
>           */
>          vcpu->dirty_quota = dirty_quota;
>          kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> +#endif
>   }
>   
>   void mark_page_dirty_in_slot(struct kvm *kvm,
> @@ -4507,6 +4509,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>          case KVM_CAP_BINARY_STATS_FD:
>          case KVM_CAP_SYSTEM_EVENT_DATA:
>                  return 1;
> +       case KVM_CAP_DIRTY_QUOTA:
> +               return !!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA);
>          default:
>                  break;
>          }
> 
> 
> [*] https://lore.kernel.org/all/YZaUENi0ZyQi%2F9M0@google.com/

Will add. Thank you so much for the comments.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2022-09-15 13:21   ` Christian Borntraeger
  2022-10-07 19:08   ` Sean Christopherson
@ 2022-10-10  5:41   ` Shivam Kumar
  2022-10-17  5:28     ` Shivam Kumar
  2 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-10  5:41 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 15/09/22 3:40 pm, Shivam Kumar wrote:
> Define variables to track and throttle memory dirtying for every vcpu.
> 
> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>                  while dirty logging is enabled.
> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>                  more, it needs to request more quota by exiting to
>                  userspace.
> 
> Implement the flow for throttling based on dirty quota.
> 
> i) Increment dirty_count for the vcpu whenever it dirties a page.
> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
> count equals/exceeds dirty quota) to request more dirty quota.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>   Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
>   include/linux/kvm_host.h       | 20 ++++++++++++++++++-
>   include/linux/kvm_types.h      |  1 +
>   include/uapi/linux/kvm.h       | 12 ++++++++++++
>   virt/kvm/kvm_main.c            | 26 ++++++++++++++++++++++---
>   5 files changed, 90 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index abd7c32126ce..97030a6a35b4 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6614,6 +6614,26 @@ array field represents return values. The userspace should update the return
>   values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>   spec refer, https://github.com/riscv/riscv-sbi-doc.
>   
> +::
> +
> +		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> +		struct {
> +			__u64 count;
> +			__u64 quota;
> +		} dirty_quota_exit;
> +
> +If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU has
> +exhausted its dirty quota. The 'dirty_quota_exit' member of kvm_run structure
> +makes the following information available to the userspace:
> +    count: the current count of pages dirtied by the VCPU, can be
> +    skewed based on the size of the pages accessed by each vCPU.
> +    quota: the observed dirty quota just before the exit to userspace.
> +
> +The userspace can design a strategy to allocate the overall scope of dirtying
> +for the VM among the vcpus. Based on the strategy and the current state of dirty
> +quota throttling, the userspace can make a decision to either update (increase)
> +the quota or to put the VCPU to sleep for some time.
> +
>   ::
>   
>       /* KVM_EXIT_NOTIFY */
> @@ -6668,6 +6688,21 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>   
>   ::
>   
> +	/*
> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
> +	 * is reached/exceeded.
> +	 */
> +	__u64 dirty_quota;
> +
> +Please note that enforcing the quota is best effort, as the guest may dirty
> +multiple pages before KVM can recheck the quota.  However, unless KVM is using
> +a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
> +KVM will detect quota exhaustion within a handful of dirtied pages.  If a
> +hardware ring buffer is used, the overrun is bounded by the size of the buffer
> +(512 entries for PML).
> +
> +::
>     };
>   
>   
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f4519d3689e1..9acb28635d94 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -151,12 +151,13 @@ static inline bool is_error_page(struct page *page)
>   #define KVM_REQUEST_NO_ACTION      BIT(10)
>   /*
>    * Architecture-independent vcpu->requests bit members
> - * Bits 4-7 are reserved for more arch-independent bits.
> + * Bits 5-7 are reserved for more arch-independent bits.
>    */
>   #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>   #define KVM_REQ_UNBLOCK           2
>   #define KVM_REQ_UNHALT            3
> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
>   #define KVM_REQUEST_ARCH_BASE     8
>   
>   /*
> @@ -380,6 +381,8 @@ struct kvm_vcpu {
>   	 */
>   	struct kvm_memory_slot *last_used_slot;
>   	u64 last_used_slot_gen;
> +
> +	u64 dirty_quota;
>   };
>   
>   /*
> @@ -542,6 +545,21 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
>   	return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
>   }
>   
> +static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_run *run = vcpu->run;
> +	u64 dirty_quota = READ_ONCE(run->dirty_quota);
> +	u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
> +
> +	if (!dirty_quota || (pages_dirtied < dirty_quota))
> +		return 1;
> +
> +	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> +	run->dirty_quota_exit.count = pages_dirtied;
> +	run->dirty_quota_exit.quota = dirty_quota;
> +	return 0;
> +}
> +
>   /*
>    * Some of the bitops functions do not support too long bitmaps.
>    * This number must be determined not to exceed such limits.
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index 3ca3db020e0e..263a588f3cd3 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -118,6 +118,7 @@ struct kvm_vcpu_stat_generic {
>   	u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
>   	u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
>   	u64 blocking;
> +	u64 pages_dirtied;
I am reworking the QEMU patches and I am not sure how I can access the
pages_dirtied info from the userspace side when the migration starts, i.e.
without a dirty quota exit.

I need this info to initialise the dirty quota. This is what I am looking
to do on the userspace side at the start of dirty quota migration:
	dirty_quota = pages_dirtied + some initial quota

Hoping if you could help, Sean. Thanks in advance.
>   #define KVM_STATS_NAME_SIZE	48
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eed0315a77a6..4c4a65b0f0a5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -272,6 +272,7 @@ struct kvm_xen_exit {
>   #define KVM_EXIT_RISCV_SBI        35
>   #define KVM_EXIT_RISCV_CSR        36
>   #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 38
>   
>   /* For KVM_EXIT_INTERNAL_ERROR */
>   /* Emulate instruction failed. */
> @@ -510,6 +511,11 @@ struct kvm_run {
>   #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
>   			__u32 flags;
>   		} notify;
> +		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> +		struct {
> +			__u64 count;
> +			__u64 quota;
> +		} dirty_quota_exit;
>   		/* Fix the size of the union. */
>   		char padding[256];
>   	};
> @@ -531,6 +537,12 @@ struct kvm_run {
>   		struct kvm_sync_regs regs;
>   		char padding[SYNC_REGS_SIZE_BYTES];
>   	} s;
> +	/*
> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the
> +	 * quota is reached/exceeded.
> +	 */
> +	__u64 dirty_quota;
>   };
>   
>   /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 584a5bab3af3..f315af50037d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3298,18 +3298,36 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>   }
>   EXPORT_SYMBOL_GPL(kvm_clear_guest);
>   
> +static void kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
> +{
> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> +
> +	if (!dirty_quota || (vcpu->stat.generic.pages_dirtied < dirty_quota))
> +		return;
> +
> +	/*
> +	 * Snapshot the quota to report it to userspace.  The dirty count will be
> +	 * captured when the request is processed.
> +	 */
> +	vcpu->dirty_quota = dirty_quota;
> +	kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> +}
> +
>   void mark_page_dirty_in_slot(struct kvm *kvm,
>   			     const struct kvm_memory_slot *memslot,
>   		 	     gfn_t gfn)
>   {
>   	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>   
> -#ifdef CONFIG_HAVE_KVM_DIRTY_RING
>   	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>   		return;
> -#endif
>   
> -	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> +	if (!memslot)
> +		return;
> +
> +	WARN_ON_ONCE(!vcpu->stat.generic.pages_dirtied++);
> +
> +	if (kvm_slot_dirty_track_enabled(memslot)) {
>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>   		u32 slot = (memslot->as_id << 16) | memslot->id;
>   
> @@ -3318,6 +3336,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>   					    slot, rel_gfn);
>   		else
>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> +
> +		kvm_vcpu_is_dirty_quota_exhausted(vcpu);
>   	}
>   }
>   EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-09 18:36         ` Shivam Kumar
@ 2022-10-10  6:12           ` Christian Borntraeger
  0 siblings, 0 replies; 28+ messages in thread
From: Christian Borntraeger @ 2022-10-10  6:12 UTC (permalink / raw)
  To: Shivam Kumar, Sean Christopherson
  Cc: pbonzini, maz, james.morse, david, Claudio Imbrenda,
	Janosch Frank, kvm, Shaju Abraham, Manish Mishra,
	Anurag Madnawat, Heiko Carstens, Sven Schnelle, Vasily Gorbik,
	Alexander Gordeev



Am 09.10.22 um 20:36 schrieb Shivam Kumar:
> 
> 
> On 07/10/22 11:48 pm, Sean Christopherson wrote:
>> On Thu, Sep 15, 2022, Christian Borntraeger wrote:
>>> Am 15.09.22 um 15:21 schrieb Christian Borntraeger:
>>>> I am wondering if this will work on s390. On s390 we only call
>>>> mark_page_dirty_in_slot for the kvm_read/write functions but not
>>>> for those done by the guest on fault. We do account those lazily in
>>>> kvm_arch_sync_dirty_log (like x86 in the past).
>>>>
>>>
>>> I think we need to rework the page fault handling on s390 to actually make
>>> use of this. This has to happen at some point anyway (as indicated by the guest
>>> enter/exit rework from Mark). Right now we handle KVM page faults directly
>>> in the normal system fault handler. It seems that, in the long run, we need to
>>> take a detour into KVM for page faults on guests.
>>
>> s390's page table management came up in conversation at KVM Forum in the context
>> of a common MMU (or MMUs)[*].  If/when we get an MMU that supports multiple
>> architectures, that would be a perfect opportunity for s390 to switch.  I don't
>> know that it'll be feasible to make a common MMU support s390 from the get-go,
>> but at the very least we can keep y'all in the loop and not make it unnecessarily
>> difficult to support s390.
>>
>> [*] https://kvmforum2022.sched.com/event/15jJk
> Thank you Sean for the reference to the KVM Forum talk. Should I skip the support for s390 (patch no. 4) for the next patchset?

yes please.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling
  2022-10-09 19:26     ` Shivam Kumar
@ 2022-10-10 15:47       ` Sean Christopherson
  0 siblings, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-10 15:47 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Mon, Oct 10, 2022, Shivam Kumar wrote:
> 
> 
> On 07/10/22 11:59 pm, Sean Christopherson wrote:
> > On Thu, Sep 15, 2022, Shivam Kumar wrote:
> > > +	TEST_ASSERT(count <= quota + PML_BUFFER_SIZE, "Invalid number of pages
> > 
> > Clarify _why_ the count is invalid.
> In the worst case, the vcpu will be able to dirty 512 more pages than its
> dirty quota due to PML.

Heh, I know that, I wasn't asking a question.  I was saying that the assert should
tell the user/developer running the test why the count is "invalid", i.e. why the
assert failed.  Put yourself in the shoes of a random developer that's new to KVM
and knows almost nothing about dirty logging and isn't even aware PML is a thing.
Write the assert message for _that_ person.
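
E.g. something along these lines (wording is just a suggestion, reusing the
count/quota/buffer names from the earlier snippet):

	TEST_ASSERT(count <= quota + buffer,
		    "vCPU dirtied more pages than allowed: count = %lx, quota = %lx, max expected overrun (e.g. due to PML) = %lx",
		    count, quota, buffer);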

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-09 18:49       ` Shivam Kumar
@ 2022-10-10 16:09         ` Sean Christopherson
  0 siblings, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-10 16:09 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Mon, Oct 10, 2022, Shivam Kumar wrote:
> 
> 
> On 08/10/22 12:50 am, Sean Christopherson wrote:
> > On Fri, Oct 07, 2022, Sean Christopherson wrote:
> > > On Thu, Sep 15, 2022, Shivam Kumar wrote:
> > > Let's keep kvm_vcpu_check_dirty_quota(), IMO that's still the least awful name.
> > > 
> > > [*] https://lore.kernel.org/all/Yo+82LjHSOdyxKzT@google.com/
> > 
> > Actually, I take that back.  The code snippet itself is also flawed.  If userspace
> > increases the quota (or disables it entirely) between KVM snapshotting the quota
> > and making the request, then there's no need for KVM to exit to userspace.
> > 
> > So I think this can be:
> > 
> > static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
> > {
> > #ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> > 	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> > 
> > 	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
> > #else
> > 	return false;
> > #endif
> > }
> > 
> > and the usage becomes:
> > 
> > 		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
> > 			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> > 
> > More thoughts in the x86 patch.
> 
Snapshotting is not a requirement for now anyway. We have plans to lazily
update the quota, i.e. only when a vcpu needs to do more dirtying. This helps us
prevent overthrottling of the VM in skewed cases where some vcpus are
mostly reading and the others are mostly writing.

I don't see how snapshotting can ever be a sane requirement.  Requiring KVM to
exit if KVM detects an exhausted quota even if userspace changes the quota is
nonsensical as the resulting behavior is 100% non-deterministic unless userspace
is spying on the number of dirty pages.  And if userspace is constantly polling the
number of dirty pages, what's the point of the exit?  Requiring KVM to exit in
this case puts extra burden on KVM without any meaningful benefit.

In other words, we need to consider how KVM's behavior impacts KVM's uABI, not
just about what userspace "needs".

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-10  5:41   ` Shivam Kumar
@ 2022-10-17  5:28     ` Shivam Kumar
  2022-10-19 16:01       ` Sean Christopherson
  0 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-17  5:28 UTC (permalink / raw)
  To: pbonzini, seanjc, maz, james.morse, borntraeger, david
  Cc: kvm, Shaju Abraham, Manish Mishra, Anurag Madnawat



On 10/10/22 11:11 am, Shivam Kumar wrote:
> 
> 
> On 15/09/22 3:40 pm, Shivam Kumar wrote:
>> Define variables to track and throttle memory dirtying for every vcpu.
>>
>> dirty_count:    Number of pages the vcpu has dirtied since its creation,
>>                  while dirty logging is enabled.
>> dirty_quota:    Number of pages the vcpu is allowed to dirty. To dirty
>>                  more, it needs to request more quota by exiting to
>>                  userspace.
>>
>> Implement the flow for throttling based on dirty quota.
>>
>> i) Increment dirty_count for the vcpu whenever it dirties a page.
>> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
>> count equals/exceeds dirty quota) to request more dirty quota.
>>
>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>> ---
>>   Documentation/virt/kvm/api.rst | 35 ++++++++++++++++++++++++++++++++++
>>   include/linux/kvm_host.h       | 20 ++++++++++++++++++-
>>   include/linux/kvm_types.h      |  1 +
>>   include/uapi/linux/kvm.h       | 12 ++++++++++++
>>   virt/kvm/kvm_main.c            | 26 ++++++++++++++++++++++---
>>   5 files changed, 90 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst 
>> b/Documentation/virt/kvm/api.rst
>> index abd7c32126ce..97030a6a35b4 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -6614,6 +6614,26 @@ array field represents return values. The 
>> userspace should update the return
>>   values of SBI call before resuming the VCPU. For more details on 
>> RISC-V SBI
>>   spec refer, https://github.com/riscv/riscv-sbi-doc.
>> +::
>> +
>> +        /* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
>> +        struct {
>> +            __u64 count;
>> +            __u64 quota;
>> +        } dirty_quota_exit;
>> +
>> +If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that 
>> the VCPU has
>> +exhausted its dirty quota. The 'dirty_quota_exit' member of kvm_run 
>> structure
>> +makes the following information available to the userspace:
>> +    count: the current count of pages dirtied by the VCPU, can be
>> +    skewed based on the size of the pages accessed by each vCPU.
>> +    quota: the observed dirty quota just before the exit to userspace.
>> +
>> +The userspace can design a strategy to allocate the overall scope of 
>> dirtying
>> +for the VM among the vcpus. Based on the strategy and the current 
>> state of dirty
>> +quota throttling, the userspace can make a decision to either update 
>> (increase)
>> +the quota or to put the VCPU to sleep for some time.
>> +
>>   ::
>>       /* KVM_EXIT_NOTIFY */
>> @@ -6668,6 +6688,21 @@ values in kvm_run even if the corresponding bit 
>> in kvm_dirty_regs is not set.
>>   ::
>> +    /*
>> +     * Number of pages the vCPU is allowed to have dirtied over its 
>> entire
>> +     * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 
>> if the quota
>> +     * is reached/exceeded.
>> +     */
>> +    __u64 dirty_quota;
>> +
>> +Please note that enforcing the quota is best effort, as the guest may 
>> dirty
>> +multiple pages before KVM can recheck the quota.  However, unless KVM 
>> is using
>> +a hardware-based dirty ring buffer, e.g. Intel's Page Modification 
>> Logging,
>> +KVM will detect quota exhaustion within a handful of dirtied pages.  
>> If a
>> +hardware ring buffer is used, the overrun is bounded by the size of 
>> the buffer
>> +(512 entries for PML).
>> +
>> +::
>>     };
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index f4519d3689e1..9acb28635d94 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -151,12 +151,13 @@ static inline bool is_error_page(struct page *page)
>>   #define KVM_REQUEST_NO_ACTION      BIT(10)
>>   /*
>>    * Architecture-independent vcpu->requests bit members
>> - * Bits 4-7 are reserved for more arch-independent bits.
>> + * Bits 5-7 are reserved for more arch-independent bits.
>>    */
>>   #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | 
>> KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | 
>> KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_UNBLOCK           2
>>   #define KVM_REQ_UNHALT            3
>> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
>>   #define KVM_REQUEST_ARCH_BASE     8
>>   /*
>> @@ -380,6 +381,8 @@ struct kvm_vcpu {
>>        */
>>       struct kvm_memory_slot *last_used_slot;
>>       u64 last_used_slot_gen;
>> +
>> +    u64 dirty_quota;
>>   };
>>   /*
>> @@ -542,6 +545,21 @@ static inline int 
>> kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
>>       return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
>>   }
>> +static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
>> +{
>> +    struct kvm_run *run = vcpu->run;
>> +    u64 dirty_quota = READ_ONCE(run->dirty_quota);
>> +    u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
>> +
>> +    if (!dirty_quota || (pages_dirtied < dirty_quota))
>> +        return 1;
>> +
>> +    run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>> +    run->dirty_quota_exit.count = pages_dirtied;
>> +    run->dirty_quota_exit.quota = dirty_quota;
>> +    return 0;
>> +}
>> +
>>   /*
>>    * Some of the bitops functions do not support too long bitmaps.
>>    * This number must be determined not to exceed such limits.
>> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
>> index 3ca3db020e0e..263a588f3cd3 100644
>> --- a/include/linux/kvm_types.h
>> +++ b/include/linux/kvm_types.h
>> @@ -118,6 +118,7 @@ struct kvm_vcpu_stat_generic {
>>       u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
>>       u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
>>       u64 blocking;
>> +    u64 pages_dirtied;
> I am reworking the QEMU patches and I am not sure how I can access the
> pages_dirtied info from the userspace side when the migration starts, i.e.
> without a dirty quota exit.
> 
> I need this info to initialise the dirty quota. This is what I am looking
> to do on the userspace side at the start of dirty quota migration:
>      dirty_quota = pages_dirtied + some initial quota
> 
> Hoping if you could help, Sean. Thanks in advance.
I think I can set dirty_quota initially to 1 and let the vcpu exit with 
exit reason KVM_EXIT_DIRTY_QUOTA_EXHAUSTED. Then, I can set the quota.
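
Roughly, on the userspace side (a hypothetical sketch; 'initial_quota' is a
made-up name for the "some initial quota" mentioned above):

	/* Force an early dirty quota exit to learn the current pages_dirtied. */
	run->dirty_quota = 1;
	...
	case KVM_EXIT_DIRTY_QUOTA_EXHAUSTED:
		/* Now that the count is known, set the real quota. */
		run->dirty_quota = run->dirty_quota_exit.count + initial_quota;
		break;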

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Dirty quota-based throttling of vcpus
  2022-10-09 19:05     ` Shivam Kumar
@ 2022-10-18 17:43       ` Shivam Kumar
  2022-10-19 15:42         ` Sean Christopherson
  0 siblings, 1 reply; 28+ messages in thread
From: Shivam Kumar @ 2022-10-18 17:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

>>> @@ -10379,6 +10379,15 @@ static int vcpu_enter_guest(struct kvm_vcpu 
>>> *vcpu)
>>>               r = 0;
>>>               goto out;
>>>           }
>>> +        if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
>>> +            struct kvm_run *run = vcpu->run;
>>> +
>>
>>> +            run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>>> +            run->dirty_quota_exit.count = 
>>> vcpu->stat.generic.pages_dirtied;
>>> +            run->dirty_quota_exit.quota = vcpu->dirty_quota;
>>
>> As mentioned in patch 1, the code snippet I suggested is bad.  
>> With a request,
>> there's no need to snapshot the quota prior to making the request.  If 
>> userspace
>> changes the quota, then using the new quota is perfectly ok since KVM 
>> is still
>> providing sane data.  If userspace lowers the quota, an exit will 
>> still occur as
>> expected.  If userspace disables or increases the quota to the point 
>> where no exit
>> is necessary, then userspace can't expect an exit and won't even be 
>> aware that KVM
>> temporarily detected an exhausted quota.
>>
>> And all of this belongs in a common KVM helper.  Name isn't perfect, 
>> but it aligns
>> with kvm_check_request().
>>
>> #ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
>> static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
>> {
>>     struct kvm_run *run = vcpu->run;
>>
>>     run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
>>     run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
>>     run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
>>
>>     /*
>>      * Re-check the quota and exit if and only if the vCPU still 
>> exceeds its
>>      * quota.  If userspace increases (or disables entirely) the 
>> quota, then
>>      * no exit is required as KVM is still honoring its ABI, e.g. 
>> userspace
>>      * won't even be aware that KVM temporarily detected an exhausted 
>> quota.
>>      */
>>     return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
>> }
>> #endif
>>
>> And then arch usage is simply something like:
>>
>>         if (kvm_check_dirty_quota_request(vcpu)) {
>>             r = 0;
>>             goto out;
>>         }
> If we are not even checking for request KVM_REQ_DIRTY_QUOTA_EXIT, what's 
> the use of kvm_make_request in patch 1?
Ok, so we don't need to explicitly check for the request here because we 
are checking the quota directly, but we do need to make the request when 
the quota is exhausted so as to guarantee that we enter the "if 
(kvm_request_pending(vcpu))" block.

Please let me know if my understanding is correct or if I am missing 
something.

Also, should I add the #ifdef here:

	#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
	if (kvm_check_dirty_quota_request(vcpu)) {
		r = 0;
		goto out;
	}
	#endif

Or should I move the #ifdef inside kvm_check_dirty_quota_request and 
return false if it is not defined?

static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
	struct kvm_run *run = vcpu->run;

	run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
	run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
	run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);

	/*
	 * Re-check the quota and exit if and only if the vCPU still exceeds its
	 * quota.  If userspace increases (or disables entirely) the quota, then
	 * no exit is required as KVM is still honoring its ABI, e.g. userspace
	 * won't even be aware that KVM temporarily detected an exhausted quota.
	 */
	return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
#else
	return false;
#endif
}



Thanks a lot for the reviews so far. Looking forward to your reply.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 2/5] KVM: x86: Dirty quota-based throttling of vcpus
  2022-10-18 17:43       ` Shivam Kumar
@ 2022-10-19 15:42         ` Sean Christopherson
  0 siblings, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-19 15:42 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Tue, Oct 18, 2022, Shivam Kumar wrote:
> > > #ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> > > static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
> > > {
> > >     struct kvm_run *run = vcpu->run;
> > > 
> > >     run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> > >     run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
> > >     run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota);
> > > 
> > >     /*
> > >      * Re-check the quota and exit if and only if the vCPU still
> > > exceeds its
> > >      * quota.  If userspace increases (or disables entirely) the
> > > quota, then
> > >      * no exit is required as KVM is still honoring its ABI, e.g.
> > > userspace
> > >      * won't even be aware that KVM temporarily detected an
> > > exhausted quota.
> > >      */
> > >     return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota;
> > > }
> > > #endif
> > > 
> > > And then arch usage is simply something like:
> > > 
> > >         if (kvm_check_dirty_quota_request(vcpu)) {
> > >             r = 0;
> > >             goto out;
> > >         }
> > If we are not even checking for request KVM_REQ_DIRTY_QUOTA_EXIT, what's
> > the use of kvm_make_request in patch 1?
> Ok, so we don't need to explicitely check for request here because we are
> checking the quota directly but we do need to make the request when the
> quota is exhausted so as to guarantee that we enter the if block "if
> (kvm_request_pending(vcpu))".
> 
> Please let me know if my understanding is correct or if I am missing
> something.

The request needs to be explicitly checked, I simply unintentionally omitted that
code in the above snippet.  kvm_check_request() also _clears_ the request, which
is very important as not clearing the request will prevent KVM from entering the
guest.

The intent is to check KVM_REQ_DIRTY_QUOTA_EXIT in kvm_check_dirty_quota_request().
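
Something like this (an untested sketch just to show where the request check
slots in; Kconfig/#ifdef handling is omitted here, see below):

	static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu)
	{
		struct kvm_run *run = vcpu->run;
		u64 quota = READ_ONCE(run->dirty_quota);

		/* kvm_check_request() also clears the request. */
		if (!kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
			return false;

		run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
		run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied;
		run->dirty_quota_exit.quota = quota;

		/*
		 * Exit if and only if the vCPU still exceeds its quota.  If userspace
		 * increased or disabled the quota in the meantime, no exit is needed.
		 */
		return quota && run->dirty_quota_exit.count >= quota;
	}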

> 
> Also, should I add ifdef here:
> 
> 	#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> 	if (kvm_check_dirty_quota_request(vcpu)) {
> 		r = 0;
> 		goto out;
> 	}
> 	#endif

No need for an #ifdef in the caller, the call is from arch code and architectures
that don't "select HAVE_KVM_DIRTY_QUOTA" shouldn't be checking for a dirty quota
exit.
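
For reference, the arch opt-in would presumably follow the usual Kconfig
pattern, e.g. (sketch, exact placement up to each arch):

	# virt/kvm/Kconfig
	config HAVE_KVM_DIRTY_QUOTA
	       bool

	# arch/x86/kvm/Kconfig
	config KVM
	       ...
	       select HAVE_KVM_DIRTY_QUOTA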

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus
  2022-10-17  5:28     ` Shivam Kumar
@ 2022-10-19 16:01       ` Sean Christopherson
  0 siblings, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2022-10-19 16:01 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, maz, james.morse, borntraeger, david, kvm,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

On Mon, Oct 17, 2022, Shivam Kumar wrote:
> 
> On 10/10/22 11:11 am, Shivam Kumar wrote:
> > 
> > On 15/09/22 3:40 pm, Shivam Kumar wrote:
> > > diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> > > index 3ca3db020e0e..263a588f3cd3 100644
> > > --- a/include/linux/kvm_types.h
> > > +++ b/include/linux/kvm_types.h
> > > @@ -118,6 +118,7 @@ struct kvm_vcpu_stat_generic {
> > >       u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
> > >       u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
> > >       u64 blocking;
> > > +    u64 pages_dirtied;
> > I am reworking the QEMU patches and I am not sure how I can access the
> > pages_dirtied info from the userspace side when the migration starts, i.e.
> > without a dirty quota exit.
> > 
> > I need this info to initialise the dirty quota. This is what I am looking
> > to do on the userspace side at the start of dirty quota migration:
> >      dirty_quota = pages_dirtied + some initial quota
> > 
> > Hoping if you could help, Sean. Thanks in advance.
> I think I can set dirty_quota initially to 1 and let the vpcu exit with exit
> reason KVM_EXIT_DIRTY_QUOTA_EXHAUSTED. Then, I can set the quota.

The vCPU doesn't need to be paused to read stats; pages_dirtied can be read while
the vCPU is running via KVM_GET_STATS_FD.  Though at a glance, QEMU doesn't yet
utilize KVM_GET_STATS_FD, i.e. QEMU would need a decent chunk of infrastructure
improvements to avoid the unnecessary exit.
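
For reference, a rough sketch (untested, error handling and setup omitted,
vcpu_fd assumed to be an open vCPU fd) of reading a single stat via the binary
stats fd; the "pages_dirtied" name assumes the stat added by this series:

	int stats_fd = ioctl(vcpu_fd, KVM_GET_STATS_FD, NULL);
	struct kvm_stats_header hdr;
	struct kvm_stats_desc *desc, *d;
	uint64_t pages_dirtied = 0;
	size_t desc_sz;
	int i;

	pread(stats_fd, &hdr, sizeof(hdr), 0);

	/* Each descriptor is followed by its name (hdr.name_size bytes). */
	desc_sz = sizeof(*desc) + hdr.name_size;
	desc = malloc(hdr.num_desc * desc_sz);
	pread(stats_fd, desc, hdr.num_desc * desc_sz, hdr.desc_offset);

	for (i = 0; i < hdr.num_desc; i++) {
		d = (void *)desc + i * desc_sz;
		if (!strcmp(d->name, "pages_dirtied")) {
			pread(stats_fd, &pages_dirtied, sizeof(pages_dirtied),
			      hdr.data_offset + d->offset);
			break;
		}
	}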

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-10-19 16:02 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 10:10 [PATCH v6 0/5] KVM: Dirty quota-based throttling Shivam Kumar
2022-09-15 10:10 ` [PATCH v6 1/5] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
2022-09-15 13:21   ` Christian Borntraeger
2022-09-15 14:34     ` Christian Borntraeger
2022-10-07 18:18       ` Sean Christopherson
2022-10-09 18:36         ` Shivam Kumar
2022-10-10  6:12           ` Christian Borntraeger
2022-10-07 19:08   ` Sean Christopherson
2022-10-07 19:20     ` Sean Christopherson
2022-10-09 18:49       ` Shivam Kumar
2022-10-10 16:09         ` Sean Christopherson
2022-10-09 19:30     ` Shivam Kumar
2022-10-10  5:41   ` Shivam Kumar
2022-10-17  5:28     ` Shivam Kumar
2022-10-19 16:01       ` Sean Christopherson
2022-09-15 10:10 ` [PATCH v6 2/5] KVM: x86: Dirty " Shivam Kumar
2022-10-07 19:30   ` Sean Christopherson
2022-10-09 19:05     ` Shivam Kumar
2022-10-18 17:43       ` Shivam Kumar
2022-10-19 15:42         ` Sean Christopherson
2022-10-09 19:17     ` Shivam Kumar
2022-09-15 10:10 ` [PATCH v6 3/5] KVM: arm64: " Shivam Kumar
2022-09-15 10:10 ` [PATCH v6 4/5] KVM: s390x: " Shivam Kumar
2022-09-15 13:24   ` Christian Borntraeger
2022-09-15 10:10 ` [PATCH v6 5/5] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar
2022-10-07 18:29   ` Sean Christopherson
2022-10-09 19:26     ` Shivam Kumar
2022-10-10 15:47       ` Sean Christopherson
