* [PATCH v10 0/3] Per-vCPU dirty quota-based throttling
@ 2024-02-21 19:51 Shivam Kumar
  2024-02-21 19:51 ` [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Shivam Kumar @ 2024-02-21 19:51 UTC (permalink / raw)
  To: maz, pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will
  Cc: kvm, Shivam Kumar

This patchset introduces a new mechanism (dirty-quota-based
throttling) to throttle the rate at which memory pages can be dirtied.
This is done by setting a limit on the number of bytes that each vCPU
is allowed to dirty before it is allocated additional quota.

This new throttling mechanism is exposed to userspace through a new
KVM capability, KVM_CAP_DIRTY_QUOTA. If this capability is enabled by
userspace, each vCPU will exit to userspace (with exit reason
KVM_EXIT_DIRTY_QUOTA_EXHAUSTED) as soon as its dirty quota is
exhausted (in other words, a given vCPU will exit to userspace as soon
as it has dirtied as many bytes as the limit set for it). When the
vCPU exits to userspace, userspace may increase the dirty quota of the
vCPU (after optionally sleeping for an appropriate period of time) so
that it can continue dirtying more memory.
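
For illustration, the userspace side of this flow might look roughly
like the sketch below (it assumes the uAPI introduced in this series;
compute_dirty_quota() and throttle_delay_us() are hypothetical
placeholder policies, and error handling is omitted):

	/* Enable dirty-quota throttling on the VM; args[0] != 0 enables it. */
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_QUOTA,
		.args[0] = 1,
	};
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	/* Per-vCPU run loop; 'run' is the mmap'ed struct kvm_run. */
	run->dirty_quota_bytes = compute_dirty_quota();
	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, 0);

		if (run->exit_reason == KVM_EXIT_DIRTY_QUOTA_EXHAUSTED) {
			/*
			 * Optionally sleep to throttle this vCPU, then grant
			 * it a fresh quota and re-enter the guest.
			 */
			usleep(throttle_delay_us());
			run->dirty_quota_bytes = compute_dirty_quota();
			continue;
		}
		/* ... handle other exit reasons ... */
	}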

Dirty-quota-based throttling is a very effective choice for live
migration, for the following reasons:

1. With dirty-quota-based throttling, we can precisely set the amount
of memory we can afford to dirty for the migration to converge within
a reasonable time. This behaviour is much more effective than the
current state-of-the-art auto-converge mechanism, which implements
time-based throttling (making vCPUs sleep for some time to throttle
dirtying): some workloads can dirty a huge amount of memory even if
their vCPUs are given only a very small interval to run, causing
migrations to take longer and possibly fail to converge.

2. While the current auto-converge mechanism makes the whole VM sleep
to throttle memory dirtying, we can selectively throttle vCPUs with
dirty-quota-based throttling (i.e. only causing vCPUs that are
dirtying more than a threshold to sleep). Furthermore, if we choose
very small intervals to compute and enforce the dirty quota, we can
achieve micro-stunning (i.e. stunning the vCPUs precisely when they
are dirtying memory). Both of these behaviors help the
dirty-quota-based scheme to throttle only those vCPUs that are
dirtying memory, and only while they are dirtying it. Hence,
while the current auto-converge scheme is prone to throttling reads
and writes equally, dirty-quota-based throttling has minimal impact on
read performance.

3. Dirty-quota-based throttling can adapt quickly to changes in
network bandwidth if it is enforced in very small intervals.  In other
words, we can consider the current available network bandwidth when
computing an appropriate dirty quota for the next interval.
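
As a purely illustrative example of such a policy (it is not part of
this series), userspace could give each vCPU an equal share of the
bytes the migration stream is expected to absorb during the next
interval, based on the currently measured bandwidth:

	#include <stdint.h>

	/* Hypothetical userspace helper; names and policy are placeholders. */
	static uint64_t dirty_quota_for_next_interval(uint64_t bw_bytes_per_sec,
						      uint64_t interval_us,
						      unsigned int nr_vcpus)
	{
		uint64_t budget = bw_bytes_per_sec * interval_us / 1000000;

		return budget / nr_vcpus;
	}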

The benefits of dirty-quota-based throttling are not limited to live
migration.  The dirty-quota mechanism can also be leveraged to
support other use cases that would benefit from effective throttling
of memory writes.  The update_dirty_quota hook in the implementation
can be used outside the context of live migration, but note that such
alternative uses must also write-protect the memory.

We have evaluated dirty-quota-based throttling using two key metrics:
A. Live migration performance (time to migrate)
B. Guest performance during live migration

We have used a synthetic workload that dirties memory sequentially in
a loop. It is characterised by three variables m, n and l. A given
instance of this workload (m=x,n=y,l=z) is a workload dirtying x GB of
memory with y threads at a rate of z GBps. In the following table, b
is the network bandwidth configured for the live migration, t_curr is
the total time to migrate with the current throttling logic, and t_dq
is the total time to migrate with dirty-quota-based throttling.

    A. Live migration performance

+--------+----+----------+----------+---------------+----------+----------+
| m (GB) |  n | l (GBps) | b (MBps) |    t_curr (s) | t_dq (s) | Diff (%) |
+--------+----+----------+----------+---------------+----------+----------+
|      8 |  2 |     8.00 |      640 |         60.38 |    15.22 |     74.8 |
|     16 |  4 |     1.26 |      640 |         75.99 |    32.22 |     57.6 |
|     32 |  6 |     0.10 |      640 |         49.81 |    49.80 |      0.0 |
|     48 |  8 |     2.20 |      640 |        287.78 |   115.65 |     59.8 |
|     32 |  6 |    32.00 |      640 |        364.30 |    84.26 |     76.9 |
|      8 |  2 |     8.00 |      128 |        452.91 |    94.99 |     79.0 |
|    512 | 32 |     0.10 |      640 |        868.94 |   841.92 |      3.1 |
|     16 |  4 |     1.26 |       64 |       1538.94 |   426.21 |     72.3 |
|     32 |  6 |     1.80 |     1024 |       1406.80 |   452.82 |     67.8 |
|    512 | 32 |     7.20 |      640 |       4561.30 |   906.60 |     80.1 |
|    128 | 16 |     3.50 |      128 |       7009.98 |  1689.61 |     75.9 |
|     16 |  4 |    16.00 |       64 | "Unconverged" |   461.47 |      N/A |
|     32 |  6 |    32.00 |      128 | "Unconverged" |   454.27 |      N/A |
|    512 | 32 |   512.00 |      640 | "Unconverged" |   917.37 |      N/A |
|    128 | 16 |   128.00 |      128 | "Unconverged" |  1946.00 |      N/A |
+--------+----+----------+----------+---------------+----------+----------+

    B. Guest performance:

+---------------------+-------------------+-------------------+----------+
|        Case         | Guest Runtime (%) | Guest Runtime (%) | Diff (%) |
|                     |     (Current)     |   (Dirty Quota)   |          |
+---------------------+-------------------+-------------------+----------+
| Write-intensive     | 26.4              | 35.3              |     33.7 |
+---------------------+-------------------+-------------------+----------+
| Read-write-balanced | 40.6              | 70.8              |     74.4 |
+---------------------+-------------------+-------------------+----------+
| Read-intensive      | 63.1              | 81.8              |     29.6 |
+---------------------+-------------------+-------------------+----------+

Guest Runtime (in percentage) in the above table is the percentage of
time a guest vCPU is actually running, averaged across all vCPUs of
the guest. For B, we have run variants of the aforementioned
synthetic workload, dirtying memory sequentially in a loop on some
threads and just reading memory sequentially on the other threads. We
have also conducted similar experiments with more realistic benchmarks
and workloads, e.g. redis, and obtained similar results.

Dirty-quota-based throttling was presented in KVM Forum 2021. Please
find the details here:
https://kvmforum2021.sched.com/event/ke4A/dirty-quota-based-vm-live-migration-auto-converge-manish-mishra-shivam-kumar-nutanix-india

The current v10 patchset includes the following changes over v9:

1. Use vma_pagesize as the dirty granularity for updating dirty quota
on arm64.
2. Do not update dirty quota for instances where the hypervisor is
writing into guest memory. Accounting for these instances in vCPUs'
dirty quota is unfair to the vCPUs. Also, some of these instances,
such as record_steal_time, frequently try to redundantly mark the same
set of pages dirty again and again. To avoid these distortions, we had
previously relied on checking the dirty bitmap to avoid redundantly
updating quotas. Since we have now decoupled dirty-quota-based
throttling from the live-migration dirty-tracking path, we have
resolved this issue by simply avoiding the mis-accounting caused by
these hypervisor-induced writes to guest memory.  Through extensive
experiments, we have verified that this new approach is approximately
as effective as the prior approach that relied on checking the dirty
bitmap.

v1:
https://lore.kernel.org/kvm/20211114145721.209219-1-shivam.kumar1@xxxxxxxxxxx/
v2: https://lore.kernel.org/kvm/Ydx2EW6U3fpJoJF0@xxxxxxxxxx/T/
v3: https://lore.kernel.org/kvm/YkT1kzWidaRFdQQh@xxxxxxxxxx/T/
v4:
https://lore.kernel.org/all/20220521202937.184189-1-shivam.kumar1@xxxxxxxxxxx/
v5: https://lore.kernel.org/all/202209130532.2BJwW65L-lkp@xxxxxxxxx/T/
v6:
https://lore.kernel.org/all/20220915101049.187325-1-shivam.kumar1@xxxxxxxxxxx/
v7:
https://lore.kernel.org/all/a64d9818-c68d-1e33-5783-414e9a9bdbd1@xxxxxxxxxxx/t/
v8:
https://lore.kernel.org/all/20230225204758.17726-1-shivam.kumar1@nutanix.com/
v9:
https://lore.kernel.org/kvm/20230504144328.139462-1-shivam.kumar1@nutanix.com/

Thanks,
Shivam

Shivam Kumar (3):
  KVM: Implement dirty quota-based throttling of vcpus
  KVM: x86: Dirty quota-based throttling of vcpus
  KVM: arm64: Dirty quota-based throttling of vcpus

 Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
 arch/arm64/kvm/Kconfig         |  1 +
 arch/arm64/kvm/arm.c           |  5 +++++
 arch/arm64/kvm/mmu.c           |  1 +
 arch/x86/kvm/Kconfig           |  1 +
 arch/x86/kvm/mmu/mmu.c         |  6 +++++-
 arch/x86/kvm/mmu/spte.c        |  1 +
 arch/x86/kvm/vmx/vmx.c         |  3 +++
 arch/x86/kvm/x86.c             |  6 +++++-
 include/linux/kvm_host.h       |  9 +++++++++
 include/uapi/linux/kvm.h       |  8 ++++++++
 tools/include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig               |  3 +++
 virt/kvm/kvm_main.c            | 27 +++++++++++++++++++++++++++
 14 files changed, 87 insertions(+), 2 deletions(-)

-- 
2.22.3



* [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus
  2024-02-21 19:51 [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
@ 2024-02-21 19:51 ` Shivam Kumar
  2024-02-22  2:00   ` Anish Moorthy
  2024-04-16 16:59   ` Sean Christopherson
  2024-02-21 19:51 ` [PATCH v10 2/3] KVM: x86: Dirty " Shivam Kumar
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Shivam Kumar @ 2024-02-21 19:51 UTC (permalink / raw)
  To: maz, pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Define dirty_quota_bytes variable to track and throttle memory
dirtying for every vcpu. This variable stores the number of bytes the
vcpu is allowed to dirty. To dirty more, the vcpu needs to request
more quota by exiting to userspace.

Implement update_dirty_quota function which

i) Decreases dirty_quota_bytes by arch-specific page size whenever a
page is dirtied.
ii) Raises a KVM request KVM_REQ_DIRTY_QUOTA_EXIT whenever the dirty
quota is exhausted (i.e. dirty_quota_bytes <= 0).

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
 include/linux/kvm_host.h       |  9 +++++++++
 include/uapi/linux/kvm.h       |  8 ++++++++
 tools/include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig               |  3 +++
 virt/kvm/kvm_main.c            | 27 +++++++++++++++++++++++++++
 6 files changed, 65 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3ec0b7a455a0..1858db8b0698 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7031,6 +7031,23 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+	/*
+	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
+	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
+	 * is exhausted, i.e. dirty_quota_bytes <= 0.
+	 */
+	long dirty_quota_bytes;
+
+Please note that enforcing the quota is best effort. Dirty quota is reduced by
+arch-specific page size when any guest page is dirtied. Also, the guest may dirty
+multiple pages before KVM can recheck the quota, e.g. when PML is enabled.
+
+::
+  };
+
+
 
 6. Capabilities that can be enabled on vCPUs
 ============================================
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..994ecc4e5194 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -167,6 +167,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK			2
 #define KVM_REQ_DIRTY_RING_SOFT_FULL	3
+#define KVM_REQ_DIRTY_QUOTA_EXIT	4
 #define KVM_REQUEST_ARCH_BASE		8
 
 /*
@@ -831,6 +832,7 @@ struct kvm {
 	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
+	bool dirty_quota_enabled;
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
@@ -1291,6 +1293,13 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
+#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
+void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes);
+#else
+static inline void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
+{
+}
+#endif
 void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..217f19100003 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -210,6 +210,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -491,6 +492,12 @@ struct kvm_run {
 		struct kvm_sync_regs regs;
 		char padding[SYNC_REGS_SIZE_BYTES];
 	} s;
+	/*
+	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
+	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
+	 * is exhausted, i.e. dirty_quota_bytes <= 0.
+	 */
+	long dirty_quota_bytes;
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
@@ -1155,6 +1162,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_DIRTY_QUOTA 236
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index c3308536482b..cf880e26f55f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1155,6 +1155,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_DIRTY_QUOTA 236
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 184dab4ee871..c4071cb14d15 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -22,6 +22,9 @@ config HAVE_KVM_IRQ_ROUTING
 config HAVE_KVM_DIRTY_RING
        bool
 
+config HAVE_KVM_DIRTY_QUOTA
+       bool
+
 # Only strongly ordered architectures can select this, as it doesn't
 # put any explicit constraint on userspace ordering. They can also
 # select the _ACQ_REL version.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..9a1e67187735 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3626,6 +3626,19 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
+void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+	if (!vcpu || (vcpu->kvm != kvm) || !READ_ONCE(kvm->dirty_quota_enabled))
+		return;
+
+	vcpu->run->dirty_quota_bytes -= page_size_bytes;
+	if (vcpu->run->dirty_quota_bytes <= 0)
+		kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
+}
+EXPORT_SYMBOL_GPL(update_dirty_quota);
+
 void mark_page_dirty_in_slot(struct kvm *kvm,
 			     const struct kvm_memory_slot *memslot,
 		 	     gfn_t gfn)
@@ -3656,6 +3669,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = gfn_to_memslot(kvm, gfn);
+	update_dirty_quota(kvm, PAGE_SIZE);
 	mark_page_dirty_in_slot(kvm, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
@@ -3665,6 +3679,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	update_dirty_quota(vcpu->kvm, PAGE_SIZE);
 	mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
@@ -4877,6 +4892,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
+	case KVM_CAP_DIRTY_QUOTA:
+		return !!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA);
 	default:
 		break;
 	}
@@ -5027,6 +5044,16 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 
 		return r;
 	}
+	case KVM_CAP_DIRTY_QUOTA: {
+		int r = -EINVAL;
+
+		if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA)) {
+			WRITE_ONCE(kvm->dirty_quota_enabled, cap->args[0]);
+			r = 0;
+		}
+
+		return r;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.22.3



* [PATCH v10 2/3] KVM: x86: Dirty quota-based throttling of vcpus
  2024-02-21 19:51 [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
  2024-02-21 19:51 ` [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
@ 2024-02-21 19:51 ` Shivam Kumar
  2024-04-16 17:44   ` Sean Christopherson
  2024-02-21 19:51 ` [PATCH v10 3/3] KVM: arm64: " Shivam Kumar
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Shivam Kumar @ 2024-02-21 19:51 UTC (permalink / raw)
  To: maz, pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Call update_dirty_quota with the appropriate arch-specific page size
whenever a page is marked dirty. Process the KVM request
KVM_REQ_DIRTY_QUOTA_EXIT (raised by update_dirty_quota) to exit to
userspace with exit reason KVM_EXIT_DIRTY_QUOTA_EXHAUSTED.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/x86/kvm/Kconfig    | 1 +
 arch/x86/kvm/mmu/mmu.c  | 6 +++++-
 arch/x86/kvm/mmu/spte.c | 1 +
 arch/x86/kvm/vmx/vmx.c  | 3 +++
 arch/x86/kvm/x86.c      | 6 +++++-
 5 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 87e3da7b0439..791456233f28 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -44,6 +44,7 @@ config KVM
 	select KVM_XFER_TO_GUEST_WORK
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	select KVM_VFIO
+	select HAVE_KVM_DIRTY_QUOTA
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	help
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d6cdeab1f8a..fa0b3853ee31 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3397,8 +3397,12 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
 	if (!try_cmpxchg64(sptep, &old_spte, new_spte))
 		return false;
 
-	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
+	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
+		struct kvm_mmu_page *sp = sptep_to_sp(sptep);
+
+		update_dirty_quota(vcpu->kvm, (1L << SPTE_LEVEL_SHIFT(sp->role.level)));
 		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
+	}
 
 	return true;
 }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4a599130e9c9..550f9c1d03af 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -241,6 +241,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
 		WARN_ON_ONCE(level > PG_LEVEL_4K);
+		update_dirty_quota(vcpu->kvm, (1L << SPTE_LEVEL_SHIFT(level)));
 		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1111d9d08903..e2f8764c16ff 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5864,6 +5864,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 		 */
 		if (__xfer_to_guest_mode_work_pending())
 			return 1;
+
+		if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu))
+			return 1;
 	}
 
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48a61d283406..4f36c0efb542 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10829,7 +10829,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = 0;
 			goto out;
 		}
-
+		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+			r = 0;
+			goto out;
+		}
 		/*
 		 * KVM_REQ_HV_STIMER has to be processed after
 		 * KVM_REQ_CLOCK_UPDATE, because Hyper-V SynIC timers
-- 
2.22.3



* [PATCH v10 3/3] KVM: arm64: Dirty quota-based throttling of vcpus
  2024-02-21 19:51 [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
  2024-02-21 19:51 ` [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2024-02-21 19:51 ` [PATCH v10 2/3] KVM: x86: Dirty " Shivam Kumar
@ 2024-02-21 19:51 ` Shivam Kumar
  2024-03-21  5:48 ` [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
  2024-04-16 17:44 ` Sean Christopherson
  4 siblings, 0 replies; 14+ messages in thread
From: Shivam Kumar @ 2024-02-21 19:51 UTC (permalink / raw)
  To: maz, pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will
  Cc: kvm, Shivam Kumar, Shaju Abraham, Manish Mishra, Anurag Madnawat

Call update_dirty_quota with the appropriate arch-specific page size
whenever a page is marked dirty. Process the KVM request
KVM_REQ_DIRTY_QUOTA_EXIT (raised by update_dirty_quota) to exit to
userspace with exit reason KVM_EXIT_DIRTY_QUOTA_EXHAUSTED.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 arch/arm64/kvm/Kconfig | 1 +
 arch/arm64/kvm/arm.c   | 5 +++++
 arch/arm64/kvm/mmu.c   | 1 +
 3 files changed, 7 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 27ca89b628a0..f66d872d0830 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ menuconfig KVM
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
 	select XARRAY_MULTI
+	select HAVE_KVM_DIRTY_QUOTA
 	help
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a25265aca432..dde02c372551 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -872,6 +872,11 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_dirty_ring_check_request(vcpu))
 			return 0;
+
+		if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
+			return 0;
+		}
 	}
 
 	return 1;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d14504821b79..77088bf9a502 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1579,6 +1579,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
 		kvm_set_pfn_dirty(pfn);
+		update_dirty_quota(kvm, vma_pagesize);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
-- 
2.22.3



* Re: [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus
  2024-02-21 19:51 ` [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
@ 2024-02-22  2:00   ` Anish Moorthy
  2024-04-16 16:52     ` Sean Christopherson
  2024-04-16 16:59   ` Sean Christopherson
  1 sibling, 1 reply; 14+ messages in thread
From: Anish Moorthy @ 2024-02-22  2:00 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: maz, pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will, kvm, Shaju Abraham, Manish Mishra,
	Anurag Madnawat

I just saw this on the mailing list and had a couple of minor thoughts;
apologies if I'm contradicting any of the feedback you've received on
previous versions.

On Wed, Feb 21, 2024 at 12:01 PM Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>
> Define dirty_quota_bytes variable to track and throttle memory
> dirtying for every vcpu. This variable stores the number of bytes the
> vcpu is allowed to dirty. To dirty more, the vcpu needs to request
> more quota by exiting to userspace.
>
> Implement update_dirty_quota function which

Tiny nit, but can we just rename this to "reduce_dirty_quota"? It's
easy to see what an "update" is, but might as well make it even
clearer.

> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> +void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes);
> +#else
> +static inline void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
> +{
> +}
> +#endif

Is there a reason to #ifdef like this instead of just having a single
definition and doing

> void update_dirty_quota(,,,) {
>     if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_QUOTA)) return;
>     // actual body here
> }

in the body? I figure the compiler elides the no-op call, though I've
never bothered to check...

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 10bfc88a69f7..9a1e67187735 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3626,6 +3626,19 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
>  }
>  EXPORT_SYMBOL_GPL(kvm_clear_guest);
>
> +void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
> +{
> +       struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

Can we just make update_dirty_quota() take a kvm_vcpu* instead of a
kvm* as its first parameter? Since the quota is per-vcpu, that seems
to make sense, and most of the callers of this function look like

> update_dirty_quota(vcpu->kvm, some_size_here);

anyways. The only one that's not is the addition in mark_page_dirty()

>  void mark_page_dirty_in_slot(struct kvm *kvm,
>                              const struct kvm_memory_slot *memslot,
>                              gfn_t gfn)
> @@ -3656,6 +3669,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>         struct kvm_memory_slot *memslot;
>
>         memslot = gfn_to_memslot(kvm, gfn);
> +       update_dirty_quota(kvm, PAGE_SIZE);
>         mark_page_dirty_in_slot(kvm, memslot, gfn);
>  }

Is mark_page_dirty() allowed to be used outside of a vCPU context? The
lack of a vcpu* makes me think it is -- I assume we don't want to charge
vCPUs for accesses they're not making.

Unfortunately we do seem to use it *in* vCPU contexts (see
kvm_update_stolen_time() on arm64?), although not on x86 AFAICT.


* Re: [PATCH v10 0/3] Per-vCPU dirty quota-based throttling
  2024-02-21 19:51 [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
                   ` (2 preceding siblings ...)
  2024-02-21 19:51 ` [PATCH v10 3/3] KVM: arm64: " Shivam Kumar
@ 2024-03-21  5:48 ` Shivam Kumar
  2024-04-04  9:19   ` Marc Zyngier
  2024-04-16 17:44 ` Sean Christopherson
  4 siblings, 1 reply; 14+ messages in thread
From: Shivam Kumar @ 2024-03-21  5:48 UTC (permalink / raw)
  To: maz, pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, Aravind Retnakaran,
	Carl Waldspurger [C],
	David Vrabel, david, will
  Cc: kvm


> On 22-Feb-2024, at 1:22 AM, Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> The current v10 patchset includes the following changes over v9:
> 
> 1. Use vma_pagesize as the dirty granularity for updating dirty quota
> on arm64.
> 2. Do not update dirty quota for instances where the hypervisor is
> writing into guest memory. Accounting for these instances in vCPUs'
> dirty quota is unfair to the vCPUs. Also, some of these instances,
> such as record_steal_time, frequently try to redundantly mark the same
> set of pages dirty again and again. To avoid these distortions, we had
> previously relied on checking the dirty bitmap to avoid redundantly
> updating quotas. Since we have now decoupled dirty-quota-based
> throttling from the live-migration dirty-tracking path, we have
> resolved this issue by simply avoiding the mis-accounting caused by
> these hypervisor-induced writes to guest memory.  Through extensive
> experiments, we have verified that this new approach is approximately
> as effective as the prior approach that relied on checking the dirty
> bitmap.
> 

Hi Marc,

I’ve tried my best to address all the concerns raised in the previous patchset. I’d really appreciate it if you could share your thoughts and any feedback you might have on this one.

Thanks,
Shivam


* Re: [PATCH v10 0/3] Per-vCPU dirty quota-based throttling
  2024-03-21  5:48 ` [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
@ 2024-04-04  9:19   ` Marc Zyngier
  2024-04-18 10:46     ` Shivam Kumar
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2024-04-04  9:19 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, Aravind Retnakaran,
	Carl Waldspurger [C],
	David Vrabel, david, will, kvm

On Thu, 21 Mar 2024 05:48:01 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> > On 22-Feb-2024, at 1:22 AM, Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> > 
> > The current v10 patchset includes the following changes over v9:
> > 
> > 1. Use vma_pagesize as the dirty granularity for updating dirty quota
> > on arm64.
> > 2. Do not update dirty quota for instances where the hypervisor is
> > writing into guest memory. Accounting for these instances in vCPUs'
> > dirty quota is unfair to the vCPUs. Also, some of these instances,
> > such as record_steal_time, frequently try to redundantly mark the same
> > set of pages dirty again and again. To avoid these distortions, we had
> > previously relied on checking the dirty bitmap to avoid redundantly
> > updating quotas. Since we have now decoupled dirty-quota-based
> > throttling from the live-migration dirty-tracking path, we have
> > resolved this issue by simply avoiding the mis-accounting caused by
> > these hypervisor-induced writes to guest memory.  Through extensive
> > experiments, we have verified that this new approach is approximately
> > as effective as the prior approach that relied on checking the dirty
> > bitmap.
> > 
> 
> Hi Marc,
> 
> I’ve tried my best to address all the concerns raised in the
> previous patchset. I’d really appreciate it if you could share your
> thoughts and any feedback you might have on this one.

I'll get to it at some point. However, given that it has taken you the
best part of a year to respin this, I need to page it all back in,
which is going to take a bit of time as well.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus
  2024-02-22  2:00   ` Anish Moorthy
@ 2024-04-16 16:52     ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2024-04-16 16:52 UTC (permalink / raw)
  To: Anish Moorthy
  Cc: Shivam Kumar, maz, pbonzini, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, catalin.marinas, aravind.retnakaran,
	carl.waldspurger, david.vrabel, david, will, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat

On Wed, Feb 21, 2024, Anish Moorthy wrote:
> > @@ -3656,6 +3669,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
> >         struct kvm_memory_slot *memslot;
> >
> >         memslot = gfn_to_memslot(kvm, gfn);
> > +       update_dirty_quota(kvm, PAGE_SIZE);
> >         mark_page_dirty_in_slot(kvm, memslot, gfn);
> >  }
> 
> Is mark_page_dirty() allowed to be used outside of a vCPU context?

It's allowed, but only because we don't have a better option, i.e. it's more
tolerated than allowed. :-)

> The lack of a vcpu* makes me think it is- I assume we don't want to charge
> vCPUs for accesses they're not making.
> 
> Unfortunately we do seem to use it *in* vCPU contexts (see
> kvm_update_stolen_time() on arm64?), although not on x86 AFAICT.

Use what?  mark_page_dirty_in_slot()?  x86 _only_ uses it from vCPU context.


* Re: [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus
  2024-02-21 19:51 ` [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
  2024-02-22  2:00   ` Anish Moorthy
@ 2024-04-16 16:59   ` Sean Christopherson
  2024-04-18 10:36     ` Shivam Kumar
  1 sibling, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2024-04-16 16:59 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: maz, pbonzini, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will, kvm, Shaju Abraham, Manish Mishra,
	Anurag Madnawat

On Wed, Feb 21, 2024, Shivam Kumar wrote:
> @@ -1291,6 +1293,13 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
>  bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
>  unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> +void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes);
> +#else
> +static inline void update_dirty_quota(struct kvm *kvm, unsigned long page_size_bytes)
> +{
> +}
> +#endif
>  void mark_page_dirty_in_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c3308536482b..217f19100003 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -210,6 +210,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_NOTIFY           37
>  #define KVM_EXIT_LOONGARCH_IOCSR  38
>  #define KVM_EXIT_MEMORY_FAULT     39
> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 40
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -491,6 +492,12 @@ struct kvm_run {
>  		struct kvm_sync_regs regs;
>  		char padding[SYNC_REGS_SIZE_BYTES];
>  	} s;
> +	/*
> +	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
> +	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
> +	 * is exhausted, i.e. dirty_quota_bytes <= 0.
> +	 */
> +	long dirty_quota_bytes;

This needs to be a u64 so that the size is consistent for 32-bit and 64-bit
userspace vs. kernel.


* Re: [PATCH v10 2/3] KVM: x86: Dirty quota-based throttling of vcpus
  2024-02-21 19:51 ` [PATCH v10 2/3] KVM: x86: Dirty " Shivam Kumar
@ 2024-04-16 17:44   ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2024-04-16 17:44 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: maz, pbonzini, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will, kvm, Shaju Abraham, Manish Mishra,
	Anurag Madnawat

On Wed, Feb 21, 2024, Shivam Kumar wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2d6cdeab1f8a..fa0b3853ee31 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3397,8 +3397,12 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
>  	if (!try_cmpxchg64(sptep, &old_spte, new_spte))
>  		return false;
>  
> -	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
> +	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
> +		struct kvm_mmu_page *sp = sptep_to_sp(sptep);
> +
> +		update_dirty_quota(vcpu->kvm, (1L << SPTE_LEVEL_SHIFT(sp->role.level)));
>  		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);

Forcing KVM to manually call update_dirty_quota() whenever mark_page_dirty_in_slot()
is invoked is not maintainable, as we inevitably will forget to update the quota
and probably not notice.  We've already had bugs escape where KVM fails to mark
gfns dirty, and those flows are much more testable.

Stepping back, I feel like this series has gone off the rails a bit.
 
I understand Marc's objections to the uAPI not differentiating between page sizes,
but simply updating the quota based on KVM's page size is also flawed.  E.g. if
the guest is backed with 1GiB pages, odds are very good that the dirty quotas are
going to be completely out of whack due to the first vCPU that writes a given 1GiB
region being charged with the entire 1GiB page.

And without a way to trigger detection of writes, e.g. by enabling PML or write-
protecting memory, I don't see how userspace can build anything on the "bytes
dirtied" information.

From v7[*], Marc was specifically objecting to the proposed API effectively being
presented as a general purpose API, but in reality the API was heavily reliant
on dirty logging being enabled.

 : My earlier comments still stand: the proposed API is not usable as a
 : general purpose memory-tracking API because it counts faults instead
 : of memory, making it inadequate except for the most trivial cases.
 : And I cannot believe you were serious when you mentioned that you were
 : happy to make that the API.

To avoid going in circles, I think we need to first agree on the scope of the uAPI.
Specifically, do we want to shoot for a generic write-tracking API, or do we want
something that is explicitly tied to dirty logging?


Marc,

If we figured out a clean-ish way to tie the "gfns dirtied" information to
dirty logging, i.e. didn't misconstrue the counts as generally useful data, would
that be acceptable?  While I like the idea of a generic solution, I don't see a
path to an implementation that isn't deeply flawed without basically doing dirty
logging, i.e. without forcing the use of non-huge pages and write-protecting memory
to intercept "new" writes based on input from userspace.

[*] https://lore.kernel.org/all/20221113170507.208810-2-shivam.kumar1@nutanix.com


* Re: [PATCH v10 0/3] Per-vCPU dirty quota-based throttling
  2024-02-21 19:51 [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
                   ` (3 preceding siblings ...)
  2024-03-21  5:48 ` [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
@ 2024-04-16 17:44 ` Sean Christopherson
  2024-04-18 10:42   ` Shivam Kumar
  4 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2024-04-16 17:44 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: maz, pbonzini, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, aravind.retnakaran, carl.waldspurger,
	david.vrabel, david, will, kvm

On Wed, Feb 21, 2024, Shivam Kumar wrote:
> v1:
> https://lore.kernel.org/kvm/20211114145721.209219-1-shivam.kumar1@xxxxxxxxxxx/
> v2: https://lore.kernel.org/kvm/Ydx2EW6U3fpJoJF0@xxxxxxxxxx/T/
> v3: https://lore.kernel.org/kvm/YkT1kzWidaRFdQQh@xxxxxxxxxx/T/
> v4:
> https://lore.kernel.org/all/20220521202937.184189-1-shivam.kumar1@xxxxxxxxxxx/
> v5: https://lore.kernel.org/all/202209130532.2BJwW65L-lkp@xxxxxxxxx/T/
> v6:
> https://lore.kernel.org/all/20220915101049.187325-1-shivam.kumar1@xxxxxxxxxxx/
> v7:
> https://lore.kernel.org/all/a64d9818-c68d-1e33-5783-414e9a9bdbd1@xxxxxxxxxxx/t/

These links are all busted, which was actually quite annoying because I wanted to
go back and look at Marc's input.

> v8:
> https://lore.kernel.org/all/20230225204758.17726-1-shivam.kumar1@nutanix.com/
> v9:
> https://lore.kernel.org/kvm/20230504144328.139462-1-shivam.kumar1@nutanix.com/


* Re: [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus
  2024-04-16 16:59   ` Sean Christopherson
@ 2024-04-18 10:36     ` Shivam Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Shivam Kumar @ 2024-04-18 10:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: maz, pbonzini, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, Aravind Retnakaran,
	Carl Waldspurger [C],
	David Vrabel, david, will, kvm, Shaju Abraham, Manish Mishra,
	Anurag Madnawat


> On 16-Apr-2024, at 10:29 PM, Sean Christopherson <seanjc@google.com> wrote:
> On Wed, Feb 21, 2024, Shivam Kumar wrote:
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index c3308536482b..217f19100003 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -210,6 +210,7 @@ struct kvm_xen_exit {
>> #define KVM_EXIT_NOTIFY           37
>> #define KVM_EXIT_LOONGARCH_IOCSR  38
>> #define KVM_EXIT_MEMORY_FAULT     39
>> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 40
>> 
>> /* For KVM_EXIT_INTERNAL_ERROR */
>> /* Emulate instruction failed. */
>> @@ -491,6 +492,12 @@ struct kvm_run {
>> 		struct kvm_sync_regs regs;
>> 		char padding[SYNC_REGS_SIZE_BYTES];
>> 	} s;
>> +	/*
>> +	 * Number of bytes the vCPU is allowed to dirty if KVM_CAP_DIRTY_QUOTA is
>> +	 * enabled. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if this quota
>> +	 * is exhausted, i.e. dirty_quota_bytes <= 0.
>> +	 */
>> +	long dirty_quota_bytes;
> 
> This needs to be a u64 so that the size is consistent for 32-bit and 64-bit
> userspace vs. kernel.
Ack.

Thanks,
Shivam.



* Re: [PATCH v10 0/3] Per-vCPU dirty quota-based throttling
  2024-04-16 17:44 ` Sean Christopherson
@ 2024-04-18 10:42   ` Shivam Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Shivam Kumar @ 2024-04-18 10:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: maz, pbonzini, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, Aravind Retnakaran,
	Carl Waldspurger [C],
	David Vrabel, david, will, kvm



> On 16-Apr-2024, at 11:14 PM, Sean Christopherson <seanjc@google.com> wrote:
> On Wed, Feb 21, 2024, Shivam Kumar wrote:
>> v1:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_kvm_20211114145721.209219-2D1-2Dshivam.kumar1-40xxxxxxxxxxx_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=buLjKsfeC2-NhTOg3Gq9bQJg9XFUMlvJsi6vYIiVI9k&e= 
>> v2: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_kvm_Ydx2EW6U3fpJoJF0-40xxxxxxxxxx_T_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=UUUIpjYiKj6G3_SlR40R9KS6UmuIlLU089Ai6SdPrC8&e= 
>> v3: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_kvm_YkT1kzWidaRFdQQh-40xxxxxxxxxx_T_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=oQqOZNHdDOMAEkLEKPjwiffKaQdK3T4kZf_DRRUTuxo&e= 
>> v4:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_20220521202937.184189-2D1-2Dshivam.kumar1-40xxxxxxxxxxx_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=4fJ-Dzy7gsEnExqmGF0nP8K41YdVWUC3v9urCMn8RQI&e= 
>> v5: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_202209130532.2BJwW65L-2Dlkp-40xxxxxxxxx_T_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=5GXvSQngNeqX62nS-3Yve0-bCtHxKYLFfl4AZiFO-u0&e= 
>> v6:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_20220915101049.187325-2D1-2Dshivam.kumar1-40xxxxxxxxxxx_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=S8mqK70ZETRAaQ0pmpYz9fzoJDYcDVMSgMtcUmCL4fE&e= 
>> v7:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_a64d9818-2Dc68d-2D1e33-2D5783-2D414e9a9bdbd1-40xxxxxxxxxxx_t_&d=DwIBAg&c=s883GpUCOChKOHiocYtGcg&r=4hVFP4-J13xyn-OcN0apTCh8iKZRosf5OJTQePXBMB8&m=npf2bNeivHu5BXcy66M81khdW0sy4qDh5d4kC_VThlzr1X2JvYVuDHMBYmNYzXMM&s=R9mCz9k87Sbv1QYREMeuD4l9fH-duqb1RInN3lmRBeo&e= 
> 
> These links are all busted, which was actually quite annoying because I wanted to
> go back and look at Marc's input.
Extremely sorry about that. Will fix them. I didn’t realise this when I copied the links from the previous patch.

Thanks,
Shivam


* Re: [PATCH v10 0/3] Per-vCPU dirty quota-based throttling
  2024-04-04  9:19   ` Marc Zyngier
@ 2024-04-18 10:46     ` Shivam Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Shivam Kumar @ 2024-04-18 10:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: pbonzini, seanjc, james.morse, suzuki.poulose, oliver.upton,
	yuzenghui, catalin.marinas, Aravind Retnakaran,
	Carl Waldspurger [C],
	David Vrabel, david, will, kvm


> On 04-Apr-2024, at 2:49 PM, Marc Zyngier <maz@kernel.org> wrote:
> On Thu, 21 Mar 2024 05:48:01 +0000,
> Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>> 
>> 
>>> On 22-Feb-2024, at 1:22 AM, Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>>> 
>>> The current v10 patchset includes the following changes over v9:
>>> 
>>> 1. Use vma_pagesize as the dirty granularity for updating dirty quota
>>> on arm64.
>>> 2. Do not update dirty quota for instances where the hypervisor is
>>> writing into guest memory. Accounting for these instances in vCPUs'
>>> dirty quota is unfair to the vCPUs. Also, some of these instances,
>>> such as record_steal_time, frequently try to redundantly mark the same
>>> set of pages dirty again and again. To avoid these distortions, we had
>>> previously relied on checking the dirty bitmap to avoid redundantly
>>> updating quotas. Since we have now decoupled dirty-quota-based
>>> throttling from the live-migration dirty-tracking path, we have
>>> resolved this issue by simply avoiding the mis-accounting caused by
>>> these hypervisor-induced writes to guest memory.  Through extensive
>>> experiments, we have verified that this new approach is approximately
>>> as effective as the prior approach that relied on checking the dirty
>>> bitmap.
>>> 
>> 
>> Hi Marc,
>> 
>> I’ve tried my best to address all the concerns raised in the
>> previous patchset. I’d really appreciate it if you could share your
>> thoughts and any feedback you might have on this one.
> 
> I'll get to it at some point. However, given that it has taken you the
> best part of a year to respin this, I need to page it all back in,
> which is going to take a bit of time as well.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 
No problem. Thank you for acknowledging.

Thanks,
Shivam


