* [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge
@ 2021-10-26 16:35 Shivam Kumar
  2021-10-26 16:35 ` [PATCH 1/6] Define data structures needed for dirty quota migration Shivam Kumar
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar

This patchset is the KVM-side implementation of a new dirty-quota-based
throttling algorithm that selectively throttles vCPUs based on their
individual contribution to overall memory dirtying, and dynamically adapts
the throttle to the available network bandwidth.

Overview
--------
--------

To throttle memory dirtying, we propose to set a limit on the number of
pages a vCPU can dirty in fixed-size, very short time intervals. This
limit depends on the network throughput calculated over the last few
intervals, so that the vCPUs are throttled according to the available
network bandwidth. We refer to this limit as the "dirty quota" of a vCPU
and to the fixed-size intervals as "dirty quota intervals".

One possible approach to distributing the overall dirty quota for a dirty
quota interval is to distribute it equally among all the vCPUs. Such an
equal split doesn't work well when the workload is skewed across vCPUs.
To counter such skewed cases, we propose that any quota a vCPU doesn't
use in a given dirty quota interval be added to a common pool. This
common pool (or "common quota") can be consumed on a first-come-first-served
basis by all vCPUs in the upcoming dirty quota intervals.
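
For illustration, a minimal sketch of this distribution logic as it could
look on the userspace side; the helper names, the refill call site and the
bandwidth-derived allowed_dirties are assumptions, not part of this
patchset:

#include <stdbool.h>
#include <stdint.h>

/* mirrors the KVM-side vCPUDirtyQuotaContext introduced below */
struct vCPUDirtyQuotaContext {
	uint64_t dirty_counter;
	uint64_t dirty_quota;
};

static uint64_t common_quota;	/* quota left unused by vCPUs, pooled */

/*
 * Refill quotas at the start of each dirty quota interval;
 * allowed_dirties is derived from the network throughput of the last
 * few intervals, and nr_vcpus is assumed to be > 0.
 */
static void refill_dirty_quotas(struct vCPUDirtyQuotaContext *ctx,
				int nr_vcpus, uint64_t allowed_dirties)
{
	uint64_t equal_share = allowed_dirties / nr_vcpus;

	for (int i = 0; i < nr_vcpus; i++) {
		/* quota a vCPU didn't consume goes to the common pool */
		if (ctx[i].dirty_counter < ctx[i].dirty_quota)
			common_quota += ctx[i].dirty_quota -
					ctx[i].dirty_counter;
		ctx[i].dirty_counter = 0;
		ctx[i].dirty_quota = equal_share;
	}
}

/* first-come-first-served claim from the common pool */
static bool claim_common_quota(struct vCPUDirtyQuotaContext *ctx,
			       uint64_t extra)
{
	if (common_quota < extra)
		return false;
	common_quota -= extra;
	ctx->dirty_quota += extra;
	return true;
}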


Design
------
------

Initialization

vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It
tracks, for the ongoing dirty quota interval, the number of pages the vCPU
has dirtied (dirty_counter) and the maximum number of dirties allowed for
the vCPU (dirty_quota).

struct vCPUDirtyQuotaContext {
	u64 dirty_counter;
	u64 dirty_quota;
};

The flag dirty_quota_migration_enabled determines whether dirty quota-based
throttling is enabled for an ongoing migration or not.


Handling page dirtying

When the guest tries to dirty a page, it leads to a vmexit as each page is
write-protected. In the vmexit path, we increment the dirty_counter for the
corresponding vCPU. Then, we check if the vCPU has exceeded its quota. If
yes, we exit to userspace with a new exit reason KVM_EXIT_DIRTY_QUOTA_FULL.
This "quota full" event is further handled on the userspace side. 


Please find the KVM Forum presentation on dirty quota-based throttling
here: https://www.youtube.com/watch?v=ZBkkJf78zFA

Shivam Kumar (6):
  Define data structures needed for dirty quota migration.
  Allocate memory for dirty quota context and initialize dirty quota
    migration flag.
  Add dirty quota migration capability and handle vCPU page fault for
    dirty quota context.
  Increment dirty counter for vmexit due to page write fault.
  Exit to userspace when dirty quota is full.
  Free space allocated for the vCPU's dirty quota context upon destroy.

 arch/x86/kvm/Makefile                 |  3 ++-
 arch/x86/kvm/x86.c                    |  9 ++++++++
 include/linux/dirty_quota_migration.h | 21 ++++++++++++++++++
 include/linux/kvm_host.h              |  3 +++
 include/uapi/linux/kvm.h              |  2 ++
 virt/kvm/dirty_quota_migration.c      | 31 ++++++++++++++++++++++++++
 virt/kvm/kvm_main.c                   | 32 ++++++++++++++++++++++++++-
 7 files changed, 99 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/dirty_quota_migration.h
 create mode 100644 virt/kvm/dirty_quota_migration.c

-- 
2.22.3



* [PATCH 1/6] Define data structures needed for dirty quota migration.
  2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
@ 2021-10-26 16:35 ` Shivam Kumar
  2021-10-26 16:35 ` [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag Shivam Kumar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

Define the data structures to be used on the KVM side:

vCPUDirtyQuotaContext: stores the dirty quota context for individual vCPUs
(shared between QEMU and KVM).
  dirty_counter: number of pages dirtied by the vCPU
  dirty_quota: limit on the number of pages the vCPU can dirty
dirty_quota_migration_enabled: flag indicating whether dirty quota
migration is on or off.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 include/linux/dirty_quota_migration.h | 11 +++++++++++
 include/linux/kvm_host.h              |  3 +++
 2 files changed, 14 insertions(+)
 create mode 100644 include/linux/dirty_quota_migration.h

diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
new file mode 100644
index 000000000000..6338cb6984df
--- /dev/null
+++ b/include/linux/dirty_quota_migration.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef DIRTY_QUOTA_MIGRATION_H
+#define DIRTY_QUOTA_MIGRATION_H
+#include <linux/kvm.h>
+
+struct vCPUDirtyQuotaContext {
+	u64 dirty_counter;
+	u64 dirty_quota;
+};
+
+#endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0f18df7fe874..9f6165617c38 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -38,6 +38,7 @@
 
 #include <asm/kvm_host.h>
 #include <linux/kvm_dirty_ring.h>
+#include <linux/dirty_quota_migration.h>
 
 #ifndef KVM_MAX_VCPU_ID
 #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
@@ -361,6 +362,7 @@ struct kvm_vcpu {
 	 * it is a valid slot.
 	 */
 	int last_used_slot;
+	struct vCPUDirtyQuotaContext *vCPUdqctx;
 };
 
 /* must be called with irqs disabled */
@@ -618,6 +620,7 @@ struct kvm {
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
 	bool vm_bugged;
+	bool dirty_quota_migration_enabled;
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
-- 
2.22.3



* [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag.
  2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
  2021-10-26 16:35 ` [PATCH 1/6] Define data structures needed for dirty quota migration Shivam Kumar
@ 2021-10-26 16:35 ` Shivam Kumar
  2021-10-27  4:50   ` kernel test robot
  2021-10-27  7:31   ` kernel test robot
  2021-10-26 16:35 ` [PATCH 3/6] Add dirty quota migration capability and handle vCPU page fault for dirty quota context Shivam Kumar
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

When the VM is created, we initialize to false the flag that tracks
whether a dirty quota migration is in progress; it is set to true when the
dirty quota migration starts.
When a vCPU is created, we allocate memory for the dirty quota context of
the vCPU. This dirty quota context is mmapped into QEMU when a dirty quota
migration starts.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 arch/x86/kvm/Makefile                 |  3 ++-
 include/linux/dirty_quota_migration.h |  2 ++
 virt/kvm/dirty_quota_migration.c      | 14 ++++++++++++++
 virt/kvm/kvm_main.c                   |  6 ++++++
 4 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/dirty_quota_migration.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 75dfd27b6e8a..a26fc0c94a83 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,8 @@ KVM := ../../../virt/kvm
 
 kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \
-				$(KVM)/dirty_ring.o $(KVM)/binary_stats.o
+				$(KVM)/dirty_ring.o $(KVM)/binary_stats.o \
+				$(KVM)/dirty_quota_migration.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index 6338cb6984df..2d6e5cd17be6 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -8,4 +8,6 @@ struct vCPUDirtyQuotaContext {
 	u64 dirty_quota;
 };
 
+int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
+
 #endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
new file mode 100644
index 000000000000..262f071aac0c
--- /dev/null
+++ b/virt/kvm/dirty_quota_migration.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/dirty_quota_migration.h>
+
+int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx)
+{
+	u64 size = sizeof(struct vCPUDirtyQuotaContext);
+	*vCPUdqctx = vmalloc(size);
+	if (!(*vCPUdqctx))
+		return -ENOMEM;
+	memset((*vCPUdqctx), 0, size);
+	return 0;
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7851f3a1b5f7..f232a16a26e7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,6 +66,7 @@
 #include <trace/events/kvm.h>
 
 #include <linux/kvm_dirty_ring.h>
+#include <linux/dirty_quota_migration.h>
 
 /* Worst case buffer size needed for holding an integer. */
 #define ITOA_MAX_LEN 12
@@ -1071,6 +1072,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	}
 
 	kvm->max_halt_poll_ns = halt_poll_ns;
+	kvm->dirty_quota_migration_enabled = false;
 
 	r = kvm_arch_init_vm(kvm, type);
 	if (r)
@@ -3630,6 +3632,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 			goto arch_vcpu_destroy;
 	}
 
+	r = kvm_vcpu_dirty_quota_alloc(&vcpu->vCPUdqctx);
+	if (r)
+		goto arch_vcpu_destroy;
+
 	mutex_lock(&kvm->lock);
 	if (kvm_get_vcpu_by_id(kvm, id)) {
 		r = -EEXIST;
-- 
2.22.3



* [PATCH 3/6] Add dirty quota migration capability and handle vCPU page fault for dirty quota context.
  2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
  2021-10-26 16:35 ` [PATCH 1/6] Define data structures needed for dirty quota migration Shivam Kumar
  2021-10-26 16:35 ` [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag Shivam Kumar
@ 2021-10-26 16:35 ` Shivam Kumar
  2021-10-27  7:19   ` kernel test robot
  2021-10-26 16:35 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

When a dirty quota migration is initiated from the QEMU side, the
following things happen:

1. mmap() is called on each vCPU fd to map the dirty quota context. This
results in a vCPU page fault, which needs to be handled (see the sketch
below).
2. An ioctl to start dirty quota migration is issued from QEMU, and must
be handled. This happens once QEMU is ready to start the migration.
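
A minimal sketch of that QEMU-side sequence, assuming a 4K page size and
omitting error handling; the struct mirrors the KVM-side layout, and
KVM_DIRTY_QUOTA_PAGE_OFFSET (64) is redefined here since this patch does
not export it through uapi:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

#define KVM_DIRTY_QUOTA_PAGE_OFFSET 64	/* value from this patch */

struct vCPUDirtyQuotaContext {	/* userspace mirror of the KVM struct */
	uint64_t dirty_counter;
	uint64_t dirty_quota;
};

static void start_dirty_quota_migration(int vm_fd, int vcpu_fd)
{
	/* 1. map the vCPU's dirty quota context; the fault is resolved
	 *    by the kvm_vcpu_fault() change in this patch */
	struct vCPUDirtyQuotaContext *dqctx =
		mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		     vcpu_fd, KVM_DIRTY_QUOTA_PAGE_OFFSET * 4096);

	(void)dqctx;	/* QEMU reads/writes the context through this map */

	/* 2. once ready to migrate, flip the VM-wide flag */
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_QUOTA_MIGRATION,
		.args = { 1 },
	};
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}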

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 include/linux/dirty_quota_migration.h |  6 ++++++
 include/uapi/linux/kvm.h              |  1 +
 virt/kvm/dirty_quota_migration.c      |  6 ++++++
 virt/kvm/kvm_main.c                   | 15 +++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index 2d6e5cd17be6..a9a54c38ee54 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -3,11 +3,17 @@
 #define DIRTY_QUOTA_MIGRATION_H
 #include <linux/kvm.h>
 
+#ifndef KVM_DIRTY_QUOTA_PAGE_OFFSET
+#define KVM_DIRTY_QUOTA_PAGE_OFFSET 64
+#endif
+
 struct vCPUDirtyQuotaContext {
 	u64 dirty_counter;
 	u64 dirty_quota;
 };
 
 int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
+struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);
 
 #endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..3649a3bb9bb8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_DIRTY_QUOTA_MIGRATION 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index 262f071aac0c..7e9ace760939 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -12,3 +12,9 @@ int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx)
 	memset((*vCPUdqctx), 0, size);
 	return 0;
 }
+
+struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset)
+{
+	return vmalloc_to_page((void *)vCPUdqctx + offset * PAGE_SIZE);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f232a16a26e7..95f857c50bf2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3511,6 +3511,9 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
 		page = kvm_dirty_ring_get_page(
 		    &vcpu->dirty_ring,
 		    vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET);
+	else if (vmf->pgoff == KVM_DIRTY_QUOTA_PAGE_OFFSET)
+		page = kvm_dirty_quota_context_get_page(vcpu->vCPUdqctx,
+				vmf->pgoff - KVM_DIRTY_QUOTA_PAGE_OFFSET);
 	else
 		return kvm_arch_vcpu_fault(vcpu, vmf);
 	get_page(page);
@@ -4263,6 +4266,15 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
 	return cleared;
 }
 
+static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
+		bool dirty_quota_migration_enabled)
+{
+	mutex_lock(&kvm->lock);
+	kvm->dirty_quota_migration_enabled = dirty_quota_migration_enabled;
+	mutex_unlock(&kvm->lock);
+	return 0;
+}
+
 int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 						  struct kvm_enable_cap *cap)
 {
@@ -4295,6 +4307,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	}
 	case KVM_CAP_DIRTY_LOG_RING:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
+		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
+				cap->args[0]);
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.22.3



* [PATCH 4/6] Increment dirty counter for vmexit due to page write fault.
  2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (2 preceding siblings ...)
  2021-10-26 16:35 ` [PATCH 3/6] Add dirty quota migration capability and handle vCPU page fault for dirty quota context Shivam Kumar
@ 2021-10-26 16:35 ` Shivam Kumar
  2021-10-26 16:35 ` [PATCH 5/6] Exit to userspace when dirty quota is full Shivam Kumar
  2021-10-26 16:35 ` [PATCH 6/6] Free space allocated for the vCPU's dirty quota context upon destroy Shivam Kumar
  5 siblings, 0 replies; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

On a vmexit due to a page write fault, i.e. when a page is dirtied, the
dirty counter of the corresponding vCPU is incremented.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 virt/kvm/kvm_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 95f857c50bf2..c41b85af8682 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3083,8 +3083,15 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		if (kvm->dirty_ring_size)
 			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
 					    slot, rel_gfn);
-		else
+		else {
+			struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+			if (vcpu && vcpu->kvm->dirty_quota_migration_enabled &&
+					vcpu->vCPUdqctx)
+				vcpu->vCPUdqctx->dirty_counter++;
+
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
-- 
2.22.3



* [PATCH 5/6] Exit to userspace when dirty quota is full.
  2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (3 preceding siblings ...)
  2021-10-26 16:35 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
@ 2021-10-26 16:35 ` Shivam Kumar
  2021-10-26 16:35 ` [PATCH 6/6] Free space allocated for the vCPU's dirty quota context upon destroy Shivam Kumar
  5 siblings, 0 replies; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

Whenever the dirty quota is full (i.e. the dirty counter reaches the dirty
quota), control is passed to the QEMU side, through a KVM exit with the
custom exit reason KVM_EXIT_DIRTY_QUOTA_FULL, to handle the dirty quota
full event.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 arch/x86/kvm/x86.c                    | 9 +++++++++
 include/linux/dirty_quota_migration.h | 1 +
 include/uapi/linux/kvm.h              | 1 +
 virt/kvm/dirty_quota_migration.c      | 5 +++++
 4 files changed, 16 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b26647a5ea22..ee9464d71f01 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -59,6 +59,7 @@
 #include <linux/mem_encrypt.h>
 #include <linux/entry-kvm.h>
 #include <linux/suspend.h>
+#include <linux/dirty_quota_migration.h>
 
 #include <trace/events/kvm.h>
 
@@ -9843,6 +9844,14 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 				return r;
 			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
 		}
+
+		/* check for dirty quota migration exit condition if it is enabled */
+		if (vcpu->kvm->dirty_quota_migration_enabled &&
+				is_dirty_quota_full(vcpu->vCPUdqctx)) {
+			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
+			r = 0;
+			break;
+		}
 	}
 
 	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index a9a54c38ee54..f343c073f38d 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -15,5 +15,6 @@ struct vCPUDirtyQuotaContext {
 int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
 struct page *kvm_dirty_quota_context_get_page(
 		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);
+bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx);
 
 #endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3649a3bb9bb8..0f04cd99fc8d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_DIRTY_QUOTA_FULL 35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index 7e9ace760939..eeef19347af4 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -18,3 +18,8 @@ struct page *kvm_dirty_quota_context_get_page(
 {
 	return vmalloc_to_page((void *)vCPUdqctx + offset * PAGE_SIZE);
 }
+
+bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx)
+{
+	return (vCPUdqctx->dirty_counter >= vCPUdqctx->dirty_quota);
+}
-- 
2.22.3



* [PATCH 6/6] Free space allocated for the vCPU's dirty quota context upon destroy.
  2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (4 preceding siblings ...)
  2021-10-26 16:35 ` [PATCH 5/6] Exit to userspace when dirty quota is full Shivam Kumar
@ 2021-10-26 16:35 ` Shivam Kumar
  5 siblings, 0 replies; 12+ messages in thread
From: Shivam Kumar @ 2021-10-26 16:35 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

When the vCPU is destroyed, we must free the space allocated to the dirty
quota context for the vCPU.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 include/linux/dirty_quota_migration.h | 1 +
 virt/kvm/dirty_quota_migration.c      | 6 ++++++
 virt/kvm/kvm_main.c                   | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index f343c073f38d..d3ccab153d44 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -16,5 +16,6 @@ int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
 struct page *kvm_dirty_quota_context_get_page(
 		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);
 bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx);
+void kvm_vcpu_dirty_quota_free(struct vCPUDirtyQuotaContext **vCPUdqctx);
 
 #endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index eeef19347af4..3f74af2ccab9 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -23,3 +23,9 @@ bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx)
 {
 	return (vCPUdqctx->dirty_counter >= vCPUdqctx->dirty_quota);
 }
+
+void kvm_vcpu_dirty_quota_free(struct vCPUDirtyQuotaContext **vCPUdqctx)
+{
+	vfree(*vCPUdqctx);
+	*vCPUdqctx = NULL;
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c41b85af8682..30fce3f93ce0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -430,6 +430,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	kvm_vcpu_dirty_quota_free(&vcpu->vCPUdqctx);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
 	kvm_arch_vcpu_destroy(vcpu);
 
@@ -3683,6 +3684,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
+	kvm_vcpu_dirty_quota_free(&vcpu->vCPUdqctx);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
 arch_vcpu_destroy:
 	kvm_arch_vcpu_destroy(vcpu);
-- 
2.22.3



* Re: [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag.
  2021-10-26 16:35 ` [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag Shivam Kumar
@ 2021-10-27  4:50   ` kernel test robot
  2021-10-27  7:31   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2021-10-27  4:50 UTC (permalink / raw)
  To: Shivam Kumar, pbonzini
  Cc: kbuild-all, kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham,
	Manish Mishra

[-- Attachment #1: Type: text/plain, Size: 1840 bytes --]

Hi Shivam,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on linus/master v5.15-rc7 next-20211026]
[cannot apply to kvm/queue]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shivam-Kumar/KVM-Dirty-Quota-Based-VM-Live-Migration-Auto-Converge/20211027-003852
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: s390-alldefconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/96d8525a5ba81c559c00d6c0fa1f2b2e84fe74ce
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shivam-Kumar/KVM-Dirty-Quota-Based-VM-Live-Migration-Auto-Converge/20211027-003852
        git checkout 96d8525a5ba81c559c00d6c0fa1f2b2e84fe74ce
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   s390-linux-ld: arch/s390/../../virt/kvm/kvm_main.o: in function `kvm_vm_ioctl_create_vcpu':
>> kvm_main.c:(.text+0x3766): undefined reference to `kvm_vcpu_dirty_quota_alloc'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8146 bytes --]


* Re: [PATCH 3/6] Add dirty quota migration capability and handle vCPU page fault for dirty quota context.
  2021-10-26 16:35 ` [PATCH 3/6] Add dirty quota migration capability and handle vCPU page fault for dirty quota context Shivam Kumar
@ 2021-10-27  7:19   ` kernel test robot
  0 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2021-10-27  7:19 UTC (permalink / raw)
  To: Shivam Kumar, pbonzini
  Cc: kbuild-all, kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham,
	Manish Mishra

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

Hi Shivam,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on linus/master v5.15-rc7 next-20211026]
[cannot apply to kvm/queue]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shivam-Kumar/KVM-Dirty-Quota-Based-VM-Live-Migration-Auto-Converge/20211027-003852
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: s390-alldefconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/609ed7d1183e7cea180b18da8e7bf137cbcb22d2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shivam-Kumar/KVM-Dirty-Quota-Based-VM-Live-Migration-Auto-Converge/20211027-003852
        git checkout 609ed7d1183e7cea180b18da8e7bf137cbcb22d2
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   s390-linux-ld: arch/s390/../../virt/kvm/kvm_main.o: in function `kvm_vcpu_fault':
>> kvm_main.c:(.text+0x4f0): undefined reference to `kvm_dirty_quota_context_get_page'
   s390-linux-ld: arch/s390/../../virt/kvm/kvm_main.o: in function `kvm_vm_ioctl_create_vcpu':
   kvm_main.c:(.text+0x37ae): undefined reference to `kvm_vcpu_dirty_quota_alloc'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8146 bytes --]


* Re: [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag.
  2021-10-26 16:35 ` [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag Shivam Kumar
  2021-10-27  4:50   ` kernel test robot
@ 2021-10-27  7:31   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2021-10-27  7:31 UTC (permalink / raw)
  To: Shivam Kumar, pbonzini
  Cc: kbuild-all, kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham,
	Manish Mishra

[-- Attachment #1: Type: text/plain, Size: 1728 bytes --]

Hi Shivam,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on linus/master v5.15-rc7 next-20211026]
[cannot apply to kvm/queue]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shivam-Kumar/KVM-Dirty-Quota-Based-VM-Live-Migration-Auto-Converge/20211027-003852
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: mips-malta_kvm_defconfig (attached as .config)
compiler: mipsel-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/96d8525a5ba81c559c00d6c0fa1f2b2e84fe74ce
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shivam-Kumar/KVM-Dirty-Quota-Based-VM-Live-Migration-Auto-Converge/20211027-003852
        git checkout 96d8525a5ba81c559c00d6c0fa1f2b2e84fe74ce
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "kvm_vcpu_dirty_quota_alloc" [arch/mips/kvm/kvm.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 21280 bytes --]


* Re: [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
@ 2021-11-18 17:46 ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-11-18 17:46 UTC (permalink / raw)
  To: Shivam Kumar; +Cc: pbonzini, kvm

On Sun, Nov 14, 2021, Shivam Kumar wrote:
> One possible approach to distributing the overall dirty quota for a dirty
> quota interval is to distribute it equally among all the vCPUs. Such an
> equal split doesn't work well when the workload is skewed across vCPUs.
> To counter such skewed cases, we propose that any quota a vCPU doesn't
> use in a given dirty quota interval be added to a common pool. This
> common pool (or "common quota") can be consumed on a first-come-first-served
> basis by all vCPUs in the upcoming dirty quota intervals.

Why not simply use a per-VM quota in combination with a percpu_counter to avoid bouncing
the dirty counter?
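
For illustration, a sketch of that per-VM alternative using the kernel's
percpu_counter API (where the fields live and the check site are
assumptions):

	/* in struct kvm: one VM-wide counter and quota */
	struct percpu_counter dirty_count;
	u64 dirty_quota;

	/* at VM creation */
	percpu_counter_init(&kvm->dirty_count, 0, GFP_KERNEL);

	/* mark_page_dirty_in_slot(): a cheap per-CPU increment, no
	 * cross-vCPU cacheline bouncing */
	percpu_counter_inc(&kvm->dirty_count);

	/* vcpu_run(): an approximate read is fine for throttling */
	if (percpu_counter_read_positive(&kvm->dirty_count) >=
	    kvm->dirty_quota) {
		vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
		return 0;
	}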

> Design
> ----------
> ----------
> 
> Initialization
> 

Feedback that applies to all patches:

> vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It keeps

CamelCase is very frowned upon, please use whatever_case_this_is_called.

The SOB chains are wrong.  The person physically posting the patches needs to have
their SOB last, as they are the person who last handled the patches.

  Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
  Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
  Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
  Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
  Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>

These need a Co-developed-by.  The only other scenario is that you and Anurag
wrote the patches, then handed them off to Shaju, who sent them to Manish, who
sent them back to you for posting.  I highly doubt that's the case, and if so,
I would hope you've done due diligence to ensure what you handed off is the same
as what you posted, i.e. the SOB chains for Shaju and Manish can be omitted.

In general, please read through most of the stuff in Documentation/process.

> the number of pages the vCPU has dirtied (dirty_counter) in the ongoing
> dirty quota interval, and the maximum number of dirties allowed for the
> vCPU (dirty_quota) in the ongoing dirty quota interval.
> 
> struct vCPUDirtyQuotaContext {
> 	u64 dirty_counter;
> 	u64 dirty_quota;
> };
> 
> The flag dirty_quota_migration_enabled determines whether dirty quota-based
> throttling is enabled for an ongoing migration or not.
> 
> 
> Handling page dirtying
> 
> When the guest tries to dirty a page, it leads to a vmexit as each page is
> write-protected. In the vmexit path, we increment the dirty_counter for the
> corresponding vCPU. Then, we check if the vCPU has exceeded its quota. If
> yes, we exit to userspace with a new exit reason KVM_EXIT_DIRTY_QUOTA_FULL.
> This "quota full" event is further handled on the userspace side. 
> 
> 
> Please find the KVM Forum presentation on dirty quota-based throttling
> here: https://www.youtube.com/watch?v=ZBkkJf78zFA
> 
> 
> Shivam Kumar (6):
>   Define data structures for dirty quota migration.
>   Init dirty quota flag and allocate memory for vCPUdqctx.
>   Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
>   Increment dirty counter for vmexit due to page write fault.
>   Exit to userspace when dirty quota is full.
>   Free vCPUdqctx memory on vCPU destroy.

Freeing memory in a later patch is not an option.  The purpose of splitting is
to aid bisection and make the patches more reviewable, not to break bisection and
confuse reviewers.  In general, there are too many patches and things are split in
weird ways, making this hard to review.  This can probably be smushed to two
patches: 1) implement the guts, 2) expose to userspace and document.

>  Documentation/virt/kvm/api.rst        | 39 +++++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h       |  1 +
>  arch/x86/kvm/Makefile                 |  3 +-
>  arch/x86/kvm/x86.c                    |  9 +++++
>  include/linux/dirty_quota_migration.h | 52 +++++++++++++++++++++++++
>  include/linux/kvm_host.h              |  3 ++
>  include/uapi/linux/kvm.h              | 11 ++++++
>  virt/kvm/dirty_quota_migration.c      | 31 +++++++++++++++

I do not see any reason to add two new files for 84 lines, which I'm pretty sure
we can trim down significantly in any case.  Paolo has suggested creating files
for the mm side of generic kvm; the helpers can go wherever that lands.

>  virt/kvm/kvm_main.c                   | 56 ++++++++++++++++++++++++++-
>  9 files changed, 203 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/dirty_quota_migration.h
>  create mode 100644 virt/kvm/dirty_quota_migration.c

As for the design, allocating a separate page for 16 bytes is wasteful and adds
complexity that I don't think is strictly necessary.  Assuming the quota isn't
simply a per-VM thing....

Rather than have both the count and the quota writable by userspace, what about
having KVM_CAP_DIRTY_QUOTA_MIGRATION (renamed to just KVM_CAP_DIRTY_QUOTA, because
dirty logging can technically be used for things other than migration) define a
default, per-VM dirty quota, that is snapshotted by each vCPU on creation.  The
ioctl() would need to be rejected if vCPUs have been created, but it already needs
something along those lines because currently it has a TOCTOU race and can also
race with vCPU readers.

Anyways, vCPUs snapshot a default quota on creation, and then use struct kvm_run to
update the quota upon return from userspace after KVM_EXIT_DIRTY_QUOTA_FULL instead
of giving userspace free rein to change the quota at will. There are a variety
of ways to leverage kvm_run, the simplest I can think of would be to define the ABI
such that calling KVM_RUN with "exit_reason == KVM_EXIT_DIRTY_QUOTA_FULL" would
trigger an update.  That would do the right thing even if userspace _doesn't_ update
the count/quota, as KVM would simply copy back the original quota/count and exit back
to userspace.

E.g.

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 78f0719cc2a3..d4a7d1b7019e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -487,6 +487,11 @@ struct kvm_run {
                        unsigned long args[6];
                        unsigned long ret[2];
                } riscv_sbi;
+               /* KVM_EXIT_DIRTY_QUOTA_FULL */
+               struct {
+                       __u64 dirty_count;
+                       __u64 dirty_quota;
+               } dirty_quota_exit; /* illustrative member name */
                /* Fix the size of the union. */
                char padding[256];
        };
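
Userspace would then handle the exit along these lines (sketch; the
member name and quota policy are illustrative):

	case KVM_EXIT_DIRTY_QUOTA_FULL:
		/* reset the count, grant a fresh quota and re-enter;
		 * calling KVM_RUN with this exit_reason still set is
		 * what triggers the update */
		run->dirty_quota_exit.dirty_count = 0;
		run->dirty_quota_exit.dirty_quota = next_quota();
		break;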


Side topic, it might make sense to have the counter be a stat, the per-vCPU dirty
rate could be useful info even if userspace isn't using quotas.
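
E.g., as a generic vCPU stat (sketch; the field name is illustrative):

	/* include/linux/kvm_types.h */
	struct kvm_vcpu_stat_generic {
		...
		u64 pages_dirtied;
	};

	/* include/linux/kvm_host.h, inside KVM_GENERIC_VCPU_STATS() */
	STATS_DESC_COUNTER(VCPU_GENERIC, pages_dirtied),

Userspace could then derive the per-vCPU dirty rate from the binary stats
file descriptor (KVM_GET_STATS_FD) without enabling quotas.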


* [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge
@ 2021-11-14 14:57 Shivam Kumar
  2021-11-18 17:46 ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar

This patchset is the KVM-side implementation of a new dirty-quota-based
throttling algorithm that selectively throttles vCPUs based on their
individual contribution to overall memory dirtying, and dynamically adapts
the throttle to the available network bandwidth.

Overview
----------
----------

To throttle memory dirtying, we propose to set a limit on the number of
pages a vCPU can dirty in fixed-size, very short time intervals. This
limit depends on the network throughput calculated over the last few
intervals, so that the vCPUs are throttled according to the available
network bandwidth. We refer to this limit as the "dirty quota" of a vCPU
and to the fixed-size intervals as "dirty quota intervals".

One possible approach to distributing the overall dirty quota for a dirty
quota interval is to distribute it equally among all the vCPUs. Such an
equal split doesn't work well when the workload is skewed across vCPUs.
To counter such skewed cases, we propose that any quota a vCPU doesn't
use in a given dirty quota interval be added to a common pool. This
common pool (or "common quota") can be consumed on a first-come-first-served
basis by all vCPUs in the upcoming dirty quota intervals.

Design
----------
----------

Initialization

vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It keeps
the number of pages the vCPU has dirtied (dirty_counter) in the ongoing
dirty quota interval, and the maximum number of dirties allowed for the
vCPU (dirty_quota) in the ongoing dirty quota interval.

struct vCPUDirtyQuotaContext {
	u64 dirty_counter;
	u64 dirty_quota;
};

The flag dirty_quota_migration_enabled determines whether dirty quota-based
throttling is enabled for an ongoing migration or not.


Handling page dirtying

When the guest tries to dirty a page, it leads to a vmexit as each page is
write-protected. In the vmexit path, we increment the dirty_counter for the
corresponding vCPU. Then, we check if the vCPU has exceeded its quota. If
yes, we exit to userspace with a new exit reason KVM_EXIT_DIRTY_QUOTA_FULL.
This "quota full" event is further handled on the userspace side. 


Please find the KVM Forum presentation on dirty quota-based throttling
here: https://www.youtube.com/watch?v=ZBkkJf78zFA


Shivam Kumar (6):
  Define data structures for dirty quota migration.
  Init dirty quota flag and allocate memory for vCPUdqctx.
  Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  Increment dirty counter for vmexit due to page write fault.
  Exit to userspace when dirty quota is full.
  Free vCPUdqctx memory on vCPU destroy.

 Documentation/virt/kvm/api.rst        | 39 +++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h       |  1 +
 arch/x86/kvm/Makefile                 |  3 +-
 arch/x86/kvm/x86.c                    |  9 +++++
 include/linux/dirty_quota_migration.h | 52 +++++++++++++++++++++++++
 include/linux/kvm_host.h              |  3 ++
 include/uapi/linux/kvm.h              | 11 ++++++
 virt/kvm/dirty_quota_migration.c      | 31 +++++++++++++++
 virt/kvm/kvm_main.c                   | 56 ++++++++++++++++++++++++++-
 9 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/dirty_quota_migration.h
 create mode 100644 virt/kvm/dirty_quota_migration.c

-- 
2.22.3



Thread overview: 12+ messages
2021-10-26 16:35 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
2021-10-26 16:35 ` [PATCH 1/6] Define data structures needed for dirty quota migration Shivam Kumar
2021-10-26 16:35 ` [PATCH 2/6] Allocate memory for dirty quota context and initialize dirty quota migration flag Shivam Kumar
2021-10-27  4:50   ` kernel test robot
2021-10-27  7:31   ` kernel test robot
2021-10-26 16:35 ` [PATCH 3/6] Add dirty quota migration capability and handle vCPU page fault for dirty quota context Shivam Kumar
2021-10-27  7:19   ` kernel test robot
2021-10-26 16:35 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
2021-10-26 16:35 ` [PATCH 5/6] Exit to userspace when dirty quota is full Shivam Kumar
2021-10-26 16:35 ` [PATCH 6/6] Free space allocated for the vCPU's dirty quota context upon destroy Shivam Kumar
2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
2021-11-18 17:46 ` Sean Christopherson
