kvm.vger.kernel.org archive mirror
* [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge
@ 2021-11-14 14:57 Shivam Kumar
  2021-11-14 14:57 ` [PATCH 1/6] Define data structures for dirty quota migration Shivam Kumar
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar

This patchset is the KVM-side implementation of a (new) dirty "quota"
based throttling algorithm that selectively throttles vCPUs based on their
individual contribution to overall memory dirtying and also dynamically
adapts the throttle based on the available network bandwidth.

Overview
----------

To throttle memory dirtying, we propose to set a limit on the number of
pages a vCPU can dirty in fixed-size, very short time intervals. This
limit depends on the network throughput calculated over the last few
intervals, so that vCPUs are throttled based on the available network
bandwidth. We refer to this limit as the "dirty quota" of a vCPU and to
the fixed-size intervals as "dirty quota intervals".

One possible approach to distributing the overall dirtying budget for a
dirty quota interval is to divide it equally among all the vCPUs. This
distribution doesn't work well when the workload is skewed across vCPUs.
To counter such skewed cases, we propose that if any vCPU doesn't need
its full quota in a given dirty quota interval, the unused portion is
added to a common pool. This common pool (or "common quota") can be
consumed on a first-come-first-served basis by all vCPUs in the upcoming
dirty quota intervals.
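
For illustration, here is a minimal userspace-style sketch of this
distribution scheme. The helper names, the 4K page size, and the way the
budget is derived from bandwidth are illustrative assumptions, not part
of this patchset:

#include <stdint.h>

struct vcpu_dirty_quota {
	uint64_t dirty_counter;	/* pages dirtied in the last interval */
	uint64_t dirty_quota;	/* limit for the next interval */
};

/* Pages the network can drain in one interval (assumes 4K pages). */
static uint64_t pages_per_interval(uint64_t bw_bytes_per_sec,
				   uint64_t interval_us)
{
	return bw_bytes_per_sec / 4096 * interval_us / 1000000;
}

/*
 * At the start of each interval, bank each vCPU's unused quota from the
 * previous interval into the common pool, then hand every vCPU an equal
 * share of the fresh budget.
 */
static void distribute_dirty_quota(struct vcpu_dirty_quota *vcpus,
				   int nr_vcpus, uint64_t budget,
				   uint64_t *common_pool)
{
	uint64_t share = budget / nr_vcpus;
	int i;

	for (i = 0; i < nr_vcpus; i++) {
		if (vcpus[i].dirty_counter < vcpus[i].dirty_quota)
			*common_pool += vcpus[i].dirty_quota -
					vcpus[i].dirty_counter;
		vcpus[i].dirty_counter = 0;
		vcpus[i].dirty_quota = share;
	}
}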

Design
----------

Initialization

vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. For
the ongoing dirty quota interval, it tracks the number of pages the vCPU
has dirtied (dirty_counter) and the maximum number of pages it is
allowed to dirty (dirty_quota).

struct vCPUDirtyQuotaContext {
	u64 dirty_counter;
	u64 dirty_quota;
};

The flag dirty_quota_migration_enabled determines whether dirty
quota-based throttling is enabled for an ongoing migration.


Handling page dirtying

When the guest tries to dirty a page, it leads to a vmexit because the
page is write-protected. In the vmexit path, we increment the
dirty_counter for the corresponding vCPU. Then, we check if the vCPU has
exceeded its quota. If yes, we exit to userspace with a new exit reason,
KVM_EXIT_DIRTY_QUOTA_FULL. This "quota full" event is then handled on
the userspace side.


Please find the KVM Forum presentation on dirty quota-based throttling
here: https://www.youtube.com/watch?v=ZBkkJf78zFA


Shivam Kumar (6):
  Define data structures for dirty quota migration.
  Init dirty quota flag and allocate memory for vCPUdqctx.
  Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  Increment dirty counter for vmexit due to page write fault.
  Exit to userspace when dirty quota is full.
  Free vCPUdqctx memory on vCPU destroy.

 Documentation/virt/kvm/api.rst        | 39 +++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h       |  1 +
 arch/x86/kvm/Makefile                 |  3 +-
 arch/x86/kvm/x86.c                    |  9 +++++
 include/linux/dirty_quota_migration.h | 52 +++++++++++++++++++++++++
 include/linux/kvm_host.h              |  3 ++
 include/uapi/linux/kvm.h              | 11 ++++++
 virt/kvm/dirty_quota_migration.c      | 31 +++++++++++++++
 virt/kvm/kvm_main.c                   | 56 ++++++++++++++++++++++++++-
 9 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/dirty_quota_migration.h
 create mode 100644 virt/kvm/dirty_quota_migration.c

-- 
2.22.3



* [PATCH 1/6] Define data structures for dirty quota migration.
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
@ 2021-11-14 14:57 ` Shivam Kumar
  2021-11-14 14:57 ` [PATCH 2/6] Init dirty quota flag and allocate memory for vCPUdqctx Shivam Kumar
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

Define the data structures to be used on the KVM side:

vCPUDirtyQuotaContext: stores the dirty quota context for individual vCPUs
(shared between QEMU and KVM).
  dirty_counter: number of pages dirtied by the vCPU
  dirty_quota: limit on the number of pages the vCPU can dirty
dirty_quota_migration_enabled: flag indicating whether a dirty quota
migration is in progress.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 include/linux/dirty_quota_migration.h | 17 +++++++++++++++++
 include/linux/kvm_host.h              |  3 +++
 2 files changed, 20 insertions(+)
 create mode 100644 include/linux/dirty_quota_migration.h

diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
new file mode 100644
index 000000000000..4f4e0d80a04d
--- /dev/null
+++ b/include/linux/dirty_quota_migration.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef DIRTY_QUOTA_MIGRATION_H
+#define DIRTY_QUOTA_MIGRATION_H
+#include <linux/kvm.h>
+
+/**
+ * vCPUDirtyQuotaContext:  dirty quota context of a vCPU
+ *
+ * @dirty_counter:	number of pages dirtied by the vCPU
+ * @dirty_quota:	limit on the number of pages the vCPU can dirty
+ */
+struct vCPUDirtyQuotaContext {
+	u64 dirty_counter;
+	u64 dirty_quota;
+};
+
+#endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9e0667e3723e..3cb6a43da01c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -38,6 +38,7 @@
 
 #include <asm/kvm_host.h>
 #include <linux/kvm_dirty_ring.h>
+#include <linux/dirty_quota_migration.h>
 
 #ifndef KVM_MAX_VCPU_IDS
 #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
@@ -360,6 +361,7 @@ struct kvm_vcpu {
 	 * it is a valid slot.
 	 */
 	int last_used_slot;
+	struct vCPUDirtyQuotaContext *vCPUdqctx;
 };
 
 /* must be called with irqs disabled */
@@ -618,6 +620,7 @@ struct kvm {
 	u32 dirty_ring_size;
 	bool vm_bugged;
 	bool vm_dead;
+	bool dirty_quota_migration_enabled;
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
-- 
2.22.3



* [PATCH 2/6] Init dirty quota flag and allocate memory for vCPUdqctx.
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
  2021-11-14 14:57 ` [PATCH 1/6] Define data structures for dirty quota migration Shivam Kumar
@ 2021-11-14 14:57 ` Shivam Kumar
  2021-11-14 14:57 ` [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults Shivam Kumar
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

When the VM is created, we initialize the flag to track the start of a
dirty quota migration as false. It is set to true when the dirty quota
migration starts.
When a vCPU is created, we allocate memory for the dirty quota context of
the vCPU. This dirty quota context is mmaped to the userspace when a dirty
quota migration starts.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 arch/x86/include/uapi/asm/kvm.h       |  1 +
 arch/x86/kvm/Makefile                 |  3 ++-
 include/linux/dirty_quota_migration.h | 16 ++++++++++++++++
 include/uapi/linux/kvm.h              |  9 +++++++++
 virt/kvm/dirty_quota_migration.c      | 14 ++++++++++++++
 virt/kvm/kvm_main.c                   |  8 ++++++++
 6 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/dirty_quota_migration.c

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5a776a08f78c..c4270dd9219b 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -13,6 +13,7 @@
 #define KVM_PIO_PAGE_OFFSET 1
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
 #define KVM_DIRTY_LOG_PAGE_OFFSET 64
+#define KVM_DIRTY_QUOTA_PAGE_OFFSET 64
 
 #define DE_VECTOR 0
 #define DB_VECTOR 1
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 75dfd27b6e8a..a26fc0c94a83 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,8 @@ KVM := ../../../virt/kvm
 
 kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \
-				$(KVM)/dirty_ring.o $(KVM)/binary_stats.o
+				$(KVM)/dirty_ring.o $(KVM)/binary_stats.o \
+				$(KVM)/dirty_quota_migration.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index 4f4e0d80a04d..8c12fa428436 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -14,4 +14,20 @@ struct vCPUDirtyQuotaContext {
 	u64 dirty_quota;
 };
 
+#if (KVM_DIRTY_QUOTA_PAGE_OFFSET == 0)
+/*
+ * If KVM_DIRTY_QUOTA_PAGE_OFFSET is not defined by the arch, exclude
+ * dirty_quota_migration.o by defining these nop functions for the arch.
+ */
+static inline int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx)
+{
+	return 0;
+}
+
+#else /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
+
+int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
+
+#endif /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
+
 #endif  /* DIRTY_QUOTA_MIGRATION_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..647f7e1a04dc 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1905,6 +1905,15 @@ struct kvm_hyperv_eventfd {
 #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE    (1 << 0)
 #define KVM_DIRTY_LOG_INITIALLY_SET            (1 << 1)
 
+/*
+ * KVM_DIRTY_QUOTA_PAGE_OFFSET will be defined, and set to the
+ * starting page offset of dirty quota context structure, by the
+ * arch implementing dirty quota migration.
+ */
+#ifndef KVM_DIRTY_QUOTA_PAGE_OFFSET
+#define KVM_DIRTY_QUOTA_PAGE_OFFSET 0
+#endif
+
 /*
  * Arch needs to define the macro after implementing the dirty ring
  * feature.  KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
new file mode 100644
index 000000000000..262f071aac0c
--- /dev/null
+++ b/virt/kvm/dirty_quota_migration.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/dirty_quota_migration.h>
+
+int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx)
+{
+	u64 size = sizeof(struct vCPUDirtyQuotaContext);
+	*vCPUdqctx = vmalloc(size);
+	if (!(*vCPUdqctx))
+		return -ENOMEM;
+	memset((*vCPUdqctx), 0, size);
+	return 0;
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d31724500501..5626ae1b92ce 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,6 +66,7 @@
 #include <trace/events/kvm.h>
 
 #include <linux/kvm_dirty_ring.h>
+#include <linux/dirty_quota_migration.h>
 
 /* Worst case buffer size needed for holding an integer. */
 #define ITOA_MAX_LEN 12
@@ -1079,6 +1080,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	}
 
 	kvm->max_halt_poll_ns = halt_poll_ns;
+	kvm->dirty_quota_migration_enabled = false;
 
 	r = kvm_arch_init_vm(kvm, type);
 	if (r)
@@ -3638,6 +3640,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 			goto arch_vcpu_destroy;
 	}
 
+	if (KVM_DIRTY_QUOTA_PAGE_OFFSET) {
+		r = kvm_vcpu_dirty_quota_alloc(&vcpu->vCPUdqctx);
+		if (r)
+			goto arch_vcpu_destroy;
+	}
+
 	mutex_lock(&kvm->lock);
 	if (kvm_get_vcpu_by_id(kvm, id)) {
 		r = -EEXIST;
-- 
2.22.3



* [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
  2021-11-14 14:57 ` [PATCH 1/6] Define data structures for dirty quota migration Shivam Kumar
  2021-11-14 14:57 ` [PATCH 2/6] Init dirty quota flag and allocate memory for vCPUdqctx Shivam Kumar
@ 2021-11-14 14:57 ` Shivam Kumar
  2021-11-18 17:57   ` Sean Christopherson
  2021-11-14 14:57 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

When a dirty quota migration is initiated from QEMU side, the following
things happen:

1. mmap is called on each vCPU file descriptor to map the dirty quota
context. This results in a vCPU page fault which needs to be handled.
2. An ioctl to start dirty quota migration is called from QEMU and must be
handled. This happens once QEMU is ready to start the migration.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 Documentation/virt/kvm/api.rst        | 39 +++++++++++++++++++++++++++
 include/linux/dirty_quota_migration.h |  8 ++++++
 include/uapi/linux/kvm.h              |  1 +
 virt/kvm/dirty_quota_migration.c      |  6 +++++
 virt/kvm/kvm_main.c                   | 37 +++++++++++++++++++++++++
 5 files changed, 91 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..6679bceee649 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -277,6 +277,10 @@ the VCPU file descriptor can be mmap-ed, including:
   KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE.  For more information on
   KVM_CAP_DIRTY_LOG_RING, see section 8.3.
 
+- if KVM_CAP_DIRTY_QUOTA_MIGRATION is available, a number of pages at
+  KVM_DIRTY_QUOTA_PAGE_OFFSET * PAGE_SIZE.  For more information on
+  KVM_CAP_DIRTY_QUOTA_MIGRATION, see section 8.35.
+
 
 4.6 KVM_SET_MEMORY_REGION
 -------------------------
@@ -7484,3 +7488,38 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_DIRTY_QUOTA_MIGRATION
+----------------------------------
+
+:Architectures: x86
+:Parameters: args[0] - boolean value specifying whether to enable or
+disable dirty quota migration (true and false respectively)
+
+With dirty quota migration, memory dirtying is throttled by setting a
+limit on the number of pages a vCPU can dirty in given fixed microscopic
+size time intervals. This limit depends on the network throughput
+calculated over the last few intervals so as to throttle the vCPUs based
+on available network bandwidth. We are referring to this limit as the
+"dirty quota" of a vCPU and the fixed size intervals as the "dirty quota
+intervals".
+
+vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It
+keeps the number of pages the vCPU has dirtied (dirty_counter) in the
+ongoing dirty quota interval and the maximum number of dirties allowed for
+the vCPU (dirty_quota) in the ongoing dirty quota interval.
+
+  struct vCPUDirtyQuotaContext {
+          u64 dirty_counter;
+          u64 dirty_quota;
+  };
+
+The flag dirty_quota_migration_enabled determines whether dirty quota-
+based throttling is enabled for an ongoing migration or not.
+
+When the guest tries to dirty a page, it leads to a vmexit as each page
+is write-protected. In the vmexit path, we increment the dirty_counter
+for the corresponding vCPU. Then, we check if the vCPU has exceeded its
+quota. If yes, we exit to userspace with a new exit reason
+KVM_EXIT_DIRTY_QUOTA_FULL. This "quota full" event is further handled on
+the userspace side.
diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index 8c12fa428436..b6c6f5f896dd 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -24,9 +24,17 @@ static inline int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPU
 	return 0;
 }
 
+static inline struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset)
+{
+	return NULL;
+}
+
 #else /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
 
 int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
+struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);
 
 #endif /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 647f7e1a04dc..a6785644bf47 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_DIRTY_QUOTA_MIGRATION 207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index 262f071aac0c..7e9ace760939 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -12,3 +12,9 @@ int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx)
 	memset((*vCPUdqctx), 0, size);
 	return 0;
 }
+
+struct page *kvm_dirty_quota_context_get_page(
+		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset)
+{
+	return vmalloc_to_page((void *)vCPUdqctx + offset * PAGE_SIZE);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5626ae1b92ce..1564d3a3f608 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3519,6 +3519,9 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
 		page = kvm_dirty_ring_get_page(
 		    &vcpu->dirty_ring,
 		    vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET);
+	else if (vmf->pgoff == KVM_DIRTY_QUOTA_PAGE_OFFSET)
+		page = kvm_dirty_quota_context_get_page(vcpu->vCPUdqctx,
+				vmf->pgoff - KVM_DIRTY_QUOTA_PAGE_OFFSET);
 	else
 		return kvm_arch_vcpu_fault(vcpu, vmf);
 	get_page(page);
@@ -4207,6 +4210,12 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 		return 1;
+	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
+#if KVM_DIRTY_QUOTA_PAGE_OFFSET > 0
+		return 1;
+#else
+		return 0;
+#endif
 	default:
 		break;
 	}
@@ -4273,6 +4282,31 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
 	return cleared;
 }
 
+static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
+		bool enabled)
+{
+	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
+		return -EINVAL;
+
+	/*
+	 * For now, dirty quota migration works with dirty bitmap so don't
+	 * quota migration may work with dirty ring interface as well.
+	 * quota migration may work with dirty ring interface was well.
+	 */
+	if (kvm->dirty_ring_size)
+		return -EINVAL;
+
+	/* Return if no change */
+	if (kvm->dirty_quota_migration_enabled == enabled)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	kvm->dirty_quota_migration_enabled = enabled;
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+}
+
 int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 						  struct kvm_enable_cap *cap)
 {
@@ -4305,6 +4339,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	}
 	case KVM_CAP_DIRTY_LOG_RING:
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
+		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
+				cap->args[0]);
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.22.3



* [PATCH 4/6] Increment dirty counter for vmexit due to page write fault.
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (2 preceding siblings ...)
  2021-11-14 14:57 ` [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults Shivam Kumar
@ 2021-11-14 14:57 ` Shivam Kumar
  2021-11-18 17:48   ` Sean Christopherson
  2021-11-14 14:57 ` [PATCH 5/6] Exit to userspace when dirty quota is full Shivam Kumar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

For a page write fault or "page dirty", the dirty counter of the
corresponding vCPU is incremented.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 virt/kvm/kvm_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1564d3a3f608..55bf92cf9f4f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3091,8 +3091,15 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		if (kvm->dirty_ring_size)
 			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
 					    slot, rel_gfn);
-		else
+		else {
+			struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+			if (vcpu && vcpu->kvm->dirty_quota_migration_enabled &&
+					vcpu->vCPUdqctx)
+				vcpu->vCPUdqctx->dirty_counter++;
+
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
-- 
2.22.3



* [PATCH 5/6] Exit to userspace when dirty quota is full.
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (3 preceding siblings ...)
  2021-11-14 14:57 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
@ 2021-11-14 14:57 ` Shivam Kumar
  2021-11-14 14:57 ` [PATCH 6/6] Free vCPUdqctx memory on vCPU destroy Shivam Kumar
  2021-11-18 17:46 ` [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Sean Christopherson
  6 siblings, 0 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

Whenever dirty quota is full (i.e. dirty counter equals dirty quota),
control is passed to the QEMU side, through a KVM exit with the custom exit
reason KVM_EXIT_DIRTY_QUOTA_FULL, to handle the dirty quota full event.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 arch/x86/kvm/x86.c                    | 9 +++++++++
 include/linux/dirty_quota_migration.h | 6 ++++++
 include/uapi/linux/kvm.h              | 1 +
 virt/kvm/dirty_quota_migration.c      | 5 +++++
 4 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc7eb5fddfd3..32fc7a6f8b86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -59,6 +59,7 @@
 #include <linux/mem_encrypt.h>
 #include <linux/entry-kvm.h>
 #include <linux/suspend.h>
+#include <linux/dirty_quota_migration.h>
 
 #include <trace/events/kvm.h>
 
@@ -10028,6 +10029,14 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 				return r;
 			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
 		}
+
+		/* check for dirty quota migration exit condition if it is enabled */
+		if (vcpu->kvm->dirty_quota_migration_enabled &&
+				is_dirty_quota_full(vcpu->vCPUdqctx)) {
+			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
+			r = 0;
+			break;
+		}
 	}
 
 	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index b6c6f5f896dd..b9b3bedd9682 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -30,11 +30,17 @@ static inline struct page *kvm_dirty_quota_context_get_page(
 	return NULL;
 }
 
+static inline bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx)
+{
+	return true;
+}
+
 #else /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
 
 int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
 struct page *kvm_dirty_quota_context_get_page(
 		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);
+bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx);
 
 #endif /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a6785644bf47..6ba39a6015b0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -270,6 +270,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
 #define KVM_EXIT_RISCV_SBI        35
+#define KVM_EXIT_DIRTY_QUOTA_FULL 36
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index 7e9ace760939..eeef19347af4 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -18,3 +18,8 @@ struct page *kvm_dirty_quota_context_get_page(
 {
 	return vmalloc_to_page((void *)vCPUdqctx + offset * PAGE_SIZE);
 }
+
+bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx)
+{
+	return (vCPUdqctx->dirty_counter >= vCPUdqctx->dirty_quota);
+}
-- 
2.22.3



* [PATCH 6/6] Free vCPUdqctx memory on vCPU destroy.
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (4 preceding siblings ...)
  2021-11-14 14:57 ` [PATCH 5/6] Exit to userspace when dirty quota is full Shivam Kumar
@ 2021-11-14 14:57 ` Shivam Kumar
  2021-11-18 17:46 ` [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Sean Christopherson
  6 siblings, 0 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-14 14:57 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, Shivam Kumar, Anurag Madnawat, Shaju Abraham, Manish Mishra

When the vCPU is destroyed, we must free the space allocated to the dirty
quota context for the vCPU.

Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
---
 include/linux/dirty_quota_migration.h | 5 +++++
 virt/kvm/dirty_quota_migration.c      | 6 ++++++
 virt/kvm/kvm_main.c                   | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/include/linux/dirty_quota_migration.h b/include/linux/dirty_quota_migration.h
index b9b3bedd9682..a31f333a37bc 100644
--- a/include/linux/dirty_quota_migration.h
+++ b/include/linux/dirty_quota_migration.h
@@ -35,12 +35,17 @@ static inline bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx)
 	return true;
 }
 
+static inline void kvm_vcpu_dirty_quota_free(struct vCPUDirtyQuotaContext **vCPUdqctx)
+{
+}
+
 #else /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
 
 int kvm_vcpu_dirty_quota_alloc(struct vCPUDirtyQuotaContext **vCPUdqctx);
 struct page *kvm_dirty_quota_context_get_page(
 		struct vCPUDirtyQuotaContext *vCPUdqctx, u32 offset);
 bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx);
+void kvm_vcpu_dirty_quota_free(struct vCPUDirtyQuotaContext **vCPUdqctx);
 
 #endif /* KVM_DIRTY_QUOTA_PAGE_OFFSET == 0 */
 
diff --git a/virt/kvm/dirty_quota_migration.c b/virt/kvm/dirty_quota_migration.c
index eeef19347af4..3f74af2ccab9 100644
--- a/virt/kvm/dirty_quota_migration.c
+++ b/virt/kvm/dirty_quota_migration.c
@@ -23,3 +23,9 @@ bool is_dirty_quota_full(struct vCPUDirtyQuotaContext *vCPUdqctx)
 {
 	return (vCPUdqctx->dirty_counter >= vCPUdqctx->dirty_quota);
 }
+
+void kvm_vcpu_dirty_quota_free(struct vCPUDirtyQuotaContext **vCPUdqctx)
+{
+	vfree(*vCPUdqctx);
+	*vCPUdqctx = NULL;
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 55bf92cf9f4f..9bf0c728f926 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -438,6 +438,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	kvm_vcpu_dirty_quota_free(&vcpu->vCPUdqctx);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
 	kvm_arch_vcpu_destroy(vcpu);
 
@@ -3693,6 +3694,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
+	kvm_vcpu_dirty_quota_free(&vcpu->vCPUdqctx);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
 arch_vcpu_destroy:
 	kvm_arch_vcpu_destroy(vcpu);
-- 
2.22.3



* Re: [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge
  2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
                   ` (5 preceding siblings ...)
  2021-11-14 14:57 ` [PATCH 6/6] Free vCPUdqctx memory on vCPU destroy Shivam Kumar
@ 2021-11-18 17:46 ` Sean Christopherson
  6 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2021-11-18 17:46 UTC (permalink / raw)
  To: Shivam Kumar; +Cc: pbonzini, kvm

On Sun, Nov 14, 2021, Shivam Kumar wrote:
> One possible approach to distributing the overall scope of dirtying for a
> dirty quota interval is to equally distribute it among all the vCPUs. This
> approach to the distribution doesn't make sense if the distribution of
> workloads among vCPUs is skewed. So, to counter such skewed cases, we
> propose that if any vCPU doesn't need its quota for any given dirty
> quota interval, we add this quota to a common pool. This common pool (or
> "common quota") can be consumed on a first come first serve basis
> by all vCPUs in the upcoming dirty quota intervals.

Why not simply use a per-VM quota in combination with a percpu_counter to avoid bouncing
the dirty counter?
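
E.g., something along these lines (untested sketch; the exit helper is
hypothetical):

	/* per-VM quota; a per-CPU counter avoids bouncing a shared cacheline */
	percpu_counter_init(&kvm->dirty_count, 0, GFP_KERNEL);

	/* dirtying path */
	percpu_counter_add(&kvm->dirty_count, 1);

	/* run loop */
	if (percpu_counter_read_positive(&kvm->dirty_count) >= kvm->dirty_quota)
		exit_to_userspace();	/* hypothetical helper */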

> Design
> ----------
> ----------
> 
> Initialization
> 

Feedback that applies to all patches:

> vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It keeps

CamelCase is very frowned upon, please use whatever_case_this_is_called.

The SOB chains are wrong.  The person physically posting the patches needs to have
their SOB last, as they are the person who last handled the patches.

  Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
  Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
  Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
  Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
  Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>

These need a Co-developed-by.  The only other scenario is that you and Anurag
wrote the patches, then handed them off to Shaju, who sent them to Manish, who
sent them back to you for posting.  I highly doubt that's the case, and if so,
I would hope you've done due diligence to ensure what you handed off is the same
as what you posted, i.e. the SOB chains for Shaju and Manish can be omitted.

In general, please read through most of the stuff in Documentation/process.

> the number of pages the vCPU has dirtied (dirty_counter) in the ongoing
> dirty quota interval, and the maximum number of dirties allowed for the
> vCPU (dirty_quota) in the ongoing dirty quota interval.
> 
> struct vCPUDirtyQuotaContext {
> u64 dirty_counter;
> u64 dirty_quota;
> };
> 
> The flag dirty_quota_migration_enabled determines whether dirty quota-based
> throttling is enabled for an ongoing migration or not.
> 
> 
> Handling page dirtying
> 
> When the guest tries to dirty a page, it leads to a vmexit as each page is
> write-protected. In the vmexit path, we increment the dirty_counter for the
> corresponding vCPU. Then, we check if the vCPU has exceeded its quota. If
> yes, we exit to userspace with a new exit reason KVM_EXIT_DIRTY_QUOTA_FULL.
> This "quota full" event is further handled on the userspace side. 
> 
> 
> Please find the KVM Forum presentation on dirty quota-based throttling
> here: https://www.youtube.com/watch?v=ZBkkJf78zFA
> 
> 
> Shivam Kumar (6):
>   Define data structures for dirty quota migration.
>   Init dirty quota flag and allocate memory for vCPUdqctx.
>   Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
>   Increment dirty counter for vmexit due to page write fault.
>   Exit to userspace when dirty quota is full.
>   Free vCPUdqctx memory on vCPU destroy.

Freeing memory in a later patch is not an option.  The purpose of splitting is
to aid bisection and make the patches more reviewable, not to break bisection and
confuse reviewers.  In general, there are too many patches and things are split in
weird ways, making this hard to review.  This can probably be smushed to two
patches: 1) implement the guts, 2) exposed to userspace and document.

>  Documentation/virt/kvm/api.rst        | 39 +++++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h       |  1 +
>  arch/x86/kvm/Makefile                 |  3 +-
>  arch/x86/kvm/x86.c                    |  9 +++++
>  include/linux/dirty_quota_migration.h | 52 +++++++++++++++++++++++++
>  include/linux/kvm_host.h              |  3 ++
>  include/uapi/linux/kvm.h              | 11 ++++++
>  virt/kvm/dirty_quota_migration.c      | 31 +++++++++++++++

I do not see any reason to add two new files for 84 lines, which I'm pretty sure
we can trim down significantly in any case.  Paolo has suggested creating files
for the mm side of generic kvm, the helpers can go wherever that lands.

>  virt/kvm/kvm_main.c                   | 56 ++++++++++++++++++++++++++-
>  9 files changed, 203 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/dirty_quota_migration.h
>  create mode 100644 virt/kvm/dirty_quota_migration.c

As for the design, allocating a separate page for 16 bytes is wasteful and adds
complexity that I don't think is strictly necessary.  Assuming the quota isn't
simply a per-VM thing....

Rather than have both the count and the quota writable by userspace, what about
having KVM_CAP_DIRTY_QUOTA_MIGRATION (renamed to just KVM_CAP_DIRTY_QUOTA, because
dirty logging can technically be used for things other than migration) define a
default, per-VM dirty quota, that is snapshotted by each vCPU on creation.  The
ioctl() would need to be rejected if vCPUs have been created, but it already needs
something along those lines because currently it has a TOCTOU race and can also
race with vCPU readers.

Anyways, vCPUs snapshot a default quota on creation, and then use struct kvm_run to
update the quota upon return from userspace after KVM_EXIT_DIRTY_QUOTA_FULL instead
of giving userspace free rein to change the quota at will.  There are a variety
of ways to leverage kvm_run, the simplest I can think of would be to define the ABI
such that calling KVM_RUN with "exit_reason == KVM_EXIT_DIRTY_QUOTA_FULL" would
trigger an update.  That would do the right thing even if userspace _doesn't_ update
the count/quota, as KVM would simply copy back the original quota/count and exit back
to userspace.

E.g.

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 78f0719cc2a3..d4a7d1b7019e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -487,6 +487,11 @@ struct kvm_run {
                        unsigned long args[6];
                        unsigned long ret[2];
                } riscv_sbi;
+               /* KVM_EXIT_DIRTY_QUOTA_FULL */
+               struct {
+                       u64 dirty_count;
+                       u64 dirty_quota;
+               };
                /* Fix the size of the union. */
                char padding[256];
        };
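
On the userspace side, consuming that would look something like this
(hypothetical helpers, purely illustrative):

	while (!ioctl(vcpu_fd, KVM_RUN, NULL)) {
		if (run->exit_reason == KVM_EXIT_DIRTY_QUOTA_FULL) {
			/* userspace policy, e.g. sleep to throttle the vCPU */
			throttle_vcpu(run->dirty_count);
			run->dirty_count = 0;
			run->dirty_quota = compute_next_quota();
			/* exit_reason left as-is => KVM re-reads quota/count */
			continue;
		}
		/* handle other exit reasons */
	}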


Side topic: it might make sense to have the counter be a stat; the per-vCPU dirty
rate could be useful info even if userspace isn't using quotas.


* Re: [PATCH 4/6] Increment dirty counter for vmexit due to page write fault.
  2021-11-14 14:57 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
@ 2021-11-18 17:48   ` Sean Christopherson
  2021-11-19 19:54     ` Shivam Kumar
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2021-11-18 17:48 UTC (permalink / raw)
  To: Shivam Kumar; +Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra

On Sun, Nov 14, 2021, Shivam Kumar wrote:
> For a page write fault or "page dirty", the dirty counter of the
> corresponding vCPU is incremented.
> 
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
> ---
>  virt/kvm/kvm_main.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1564d3a3f608..55bf92cf9f4f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3091,8 +3091,15 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  		if (kvm->dirty_ring_size)
>  			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
>  					    slot, rel_gfn);
> -		else
> +		else {
> +			struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> +
> +			if (vcpu && vcpu->kvm->dirty_quota_migration_enabled &&
> +					vcpu->vCPUdqctx)
> +				vcpu->vCPUdqctx->dirty_counter++;

Checking dirty_quota_migration_enabled can race, and it'd be far faster to
unconditionally update a counter, e.g. a per-vCPU stat.

> +
>  			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> +		}
>  	}
>  }
>  EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
> -- 
> 2.22.3
> 


* Re: [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  2021-11-14 14:57 ` [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults Shivam Kumar
@ 2021-11-18 17:57   ` Sean Christopherson
  2021-11-19 20:03     ` Shivam Kumar
       [not found]     ` <02b8fa86-a86b-969e-2137-1953639cb6d2@nutanix.com>
  0 siblings, 2 replies; 15+ messages in thread
From: Sean Christopherson @ 2021-11-18 17:57 UTC (permalink / raw)
  To: Shivam Kumar; +Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra

On Sun, Nov 14, 2021, Shivam Kumar wrote:
> +static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
> +		bool enabled)
> +{
> +	if (!KVM_DIRTY_LOG_PAGE_OFFSET)

I don't think we should force architectures to opt in.  It would be trivial to
add 

		if (kvm_dirty_quota_is_full(vcpu)) {
			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
			r = 0;
			break;
		}

in the run loops of each architecture.  And we can do that in incremental patches
without #ifdeffery since it's only the exiting aspect that requires arch help.

> +		return -EINVAL;
> +
> +	/*
> +	 * For now, dirty quota migration works with dirty bitmap so don't
> +	 * enable it if dirty ring interface is enabled. In future, dirty
> +	 * quota migration may work with dirty ring interface as well.
> +	 */

Why does KVM care?  This is a very simple concept.  QEMU not using it for the
dirty ring doesn't mean KVM can't support it.

> +	if (kvm->dirty_ring_size)
> +		return -EINVAL;
> +
> +	/* Return if no change */
> +	if (kvm->dirty_quota_migration_enabled == enabled)

Needs to be check under lock.

> +		return -EINVAL;

Probably more idiomatic to return 0 if the desired value is the current value.

> +	mutex_lock(&kvm->lock);
> +	kvm->dirty_quota_migration_enabled = enabled;

Needs to check vCPU creation.

> +	mutex_unlock(&kvm->lock);
> +
> +	return 0;
> +}
> +
>  int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  						  struct kvm_enable_cap *cap)
>  {
> @@ -4305,6 +4339,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  	}
>  	case KVM_CAP_DIRTY_LOG_RING:
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
> +		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
> +				cap->args[0]);
>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}
> -- 
> 2.22.3
> 


* Re: [PATCH 4/6] Increment dirty counter for vmexit due to page write fault.
  2021-11-18 17:48   ` Sean Christopherson
@ 2021-11-19 19:54     ` Shivam Kumar
  0 siblings, 0 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-19 19:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra


On 18/11/21 11:18 pm, Sean Christopherson wrote:
> On Sun, Nov 14, 2021, Shivam Kumar wrote:
>> For a page write fault or "page dirty", the dirty counter of the
>> corresponding vCPU is incremented.
>>
>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
>> Signed-off-by: Shaju Abraham <shaju.abraham@nutanix.com>
>> Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
>> ---
>>   virt/kvm/kvm_main.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 1564d3a3f608..55bf92cf9f4f 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3091,8 +3091,15 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   		if (kvm->dirty_ring_size)
>>   			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
>>   					    slot, rel_gfn);
>> -		else
>> +		else {
>> +			struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>> +
>> +			if (vcpu && vcpu->kvm->dirty_quota_migration_enabled &&
>> +					vcpu->vCPUdqctx)
>> +				vcpu->vCPUdqctx->dirty_counter++;
> Checking dirty_quota_migration_enabled can race, and it'd be far faster to
> unconditionally update a counter, e.g. a per-vCPU stat.
Yes, an unconditional update seems fine, since the dirty counter need
not be reset every time a new migration starts (as per our discussion
on PATCH 0 in this patchset). Thanks.
>
>> +
>>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
>> +		}
>>   	}
>>   }
>>   EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
>> -- 
>> 2.22.3
>>


* Re: [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  2021-11-18 17:57   ` Sean Christopherson
@ 2021-11-19 20:03     ` Shivam Kumar
       [not found]     ` <02b8fa86-a86b-969e-2137-1953639cb6d2@nutanix.com>
  1 sibling, 0 replies; 15+ messages in thread
From: Shivam Kumar @ 2021-11-19 20:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra


On 18/11/21 11:27 pm, Sean Christopherson wrote:
> On Sun, Nov 14, 2021, Shivam Kumar wrote:
>> +static int kvm_vm_ioctl_enable_dirty_quota_migration(struct kvm *kvm,
>> +		bool enabled)
>> +{
>> +	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
> I don't think we should force architectures to opt in.  It would be trivial to
> add
>
> 		if (kvm_dirty_quota_is_full(vcpu)) {
> 			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
> 			r = 0;
> 			break;
> 		}
>
> in the run loops of each architecture.  And we can do that in incremental patches
> without #ifdeffery since it's only the exiting aspect that requires arch help.
Noted. Thanks.
>
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * For now, dirty quota migration works with dirty bitmap so don't
>> +	 * enable it if dirty ring interface is enabled. In future, dirty
>> +	 * quota migration may work with dirty ring interface as well.
>> +	 */
> Why does KVM care?  This is a very simple concept.  QEMU not using it for the
> dirty ring doesn't mean KVM can't support it.
>

The dirty ring interface, if enabled, bypasses the path that updates the
dirty bitmap, and our current implementation depends on that path. We
were planning to make the changes required for this to work with the
dirty ring as well in upcoming patches. We will explore the possibility
of doing it in the next patchset itself.


>> +	if (kvm->dirty_ring_size)
>> +		return -EINVAL;
>> +
>> +	/* Return if no change */
>> +	if (kvm->dirty_quota_migration_enabled == enabled)
> Needs to be check under lock.


Noted. Thanks.


>
>> +		return -EINVAL;
> Probably more idiomatic to return 0 if the desired value is the current value.


Keeping in mind the case where userspace tries to enable it while a
migration is already going on (which shouldn't happen), we are
returning -EINVAL. Please let me know if 0 still makes more sense.


>
>> +	mutex_lock(&kvm->lock);
>> +	kvm->dirty_quota_migration_enabled = enabled;
> Needs to check vCPU creation.


In our current implementation, we are using the
KVM_CAP_DIRTY_QUOTA_MIGRATION ioctl to start dirty logging (through the
dirty counter) on the kernel side. This ioctl is called whenever a
migration starts or ends.

The dirty quota context of each vCPU is stored in two variables, the
dirty counter and the dirty quota, which we allocate at vCPU creation
and free at vCPU destruction.


>
>> +	mutex_unlock(&kvm->lock);
>> +
>> +	return 0;
>> +}
>> +
>>   int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>   						  struct kvm_enable_cap *cap)
>>   {
>> @@ -4305,6 +4339,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   	}
>>   	case KVM_CAP_DIRTY_LOG_RING:
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_QUOTA_MIGRATION:
>> +		return kvm_vm_ioctl_enable_dirty_quota_migration(kvm,
>> +				cap->args[0]);
>>   	default:
>>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>   	}
>> -- 
>> 2.22.3
>>
Thank you very much for your feedback.


* Re: [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
       [not found]       ` <YZgD0D4536s2DMem@google.com>
@ 2021-11-19 20:21         ` Shivam Kumar
  2021-11-25  8:43           ` Shivam Kumar
  0 siblings, 1 reply; 15+ messages in thread
From: Shivam Kumar @ 2021-11-19 20:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra


On 20/11/21 1:36 am, Sean Christopherson wrote:
> On Sat, Nov 20, 2021, Shivam Kumar wrote:
>> On 18/11/21 11:27 pm, Sean Christopherson wrote:
>>>> +		return -EINVAL;
>>> Probably more idiomatic to return 0 if the desired value is the current value.
>> Keeping the case in mind when the userspace is trying to enable it while the
>> migration is already going on(which shouldn't happen), we are returning
>> -EINVAL. Please let me know if 0 still makes more sense.
> If the semantics are not "enable/disable", but rather "(re)set the quota",
> then it makes sense to allow changing the quota arbitrarily.
I agree that the semantics are not apt. Will modify it. Thanks.
>
>>>> +	mutex_lock(&kvm->lock);
>>>> +	kvm->dirty_quota_migration_enabled = enabled;
>>> Needs to check vCPU creation.
>> In our current implementation, we are using the
>> KVM_CAP_DIRTY_QUOTA_MIGRATION ioctl to start dirty logging (through dirty
>> counter) on the kernel side. This ioctl is called each time a new migration
>> starts and ends.
> Ah, and from the cover letter discussion, you want the count and quota to be
> reset when a new migration occurs.  That makes sense.
>
> Actually, if we go the route of using kvm_run to report and update the count/quota,
> we don't even need a capability.  Userspace can signal each vCPU to induce an exit
> to userspace, e.g. at the start of migration, then set the desired quota/count in
> vcpu->kvm_run and stuff exit_reason so that KVM updates the quota/count on the
> subsequent KVM_RUN.  No locking or requests needed, and userspace can reset the
> count at will, it just requires a signal.
>
> It's a little weird to overload exit_reason like that, but if that's a sticking
> point we could add a flag in kvm_run somewhere.  Requiring an exit to userspace
> at the start of migration doesn't seem too onerous.
Yes, this path looks flaw-free. We will explore the complexity and how 
we can simplify its implementation.


* Re: [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  2021-11-19 20:21         ` Shivam Kumar
@ 2021-11-25  8:43           ` Shivam Kumar
  2021-12-01 17:22             ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Shivam Kumar @ 2021-11-25  8:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra


On 20/11/21 1:51 am, Shivam Kumar wrote:
>
> On 20/11/21 1:36 am, Sean Christopherson wrote:
>> On Sat, Nov 20, 2021, Shivam Kumar wrote:
>>> On 18/11/21 11:27 pm, Sean Christopherson wrote:
>>>>> +        return -EINVAL;
>>>> Probably more idiomatic to return 0 if the desired value is the 
>>>> current value.
>>> Keeping the case in mind when the userspace is trying to enable it 
>>> while the
>>> migration is already going on(which shouldn't happen), we are returning
>>> -EINVAL. Please let me know if 0 still makes more sense.
>> If the semantics are not "enable/disable", but rather "(re)set the 
>> quota",
>> then it makes sense to allow changing the quota arbitrarily.
> I agree that the semantics are not apt. Will modify it. Thanks.
>>
>>>>> +    mutex_lock(&kvm->lock);
>>>>> +    kvm->dirty_quota_migration_enabled = enabled;
>>>> Needs to check vCPU creation.
>>> In our current implementation, we are using the
>>> KVM_CAP_DIRTY_QUOTA_MIGRATION ioctl to start dirty logging (through 
>>> dirty
>>> counter) on the kernel side. This ioctl is called each time a new 
>>> migration
>>> starts and ends.
>> Ah, and from the cover letter discussion, you want the count and 
>> quota to be
>> reset when a new migration occurs.  That makes sense.
>>
>> Actually, if we go the route of using kvm_run to report and update 
>> the count/quota,
>> we don't even need a capability.  Userspace can signal each vCPU to 
>> induce an exit
>> to userspace, e.g. at the start of migration, then set the desired 
>> quota/count in
>> vcpu->kvm_run and stuff exit_reason so that KVM updates the 
>> quota/count on the
>> subsequent KVM_RUN.  No locking or requests needed, and userspace can 
>> reset the
>> count at will, it just requires a signal.
>>
>> It's a little weird to overload exit_reason like that, but if that's 
>> a sticking
>> point we could add a flag in kvm_run somewhere.  Requiring an exit to 
>> userspace
>> at the start of migration doesn't seem too onerous.
> Yes, this path looks flaw-free. We will explore the complexity and how 
> we can simplify its implementation.
Is it okay to define the per-vcpu dirty quota and dirty count in the
kvm_run structure itself? It can save space and reduce the complexity of
the implementation by a large margin.


* Re: [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults.
  2021-11-25  8:43           ` Shivam Kumar
@ 2021-12-01 17:22             ` Sean Christopherson
  0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2021-12-01 17:22 UTC (permalink / raw)
  To: Shivam Kumar; +Cc: pbonzini, kvm, Anurag Madnawat, Shaju Abraham, Manish Mishra

On Thu, Nov 25, 2021, Shivam Kumar wrote:
> 
> On 20/11/21 1:51 am, Shivam Kumar wrote:
> > 
> > On 20/11/21 1:36 am, Sean Christopherson wrote:
> > > Actually, if we go the route of using kvm_run to report and update the
> > > count/quota, we don't even need a capability.  Userspace can signal each
> > > vCPU to induce an exit to userspace, e.g. at the start of migration, then
> > > set the desired quota/count in vcpu->kvm_run and stuff exit_reason so
> > > that KVM updates the quota/count on the subsequent KVM_RUN.  No locking
> > > or requests needed, and userspace can reset the count at will, it just
> > > requires a signal.
> > > 
> > > It's a little weird to overload exit_reason like that, but if that's a
> > > sticking point we could add a flag in kvm_run somewhere.  Requiring an
> > > exit to userspace at the start of migration doesn't seem too onerous.
> >
> > Yes, this path looks flaw-free. We will explore the complexity and how
> > we can simplify its implementation.
>
> Is it okay to define the per-vcpu dirty quota and dirty count in the kvm_run
> structure itself? It can save space and reduce the complexity of the
> implemenation by large margin.

Paolo, I'm guessing this question is directed at you since I made the suggestion :-)


end of thread

Thread overview: 15+ messages
2021-11-14 14:57 [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Shivam Kumar
2021-11-14 14:57 ` [PATCH 1/6] Define data structures for dirty quota migration Shivam Kumar
2021-11-14 14:57 ` [PATCH 2/6] Init dirty quota flag and allocate memory for vCPUdqctx Shivam Kumar
2021-11-14 14:57 ` [PATCH 3/6] Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults Shivam Kumar
2021-11-18 17:57   ` Sean Christopherson
2021-11-19 20:03     ` Shivam Kumar
     [not found]     ` <02b8fa86-a86b-969e-2137-1953639cb6d2@nutanix.com>
     [not found]       ` <YZgD0D4536s2DMem@google.com>
2021-11-19 20:21         ` Shivam Kumar
2021-11-25  8:43           ` Shivam Kumar
2021-12-01 17:22             ` Sean Christopherson
2021-11-14 14:57 ` [PATCH 4/6] Increment dirty counter for vmexit due to page write fault Shivam Kumar
2021-11-18 17:48   ` Sean Christopherson
2021-11-19 19:54     ` Shivam Kumar
2021-11-14 14:57 ` [PATCH 5/6] Exit to userspace when dirty quota is full Shivam Kumar
2021-11-14 14:57 ` [PATCH 6/6] Free vCPUdqctx memory on vCPU destroy Shivam Kumar
2021-11-18 17:46 ` [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Sean Christopherson
