* [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus
@ 2022-11-20 22:54 Shivam Kumar
  2022-11-20 22:54 ` [RFC PATCH 1/1] " Shivam Kumar
  2022-12-06  5:48 ` [RFC PATCH 0/1] QEMU: " Shivam Kumar
  0 siblings, 2 replies; 10+ messages in thread
From: Shivam Kumar @ 2022-11-20 22:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, peterx, david, quintela, dgilbert, kvm, Shivam Kumar

This patchset is the QEMU-side implementation of a (new) dirty-quota-based
throttling algorithm that selectively throttles vCPUs based on their
individual contribution to overall memory dirtying, and dynamically
adapts the throttle to the available network bandwidth.

Overview
----------
----------

To throttle memory dirtying, we propose to set a limit on the number of
pages a vCPU can dirty within fixed, very short time intervals. This limit
depends on the network throughput calculated over the last few intervals,
so that vCPUs are throttled according to the available network bandwidth.
We refer to this limit as the "dirty quota" of a vCPU and to the
fixed-size intervals as "dirty quota intervals".

One possible approach to distributing the overall dirtying allowance for a
dirty quota interval is to split it equally among all the vCPUs. This
doesn't work well when the workload is skewed across vCPUs. To handle such
skewed cases, we propose that if a vCPU doesn't use its quota in a given
dirty quota interval, the leftover quota is added to a common pool. This
common pool (or "common quota") can then be consumed on a first-come,
first-serve basis by any vCPU in the upcoming dirty quota intervals.


Design
----------
----------

Userspace                                 KVM

[At the start of dirty logging]
Initialize dirty quota to some            
non-zero value for each vcpu.    ----->   [When dirty logging starts]
                                          Start incrementing dirty count
                                          for every page dirtied by the vcpu.

                                          [Dirty count equals/exceeds
                                          dirty quota]
If the vcpu has already claimed  <-----   Exit to userspace.
its quota for the current dirty           
quota interval:

        1) If common quota is
        available, give the vcpu
        its quota from common pool.

        2) Else sleep the vcpu until
        the next interval starts.

Give the vcpu its share for the
current (fresh) dirty quota      ----->  Continue dirtying with the newly
interval.                                received quota.

[At the end of dirty logging]             
Set dirty quota back to zero
for every vcpu.                 ----->   Throttling disabled.


References
----------
----------

KVM Forum Talk: https://www.youtube.com/watch?v=ZBkkJf78zFA
Kernel Patchset:
https://lore.kernel.org/all/20221113170507.208810-1-shivam.kumar1@nutanix.com/


Note
----------
----------

We understand that there is considerable scope for improvement in the
current implementation. Here is a list of things we are working on:
1) Adding dirty quota as a migration capability so that it can be toggled
through a QMP command.
2) Adding support for throttling guest DMAs.
3) Not enabling dirty quota for the first migration iteration.
4) Falling back to the current auto-converge based throttling in cases where
dirty quota throttling can overthrottle.

Please stay tuned for the next patchset.

Shivam Kumar (1):
  Dirty quota-based throttling of vcpus

 accel/kvm/kvm-all.c       | 91 +++++++++++++++++++++++++++++++++++++++
 include/exec/memory.h     |  3 ++
 include/hw/core/cpu.h     |  5 +++
 include/sysemu/kvm_int.h  |  1 +
 linux-headers/linux/kvm.h |  9 ++++
 migration/migration.c     | 22 ++++++++++
 migration/migration.h     | 31 +++++++++++++
 softmmu/memory.c          | 64 +++++++++++++++++++++++++++
 8 files changed, 226 insertions(+)

-- 
2.22.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/1] Dirty quota-based throttling of vcpus
  2022-11-20 22:54 [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus Shivam Kumar
@ 2022-11-20 22:54 ` Shivam Kumar
  2022-11-21 11:35   ` Philippe Mathieu-Daudé
  2022-12-06  5:48 ` [RFC PATCH 0/1] QEMU: " Shivam Kumar
  1 sibling, 1 reply; 10+ messages in thread
From: Shivam Kumar @ 2022-11-20 22:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, peterx, david, quintela, dgilbert, kvm, Shivam Kumar,
	Shaju Abraham, Manish Mishra, Anurag Madnawat

Introduce a (new) throttling scheme where QEMU defines a limit on the dirty
rate of each vcpu of the VM. This limit is enforced on the vcpus in small
intervals (dirty quota intervals) by allowing the vcpus to dirty only as many
pages in these intervals as needed to keep the dirty rate below the set limit.

Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
---
 accel/kvm/kvm-all.c       | 91 +++++++++++++++++++++++++++++++++++++++
 include/exec/memory.h     |  3 ++
 include/hw/core/cpu.h     |  5 +++
 include/sysemu/kvm_int.h  |  1 +
 linux-headers/linux/kvm.h |  9 ++++
 migration/migration.c     | 22 ++++++++++
 migration/migration.h     | 31 +++++++++++++
 softmmu/memory.c          | 64 +++++++++++++++++++++++++++
 8 files changed, 226 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f99b0becd8..ea50605592 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -46,6 +46,8 @@
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
 #include "sysemu/dirtylimit.h"
+#include "hw/core/cpu.h"
+#include "migration/migration.h"
 
 #include "hw/boards.h"
 #include "monitor/stats.h"
@@ -2463,6 +2465,8 @@ static int kvm_init(MachineState *ms)
         }
     }
 
+    s->dirty_quota_supported = kvm_vm_check_extension(s, KVM_CAP_DIRTY_QUOTA);
+
     /*
      * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is not needed when dirty ring is
      * enabled.  More importantly, KVM_DIRTY_LOG_INITIALLY_SET will assume no
@@ -2808,6 +2812,88 @@ static void kvm_eat_signals(CPUState *cpu)
     } while (sigismember(&chkset, SIG_IPI));
 }
 
+static void handle_dirty_quota_sleep(int64_t sleep_time)
+{
+    /* Do not throttle the vcpu more than the maximum throttle. */
+    sleep_time = MIN(sleep_time,
+                        DIRTY_QUOTA_MAX_THROTTLE * DIRTY_QUOTA_INTERVAL_SIZE);
+    /* Convert sleep time from nanoseconds to microseconds. */
+    g_usleep(sleep_time / 1000);
+}
+
+static uint64_t handle_dirty_quota_exhausted(
+                    CPUState *cpu, const uint64_t count, const uint64_t quota)
+{
+    MigrationState *s = migrate_get_current();
+    uint64_t time_to_sleep;
+    int64_t unclaimed_quota;
+    int64_t dirty_quota_overflow = (count - quota);
+    uint64_t dirty_rate_limit = qatomic_read(&s->per_vcpu_dirty_rate_limit);
+    uint64_t new_quota = (dirty_rate_limit * DIRTY_QUOTA_INTERVAL_SIZE) /
+                                                        NANOSECONDS_PER_SECOND;
+    uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+
+    /* Penalize the vCPU if it dirtied more pages than it was allowed to. */
+    if (dirty_quota_overflow > 0) {
+        time_to_sleep = (dirty_quota_overflow * NANOSECONDS_PER_SECOND) /
+                                                            dirty_rate_limit;
+        cpu->dirty_quota_expiry_time = current_time + time_to_sleep;
+        return time_to_sleep;
+    }
+
+    /*
+     * If the current dirty quota interval hasn't ended, try using common quota
+     * if it is available, else sleep.
+     */
+    current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    if (current_time < cpu->dirty_quota_expiry_time) {
+        qemu_spin_lock(&s->common_dirty_quota_lock);
+        if (s->common_dirty_quota > 0) {
+            s->common_dirty_quota -= new_quota;
+            qemu_spin_unlock(&s->common_dirty_quota_lock);
+            cpu->kvm_run->dirty_quota = count + new_quota;
+            return 0;
+        }
+
+        qemu_spin_unlock(&s->common_dirty_quota_lock);
+        current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+        /* If common quota isn't available, sleep for the remaining interval. */
+        if (current_time < cpu->dirty_quota_expiry_time) {
+            time_to_sleep = cpu->dirty_quota_expiry_time - current_time;
+            return time_to_sleep;
+        }
+    }
+
+    /*
+     * This is a fresh dirty quota interval. If the vcpu has not claimed its
+     * quota for the previous intervals, add them to the common quota.
+     */
+    current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    unclaimed_quota = (current_time - cpu->dirty_quota_expiry_time) *
+                        dirty_rate_limit;
+    qemu_spin_lock(&s->common_dirty_quota_lock);
+    s->common_dirty_quota += unclaimed_quota;
+    qemu_spin_unlock(&s->common_dirty_quota_lock);
+
+    /*  Allocate the vcpu this new interval's dirty quota. */
+    cpu->kvm_run->dirty_quota = count + new_quota;
+    cpu->dirty_quota_expiry_time = current_time + DIRTY_QUOTA_INTERVAL_SIZE;
+    return 0;
+}
+
+
+static void handle_kvm_exit_dirty_quota_exhausted(CPUState *cpu,
+                                    const uint64_t count, const uint64_t quota)
+{
+    uint64_t time_to_sleep;
+    do {
+        time_to_sleep = handle_dirty_quota_exhausted(cpu, count, quota);
+        if (time_to_sleep > 0) {
+            handle_dirty_quota_sleep(time_to_sleep);
+        }
+    } while (time_to_sleep != 0);
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
@@ -2943,6 +3029,11 @@ int kvm_cpu_exec(CPUState *cpu)
             dirtylimit_vcpu_execute(cpu);
             ret = 0;
             break;
+        case KVM_EXIT_DIRTY_QUOTA_EXHAUSTED:
+            handle_kvm_exit_dirty_quota_exhausted(cpu,
+                    run->dirty_quota_exit.count, run->dirty_quota_exit.quota);
+            ret = 0;
+            break;
         case KVM_EXIT_SYSTEM_EVENT:
             switch (run->system_event.type) {
             case KVM_SYSTEM_EVENT_SHUTDOWN:
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..becd0144a0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -3009,6 +3009,9 @@ bool ram_block_discard_is_disabled(void);
  */
 bool ram_block_discard_is_required(void);
 
+void dirty_quota_migration_start(void);
+void dirty_quota_migration_stop(void);
+
 #endif
 
 #endif
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 8830546121..7c5543849a 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -36,6 +36,9 @@
 typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,
                                      void *opaque);
 
+#define DIRTY_QUOTA_INTERVAL_SIZE 10000000 /* interval length in ns (10 ms) */
+#define DIRTY_QUOTA_MAX_THROTTLE .99 /* sleep at most 99% of an interval */
+
 /**
  * SECTION:cpu
  * @section_id: QEMU-cpu
@@ -443,6 +446,8 @@ struct CPUState {
 
     /* track IOMMUs whose translations we've cached in the TCG TLB */
     GArray *iommu_notifiers;
+
+    uint64_t dirty_quota_expiry_time;
 };
 
 typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 3b4adcdc10..51e3df18c7 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -110,6 +110,7 @@ struct KVMState
     struct KVMDirtyRingReaper reaper;
     NotifyVmexitOption notify_vmexit;
     uint32_t notify_window;
+    bool dirty_quota_supported; /* Whether KVM supports dirty quota or not */
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ebdafa576d..bc1d308afd 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -272,6 +272,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -508,6 +509,11 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
+		struct {
+			__u64 count;
+			__u64 quota;
+		} dirty_quota_exit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -529,6 +535,8 @@ struct kvm_run {
 		struct kvm_sync_regs regs;
 		char padding[SYNC_REGS_SIZE_BYTES];
 	} s;
+
+	__u64 dirty_quota;
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
@@ -1175,6 +1183,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_DIRTY_QUOTA 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/migration/migration.c b/migration/migration.c
index 739bb683f3..b94f636f08 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -61,6 +61,8 @@
 #include "sysemu/cpus.h"
 #include "yank_functions.h"
 #include "sysemu/qtest.h"
+#include "hw/core/cpu.h"
+#include "sysemu/kvm_int.h"
 
 #define MAX_THROTTLE  (128 << 20)      /* Migration transfer speed throttling */
 
@@ -3685,8 +3687,11 @@ static void migration_update_counters(MigrationState *s,
                                       int64_t current_time)
 {
     uint64_t transferred, transferred_pages, time_spent;
+    uint64_t pages_transferred_since_last_update, time_spent_since_last_update;
     uint64_t current_bytes; /* bytes transferred since the beginning */
     double bandwidth;
+    CPUState *cpu;
+    uint32_t nr_cpus = 0;
 
     if (current_time < s->iteration_start_time + BUFFER_DELAY) {
         return;
@@ -3706,6 +3711,23 @@ static void migration_update_counters(MigrationState *s,
     s->pages_per_second = (double) transferred_pages /
                              (((double) time_spent / 1000.0));
 
+    if (kvm_state->dirty_quota_supported) {
+        CPU_FOREACH(cpu) {
+            nr_cpus++;
+        }
+        pages_transferred_since_last_update = transferred_pages -
+                                    s->last_counters_update.transferred_pages;
+        time_spent_since_last_update = time_spent -
+                                    s->last_counters_update.time_spent;
+        qatomic_set(&s->per_vcpu_dirty_rate_limit,
+            ((double) pages_transferred_since_last_update) /
+            (((double) time_spent_since_last_update) / 1000.0) /
+            ((double) nr_cpus));
+
+        s->last_counters_update.transferred_pages = transferred_pages;
+        s->last_counters_update.time_spent = time_spent;
+    }
+
     /*
      * if we haven't sent anything, we don't want to
      * recalculate. 10000 is a small enough number for our purposes
diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaa..66c680b81c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -249,6 +249,15 @@ struct MigrationState {
     uint64_t iteration_initial_bytes;
     /* time at the start of current iteration */
     int64_t iteration_start_time;
+
+    /* state related to last migration counters update */
+    struct {
+        /* time spent from the start of iteration till the last update */
+        uint64_t time_spent;
+        /* pages already sent in the current iteration till the last update */
+        uint64_t transferred_pages;
+    } last_counters_update;
+
     /*
      * The final stage happens when the remaining data is smaller than
      * this threshold; it's calculated from the requested downtime and
@@ -373,6 +382,28 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /*
+     * Dirty quota throttling tries to limit the dirty rate of the guest to some
+     * factor of network throughput. This factor is dirty_quota_throttle_ratio.
+     */
+    double dirty_quota_throttle_ratio;
+
+    /*
+     * For dirty quota throttling, this is the limit on the dirty rate of the
+     * vcpus. There may be exceptions where this limit is enforced loosely
+     * to avoid overthrottling of the vcpus.
+     */
+    uint64_t per_vcpu_dirty_rate_limit;
+
+    /*
+     * If a vcpu doesn't claim its dirty quota for a given dirty quota interval,
+     * the unclaimed quota gets added to common quota.
+     * Common dirty quota can be claimed by any vcpu which has already used its
+     * individual dirty quota for the current dirty quota interval.
+     */
+    QemuSpin common_dirty_quota_lock;
+    int64_t common_dirty_quota;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bc0be3f62c..8f725a9b89 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -12,6 +12,7 @@
  * Contributions after 2012-01-13 are licensed under the terms of the
  * GNU GPL, version 2 or (at your option) any later version.
  */
+#include <linux/kvm.h>
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
@@ -34,6 +35,10 @@
 #include "hw/boards.h"
 #include "migration/vmstate.h"
 #include "exec/address-spaces.h"
+#include "hw/core/cpu.h"
+#include "exec/target_page.h"
+#include "migration/migration.h"
+#include "sysemu/kvm_int.h"
 
 //#define DEBUG_UNASSIGNED
 
@@ -2869,6 +2874,46 @@ static unsigned int postponed_stop_flags;
 static VMChangeStateEntry *vmstate_change;
 static void memory_global_dirty_log_stop_postponed_run(void);
 
+static void init_vcpu_dirty_quota(CPUState *cpu, run_on_cpu_data arg)
+{
+    uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    cpu->kvm_run->dirty_quota = 1;
+    cpu->dirty_quota_expiry_time = current_time;
+}
+
+void dirty_quota_migration_start(void)
+{
+    if (!kvm_state->dirty_quota_supported) {
+        return;
+    }
+
+    MigrationState *s = migrate_get_current();
+    /* Assume an initial bandwidth of 1e9/8 bytes/s (~125 MBps). */
+    double pages_per_second = (((double) 1e9) / 8.0) /
+                                    (double) qemu_target_page_size();
+    uint32_t nr_cpus = 0;
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        nr_cpus++;
+    }
+    /*
+     * Currently we are hardcoding this to 2. There are plans to allow the user
+     * to manually select this ratio.
+     */
+    s->dirty_quota_throttle_ratio = 2;
+    qatomic_set(&s->per_vcpu_dirty_rate_limit,
+                pages_per_second / s->dirty_quota_throttle_ratio / nr_cpus);
+
+    qemu_spin_lock(&s->common_dirty_quota_lock);
+    s->common_dirty_quota = 0;
+    qemu_spin_unlock(&s->common_dirty_quota_lock);
+
+    CPU_FOREACH(cpu) {
+        run_on_cpu(cpu, init_vcpu_dirty_quota, RUN_ON_CPU_NULL);
+    }
+}
+
 void memory_global_dirty_log_start(unsigned int flags)
 {
     unsigned int old_flags;
@@ -2891,6 +2936,7 @@ void memory_global_dirty_log_start(unsigned int flags)
     trace_global_dirty_changed(global_dirty_tracking);
 
     if (!old_flags) {
+        dirty_quota_migration_start();
         MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
         memory_region_transaction_begin();
         memory_region_update_pending = true;
@@ -2898,6 +2944,23 @@ void memory_global_dirty_log_start(unsigned int flags)
     }
 }
 
+static void reset_vcpu_dirty_quota(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->kvm_run->dirty_quota = 0;
+}
+
+void dirty_quota_migration_stop(void)
+{
+    if (!kvm_state->dirty_quota_supported) {
+        return;
+    }
+
+    CPUState *cpu;
+    CPU_FOREACH(cpu) {
+        run_on_cpu(cpu, reset_vcpu_dirty_quota, RUN_ON_CPU_NULL);
+    }
+}
+
 static void memory_global_dirty_log_do_stop(unsigned int flags)
 {
     assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
@@ -2907,6 +2970,7 @@ static void memory_global_dirty_log_do_stop(unsigned int flags)
     trace_global_dirty_changed(global_dirty_tracking);
 
     if (!global_dirty_tracking) {
+        dirty_quota_migration_stop();
         memory_region_transaction_begin();
         memory_region_update_pending = true;
         memory_region_transaction_commit();
-- 
2.22.3



* Re: [RFC PATCH 1/1] Dirty quota-based throttling of vcpus
  2022-11-20 22:54 ` [RFC PATCH 1/1] " Shivam Kumar
@ 2022-11-21 11:35   ` Philippe Mathieu-Daudé
  2022-11-22  4:00     ` Shivam Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-11-21 11:35 UTC (permalink / raw)
  To: Shivam Kumar, qemu-devel
  Cc: pbonzini, peterx, david, quintela, dgilbert, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat

Hi,

On 20/11/22 23:54, Shivam Kumar wrote:
> Introduces a (new) throttling scheme where QEMU defines a limit on the dirty
> rate of each vcpu of the VM. This limit is enfored on the vcpus in small
> intervals (dirty quota intervals) by allowing the vcpus to dirty only as many
> pages in these intervals as to maintain a dirty rate below the set limit.
> 
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
>   accel/kvm/kvm-all.c       | 91 +++++++++++++++++++++++++++++++++++++++
>   include/exec/memory.h     |  3 ++
>   include/hw/core/cpu.h     |  5 +++
>   include/sysemu/kvm_int.h  |  1 +
>   linux-headers/linux/kvm.h |  9 ++++
>   migration/migration.c     | 22 ++++++++++
>   migration/migration.h     | 31 +++++++++++++
>   softmmu/memory.c          | 64 +++++++++++++++++++++++++++
>   8 files changed, 226 insertions(+)


>   void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index bc0be3f62c..8f725a9b89 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -12,6 +12,7 @@
>    * Contributions after 2012-01-13 are licensed under the terms of the
>    * GNU GPL, version 2 or (at your option) any later version.
>    */
> +#include <linux/kvm.h>
>   
>   #include "qemu/osdep.h"
>   #include "qemu/log.h"
> @@ -34,6 +35,10 @@
>   #include "hw/boards.h"
>   #include "migration/vmstate.h"
>   #include "exec/address-spaces.h"
> +#include "hw/core/cpu.h"
> +#include "exec/target_page.h"
> +#include "migration/migration.h"
> +#include "sysemu/kvm_int.h"
>   
>   //#define DEBUG_UNASSIGNED
>   
> @@ -2869,6 +2874,46 @@ static unsigned int postponed_stop_flags;
>   static VMChangeStateEntry *vmstate_change;
>   static void memory_global_dirty_log_stop_postponed_run(void);
>   
> +static void init_vcpu_dirty_quota(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +    cpu->kvm_run->dirty_quota = 1;
> +    cpu->dirty_quota_expiry_time = current_time;
> +}
> +
> +void dirty_quota_migration_start(void)
> +{
> +    if (!kvm_state->dirty_quota_supported) {

You are accessing an accelerator-specific variable in an
accelerator-agnostic file; this doesn't sound correct.

You might introduce some hooks in AccelClass and implement them in
accel/kvm/. See for example gdbstub_supported_sstep_flags() and
kvm_gdbstub_sstep_flags().

> +        return;
> +    }
> +
> +    MigrationState *s = migrate_get_current();
> +    /* Assuming initial bandwidth to be 128 MBps. */
> +    double pages_per_second = (((double) 1e9) / 8.0) /
> +                                    (double) qemu_target_page_size();
> +    uint32_t nr_cpus;
> +    CPUState *cpu;
> +
> +    CPU_FOREACH(cpu) {
> +        nr_cpus++;
> +    }
> +    /*
> +     * Currently we are hardcoding this to 2. There are plans to allow the user
> +     * to manually select this ratio.
> +     */
> +    s->dirty_quota_throttle_ratio = 2;
> +    qatomic_set(&s->per_vcpu_dirty_rate_limit,
> +                pages_per_second / s->dirty_quota_throttle_ratio / nr_cpus);
> +
> +    qemu_spin_lock(&s->common_dirty_quota_lock);
> +    s->common_dirty_quota = 0;
> +    qemu_spin_unlock(&s->common_dirty_quota_lock);
> +
> +    CPU_FOREACH(cpu) {
> +        run_on_cpu(cpu, init_vcpu_dirty_quota, RUN_ON_CPU_NULL);
> +    }
> +}
> +
>   void memory_global_dirty_log_start(unsigned int flags)
>   {
>       unsigned int old_flags;
> @@ -2891,6 +2936,7 @@ void memory_global_dirty_log_start(unsigned int flags)
>       trace_global_dirty_changed(global_dirty_tracking);
>   
>       if (!old_flags) {
> +        dirty_quota_migration_start();
>           MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
>           memory_region_transaction_begin();
>           memory_region_update_pending = true;
> @@ -2898,6 +2944,23 @@ void memory_global_dirty_log_start(unsigned int flags)
>       }
>   }
>   
> +static void reset_vcpu_dirty_quota(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    cpu->kvm_run->dirty_quota = 0;
> +}
> +
> +void dirty_quota_migration_stop(void)
> +{
> +    if (!kvm_state->dirty_quota_supported) {
> +        return;
> +    }
> +
> +    CPUState *cpu;
> +    CPU_FOREACH(cpu) {
> +        run_on_cpu(cpu, reset_vcpu_dirty_quota, RUN_ON_CPU_NULL);
> +    }
> +}
> +
>   static void memory_global_dirty_log_do_stop(unsigned int flags)
>   {
>       assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
> @@ -2907,6 +2970,7 @@ static void memory_global_dirty_log_do_stop(unsigned int flags)
>       trace_global_dirty_changed(global_dirty_tracking);
>   
>       if (!global_dirty_tracking) {
> +        dirty_quota_migration_stop();
>           memory_region_transaction_begin();
>           memory_region_update_pending = true;
>           memory_region_transaction_commit();



* Re: [RFC PATCH 1/1] Dirty quota-based throttling of vcpus
  2022-11-21 11:35   ` Philippe Mathieu-Daudé
@ 2022-11-22  4:00     ` Shivam Kumar
  2023-02-13  9:17       ` Shivam Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Shivam Kumar @ 2022-11-22  4:00 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: pbonzini, peterx, david, quintela, dgilbert, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat



On 21/11/22 5:05 pm, Philippe Mathieu-Daudé wrote:
> Hi,
> 
> On 20/11/22 23:54, Shivam Kumar wrote:
>> +
>> +void dirty_quota_migration_start(void)
>> +{
>> +    if (!kvm_state->dirty_quota_supported) {
> 
> You are accessing an accelerator-specific variable in an 
> accelerator-agnostic file, this doesn't sound correct.
> 
> You might introduce some hooks in AccelClass and implement them in
> accel/kvm/. See for example gdbstub_supported_sstep_flags() and
> kvm_gdbstub_sstep_flags().
>
Ack.

Thanks,
Shivam


* Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus
  2022-11-20 22:54 [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus Shivam Kumar
  2022-11-20 22:54 ` [RFC PATCH 1/1] " Shivam Kumar
@ 2022-12-06  5:48 ` Shivam Kumar
  2022-12-06 16:00   ` Peter Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Shivam Kumar @ 2022-12-06  5:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, peterx, david, quintela, dgilbert, kvm, Juan Quintela



On 21/11/22 4:24 am, Shivam Kumar wrote:
> This patchset is the QEMU-side implementation of a (new) dirty "quota"
> based throttling algorithm that selectively throttles vCPUs based on their
> individual contribution to overall memory dirtying and also dynamically
> adapts the throttle based on the available network bandwidth.
> 
> Overview
> ----------
> ----------
> 
> To throttle memory dirtying, we propose to set a limit on the number of
> pages a vCPU can dirty in given fixed microscopic size time intervals. This
> limit depends on the network throughput calculated over the last few
> intervals so as to throttle the vCPUs based on available network bandwidth.
> We are referring to this limit as the "dirty quota" of a vCPU and
> the fixed size intervals as the "dirty quota intervals".
> 
> One possible approach to distributing the overall scope of dirtying for a
> dirty quota interval is to equally distribute it among all the vCPUs. This
> approach to the distribution doesn't make sense if the distribution of
> workloads among vCPUs is skewed. So, to counter such skewed cases, we
> propose that if any vCPU doesn't need its quota for any given dirty
> quota interval, we add this quota to a common pool. This common pool (or
> "common quota") can be consumed on a first come first serve basis
> by all vCPUs in the upcoming dirty quota intervals.
> 
> 
> Design
> ----------
> ----------
> 
> Userspace                                 KVM
> 
> [At the start of dirty logging]
> Initialize dirty quota to some
> non-zero value for each vcpu.    ----->   [When dirty logging starts]
>                                            Start incrementing dirty count
>                                            for every dirty by the vcpu.
> 
>                                            [Dirty count equals/exceeds
>                                            dirty quota]
> If the vcpu has already claimed  <-----   Exit to userspace.
> its quota for the current dirty
> quota interval:
> 
>          1) If common quota is
>          available, give the vcpu
>          its quota from common pool.
> 
>          2) Else sleep the vcpu until
>          the next interval starts.
> 
> Give the vcpu its share for the
> current(fresh) dirty quota       ----->  Continue dirtying with the newly
> interval.                                received quota.
> 
> [At the end of dirty logging]
> Set dirty quota back to zero
> for every vcpu.                 ----->   Throttling disabled.
> 
> 
> References
> ----------
> ----------
> 
> KVM Forum Talk: https://www.youtube.com/watch?v=ZBkkJf78zFA
> Kernel Patchset:
> https://lore.kernel.org/all/20221113170507.208810-1-shivam.kumar1@nutanix.com/
> 
> 
> Note
> ----------
> ----------
> 
> We understand that there is a good scope of improvement in the current
> implementation. Here is a list of things we are working on:
> 1) Adding dirty quota as a migration capability so that it can be toggled
> through QMP command.
> 2) Adding support for throttling guest DMAs.
> 3) Not enabling dirty quota for the first migration iteration.
> 4) Falling back to current auto-converge based throttling in cases where dirty
> quota throttling can overthrottle.
> 
> Please stay tuned for the next patchset.
> 
> Shivam Kumar (1):
>    Dirty quota-based throttling of vcpus
> 
>   accel/kvm/kvm-all.c       | 91 +++++++++++++++++++++++++++++++++++++++
>   include/exec/memory.h     |  3 ++
>   include/hw/core/cpu.h     |  5 +++
>   include/sysemu/kvm_int.h  |  1 +
>   linux-headers/linux/kvm.h |  9 ++++
>   migration/migration.c     | 22 ++++++++++
>   migration/migration.h     | 31 +++++++++++++
>   softmmu/memory.c          | 64 +++++++++++++++++++++++++++
>   8 files changed, 226 insertions(+)
> 

It'd be great if I could get some more feedback before I send v2. Thanks.

CC: Peter Xu, Juan Quintela


* Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus
  2022-12-06  5:48 ` [RFC PATCH 0/1] QEMU: " Shivam Kumar
@ 2022-12-06 16:00   ` Peter Xu
  2022-12-06 17:29     ` Hyman Huang
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-12-06 16:00 UTC (permalink / raw)
  To: Shivam Kumar
  Cc: qemu-devel, pbonzini, david, quintela, dgilbert, kvm, Yong Huang

Hi, Shivam,

On Tue, Dec 06, 2022 at 11:18:52AM +0530, Shivam Kumar wrote:

[...]

> > Note
> > ----------
> > ----------
> > 
> > We understand that there is a good scope of improvement in the current
> > implementation. Here is a list of things we are working on:
> > 1) Adding dirty quota as a migration capability so that it can be toggled
> > through QMP command.
> > 2) Adding support for throttling guest DMAs.
> > 3) Not enabling dirty quota for the first migration iteration.

Agreed.

> > 4) Falling back to current auto-converge based throttling in cases where dirty
> > quota throttling can overthrottle.

If overthrottle happens, would auto-converge always be better?

> > 
> > Please stay tuned for the next patchset.
> > 
> > Shivam Kumar (1):
> >    Dirty quota-based throttling of vcpus
> > 
> >   accel/kvm/kvm-all.c       | 91 +++++++++++++++++++++++++++++++++++++++
> >   include/exec/memory.h     |  3 ++
> >   include/hw/core/cpu.h     |  5 +++
> >   include/sysemu/kvm_int.h  |  1 +
> >   linux-headers/linux/kvm.h |  9 ++++
> >   migration/migration.c     | 22 ++++++++++
> >   migration/migration.h     | 31 +++++++++++++
> >   softmmu/memory.c          | 64 +++++++++++++++++++++++++++
> >   8 files changed, 226 insertions(+)
> > 
> 
> It'd be great if I could get some more feedback before I send v2. Thanks.

Sorry to respond late.

What's the status of the kernel patchset?

From a high level the approach looks good, at least to me.  It's just
that (as I mentioned before) we now have two similar approaches to
throttling the guest for precopy.  I'm not sure what the best way
forward is without doing a comparison of the two.

https://lore.kernel.org/all/cover.1669047366.git.huangy81@chinatelecom.cn/

Sorry to say so, and I have no intention of creating contention, but
merging the two without any thought will definitely confuse everybody.
We need to figure out a way.

From what I can tell..

One way is to choose whichever of the two is superior and have all of us
stick with it (for either a higher chance of migration converging, less
interference with the workloads, and so on).

The other way is to take both, since each of them may be suitable for
different scenarios.  However, in this latter case, we'd better at least
be aware of the differences (which suits what); that will then be part
of the documentation we need for each of the features when the user
wants to use them.

Add Yong into the loop.

Any thoughts?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus
  2022-12-06 16:00   ` Peter Xu
@ 2022-12-06 17:29     ` Hyman Huang
  2022-12-18 19:12       ` Shivam Kumar
  0 siblings, 1 reply; 10+ messages in thread
From: Hyman Huang @ 2022-12-06 17:29 UTC (permalink / raw)
  To: Peter Xu, Shivam Kumar
  Cc: qemu-devel, pbonzini, david, quintela, dgilbert, kvm



On 2022/12/7 0:00, Peter Xu wrote:
> Hi, Shivam,
> 
> On Tue, Dec 06, 2022 at 11:18:52AM +0530, Shivam Kumar wrote:
> 
> [...]
> 
>>> Note
>>> ----------
>>> ----------
>>>
>>> We understand that there is a good scope of improvement in the current
>>> implementation. Here is a list of things we are working on:
>>> 1) Adding dirty quota as a migration capability so that it can be toggled
>>> through QMP command.
>>> 2) Adding support for throttling guest DMAs.
>>> 3) Not enabling dirty quota for the first migration iteration.
> 
> Agreed.
> 
>>> 4) Falling back to current auto-converge based throttling in cases where dirty
>>> quota throttling can overthrottle.
> 
> If overthrottle happens, would auto-converge always be better?
> 
>>>
>>> Please stay tuned for the next patchset.
>>>
>>> Shivam Kumar (1):
>>>     Dirty quota-based throttling of vcpus
>>>
>>>    accel/kvm/kvm-all.c       | 91 +++++++++++++++++++++++++++++++++++++++
>>>    include/exec/memory.h     |  3 ++
>>>    include/hw/core/cpu.h     |  5 +++
>>>    include/sysemu/kvm_int.h  |  1 +
>>>    linux-headers/linux/kvm.h |  9 ++++
>>>    migration/migration.c     | 22 ++++++++++
>>>    migration/migration.h     | 31 +++++++++++++
>>>    softmmu/memory.c          | 64 +++++++++++++++++++++++++++
>>>    8 files changed, 226 insertions(+)
>>>
>>
>> It'd be great if I could get some more feedback before I send v2. Thanks.
> 
> Sorry to respond late.
> 
> What's the status of the kernel patchset?
> 
>  From high level the approach looks good at least to me.  It's just that (as
> I used to mention) we have two similar approaches now on throttling the
> guest for precopy.  I'm not sure what's the best way to move forward if
> without doing a comparison of the two.
> 
> https://lore.kernel.org/all/cover.1669047366.git.huangy81@chinatelecom.cn/
> 
> Sorry to say so, and no intention to create a contention, but merging the
> two without any thought will definitely confuse everybody.  We need to
> figure out a way.
> 
>  From what I can tell..
> 
> One way is we choose one of them which will be superior to the other and
> all of us stick with it (for either higher possibility of migrate, less
> interference to the workloads, and so on).
> 
> The other way is we take both, when each of them may be suitable for
> different scenarios.  However in this latter case, we'd better at least be
> aware of the differences (which suits what), then that'll be part of
> documentation we need for each of the features when the user wants to use
> them.
> 
> Add Yong into the loop.
> 
> Any thoughts?
> 
This is quite different from the "dirtylimit capability of migration".
IMHO, the quota-based implementation seems a little complicated, because
it depends on the correctness of the dirty quota and the measured data,
and it involves patches to both QEMU and the kernel. That said,
dirtylimit and the quota-based approach are not mutually exclusive; at
the least, we can first figure out which suits what based on the test
results, as Peter said.

-- 
Best regards

Hyman Huang(黄勇)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus
  2022-12-06 17:29     ` Hyman Huang
@ 2022-12-18 19:12       ` Shivam Kumar
  2022-12-19 14:19         ` Hyman Huang
  0 siblings, 1 reply; 10+ messages in thread
From: Shivam Kumar @ 2022-12-18 19:12 UTC (permalink / raw)
  To: Hyman Huang, Peter Xu
  Cc: qemu-devel, pbonzini, david, quintela, dgilbert, kvm



On 06/12/22 10:59 pm, Hyman Huang wrote:
> 
> 
> On 2022/12/7 0:00, Peter Xu wrote:
>> Hi, Shivam,
>>
>> On Tue, Dec 06, 2022 at 11:18:52AM +0530, Shivam Kumar wrote:
>>
>> [...]
>>
>>>> Note
>>>> ----------
>>>> ----------
>>>>
>>>> We understand that there is a good scope of improvement in the current
>>>> implementation. Here is a list of things we are working on:
>>>> 1) Adding dirty quota as a migration capability so that it can be 
>>>> toggled
>>>> through QMP command.
>>>> 2) Adding support for throttling guest DMAs.
>>>> 3) Not enabling dirty quota for the first migration iteration.
>>
>> Agreed.
>>
>>>> 4) Falling back to current auto-converge based throttling in cases 
>>>> where dirty
>>>> quota throttling can overthrottle.
>>
>> If overthrottle happens, would auto-converge always be better?
>>
>>>>
>>>> Please stay tuned for the next patchset.
>>>>
>>>> Shivam Kumar (1):
>>>>     Dirty quota-based throttling of vcpus
>>>>
>>>>    accel/kvm/kvm-all.c       | 91 
>>>> +++++++++++++++++++++++++++++++++++++++
>>>>    include/exec/memory.h     |  3 ++
>>>>    include/hw/core/cpu.h     |  5 +++
>>>>    include/sysemu/kvm_int.h  |  1 +
>>>>    linux-headers/linux/kvm.h |  9 ++++
>>>>    migration/migration.c     | 22 ++++++++++
>>>>    migration/migration.h     | 31 +++++++++++++
>>>>    softmmu/memory.c          | 64 +++++++++++++++++++++++++++
>>>>    8 files changed, 226 insertions(+)
>>>>
>>>
>>> It'd be great if I could get some more feedback before I send v2. 
>>> Thanks.
>>
>> Sorry to respond late.
>>
>> What's the status of the kernel patchset?
>>
>>  From high level the approach looks good at least to me.  It's just 
>> that (as
>> I used to mention) we have two similar approaches now on throttling the
>> guest for precopy.  I'm not sure what's the best way to move forward if
>> without doing a comparison of the two.
>>
>> https://lore.kernel.org/all/cover.1669047366.git.huangy81@chinatelecom.cn/
>>
>> Sorry to say so, and no intention to create a contention, but merging the
>> two without any thought will definitely confuse everybody.  We need to
>> figure out a way.
>>
>>  From what I can tell..
>>
>> One way is we choose one of them which will be superior to the other and
>> all of us stick with it (for either higher possibility of migrate, less
>> interference to the workloads, and so on).
>>
>> The other way is we take both, when each of them may be suitable for
>> different scenarios.  However in this latter case, we'd better at 
>> least be
>> aware of the differences (which suits what), then that'll be part of
>> documentation we need for each of the features when the user wants to use
>> them.
>>
>> Add Yong into the loop.
>>
>> Any thoughts?
>>
> This is quite different from "dirtylimit capability of migration". IMHO, 
> quota-based implementation seems a little complicated, because it 
> depends on correctness of dirty quota and the measured data, which 
> involves the patchset both in qemu and kernel. It seems that dirtylimit 
> and quota-based are not mutually exclusive, at least we can figure out
> which suits what first depending on the test results, as Peter said.
> 
Thank you for sharing the link to this alternate approach towards 
throttling - "dirtylimit capability of migration". I am sharing key 
points from my understanding and some questions below:

1) The alternate approach is exclusively for the dirty ring interface,
whereas the dirty quota approach is orthogonal to the dirty logging
interface: it works with both the dirty ring and the dirty bitmap
interfaces.

2) Can we achieve micro-stunning with the alternate approach? Can we say
with good confidence that, most of the time, we stun the vcpu only when
it is dirtying memory? Last time I checked, the dirty ring size could be
a multiple of 512, which makes it difficult to stun the vcpu at
microscopic intervals.

3) Also, are we relying on the system administrator to select a limit on 
the dirty rate for "dirtylimit capability of migration"?

4) Also, does "dirtylimit capability of migration" adjust the dirty ring
size, using a larger ring size for higher dirty rate limits and a
smaller one for lower limits? I think the dirty rate limit is good
information for choosing a good-enough dirty ring size.


Thanks,
Shivam

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus
  2022-12-18 19:12       ` Shivam Kumar
@ 2022-12-19 14:19         ` Hyman Huang
  0 siblings, 0 replies; 10+ messages in thread
From: Hyman Huang @ 2022-12-19 14:19 UTC (permalink / raw)
  To: Shivam Kumar, Peter Xu
  Cc: qemu-devel, pbonzini, david, quintela, dgilbert, kvm



On 2022/12/19 3:12, Shivam Kumar wrote:
> 
> 
> On 06/12/22 10:59 pm, Hyman Huang wrote:
>>
>>
>> On 2022/12/7 0:00, Peter Xu wrote:
>>> Hi, Shivam,
>>>
>>> On Tue, Dec 06, 2022 at 11:18:52AM +0530, Shivam Kumar wrote:
>>>
>>> [...]
>>>
>>>>> Note
>>>>> ----------
>>>>> ----------
>>>>>
>>>>> We understand that there is a good scope of improvement in the current
>>>>> implementation. Here is a list of things we are working on:
>>>>> 1) Adding dirty quota as a migration capability so that it can be 
>>>>> toggled
>>>>> through QMP command.
>>>>> 2) Adding support for throttling guest DMAs.
>>>>> 3) Not enabling dirty quota for the first migration iteration.
>>>
>>> Agreed.
>>>
>>>>> 4) Falling back to current auto-converge based throttling in cases 
>>>>> where dirty
>>>>> quota throttling can overthrottle.
>>>
>>> If overthrottle happens, would auto-converge always be better?
>>>
>>>>>
>>>>> Please stay tuned for the next patchset.
>>>>>
>>>>> Shivam Kumar (1):
>>>>>     Dirty quota-based throttling of vcpus
>>>>>
>>>>>    accel/kvm/kvm-all.c       | 91 
>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>    include/exec/memory.h     |  3 ++
>>>>>    include/hw/core/cpu.h     |  5 +++
>>>>>    include/sysemu/kvm_int.h  |  1 +
>>>>>    linux-headers/linux/kvm.h |  9 ++++
>>>>>    migration/migration.c     | 22 ++++++++++
>>>>>    migration/migration.h     | 31 +++++++++++++
>>>>>    softmmu/memory.c          | 64 +++++++++++++++++++++++++++
>>>>>    8 files changed, 226 insertions(+)
>>>>>
>>>>
>>>> It'd be great if I could get some more feedback before I send v2. 
>>>> Thanks.
>>>
>>> Sorry to respond late.
>>>
>>> What's the status of the kernel patchset?
>>>
>>>  From high level the approach looks good at least to me.  It's just 
>>> that (as
>>> I used to mention) we have two similar approaches now on throttling the
>>> guest for precopy.  I'm not sure what's the best way to move forward if
>>> without doing a comparison of the two.
>>>
>>> https://lore.kernel.org/all/cover.1669047366.git.huangy81@chinatelecom.cn/
>>>
>>> Sorry to say so, and no intention to create a contention, but merging 
>>> the
>>> two without any thought will definitely confuse everybody.  We need to
>>> figure out a way.
>>>
>>>  From what I can tell..
>>>
>>> One way is we choose one of them which will be superior to the other and
>>> all of us stick with it (for either higher possibility of migrate, less
>>> interference to the workloads, and so on).
>>>
>>> The other way is we take both, when each of them may be suitable for
>>> different scenarios.  However in this latter case, we'd better at 
>>> least be
>>> aware of the differences (which suits what), then that'll be part of
>>> documentation we need for each of the features when the user wants to 
>>> use
>>> them.
>>>
>>> Add Yong into the loop.
>>>
>>> Any thoughts?
>>>
>> This is quite different from "dirtylimit capability of migration". 
>> IMHO, quota-based implementation seems a little complicated, because 
>> it depends on correctness of dirty quota and the measured data, which 
>> involves the patchset both in qemu and kernel. It seems that 
>> dirtylimit and quota-based are not mutually exclusive, at least we can 
>> figure out
>> which suits what first depending on the test results, as Peter said.
>>
> Thank you for sharing the link to this alternate approach towards 
> throttling - "dirtylimit capability of migration". I am sharing key 
> points from my understanding and some questions below:
> 
> 1) The alternate approach is exclusively for the dirty ring interface. 
> The dirty quota approach is orthogonal to the dirty logging interface. 
> It works both with the dirty ring and the dirty bitmap interface.
> 
> 2) Can we achieve micro-stunning with the alternate approach? Can we say 
> with good confidence that for most of the time, we stun the vcpu only 
> when it is dirtying the memory? Last time when I checked, dirty ring 
> size could be a multiple of 512 which makes it difficult to stun the 
> vcpu in microscopic intervals.
Actually, we implemented dirtylimit in two phases: the dirtylimit
hmp/qmp commands and dirtylimit migration. The hmp/qmp commands focus
only on the so-called "micro-stunning", while dirtylimit migration
mainly reuses the existing function qmp_set_vcpu_dirty_limit.
As for the questions:

Can we achieve micro-stunning with the alternate approach?
Yes. Since migration iterates in milliseconds in most scenarios,
dirtylimit can satisfy that with an appropriate ring size. From the test
results, setting the dirty ring size to 2048 is enough (it can make
migration converge where auto-converge cannot, under the same
conditions).

Do we stun the vcpu only when it is dirtying the memory?
Yes. The dirty ring in KVM is based on Intel PML, which ensures this.
> 
> 3) Also, are we relying on the system administrator to select a limit on 
> the dirty rate for "dirtylimit capability of migration"?
> 
Not really, that is optional. Dirtylimit migration sets 1 MB/s as the
default minimum dirty limit and decreases the value step by step, trying
its best to make migration converge; this way, dirtylimit can stop in
time once migration reaches the convergence criterion.
> 4) Also, does "dirtylimit capability of migration" play with the dirty 
> ring size in a way that it uses a larger ring size for higher dirty rate 
> limits and smaller ring size for smaller dirty rate limits? I think the 
> dirty rate limit is good information to choose a good-enough dirty ring 
> size.
> 
Actually, we productized dirtylimit migration in a different way: we set
the dirty ring size to 2048 by default once the capability is enabled,
which ensures "micro-stunning" during the migration phase and proved to
be a good choice in our test results. We consider a dirty ring size
"good enough" if migration performance (total migration time, CPU/memory
performance in the VM during live migration) improves hugely.

As for dirtylimit itself, we leave the choice to upper-layer
applications; we assume that if those applications are professional
enough to decide the dirtylimit value, the dirty ring size will also be
considered.

Thanks.
> 
> Thanks,
> Shivam

-- 
Best regards

Hyman Huang(黄勇)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 1/1] Dirty quota-based throttling of vcpus
  2022-11-22  4:00     ` Shivam Kumar
@ 2023-02-13  9:17       ` Shivam Kumar
  0 siblings, 0 replies; 10+ messages in thread
From: Shivam Kumar @ 2023-02-13  9:17 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: pbonzini, peterx, david, quintela, dgilbert, kvm, Shaju Abraham,
	Manish Mishra, Anurag Madnawat



On 22/11/22 9:30 am, Shivam Kumar wrote:
> 
> 
> On 21/11/22 5:05 pm, Philippe Mathieu-Daudé wrote:
>> Hi,
>>
>> On 20/11/22 23:54, Shivam Kumar wrote:
>>> +
>>> +void dirty_quota_migration_start(void)
>>> +{
>>> +    if (!kvm_state->dirty_quota_supported) {
>>
>> You are accessing an accelerator-specific variable in an 
>> accelerator-agnostic file, this doesn't sound correct.
>>
>> You might introduce some hooks in AccelClass and implement them in
>> accel/kvm/. See for example gdbstub_supported_sstep_flags() and
>> kvm_gdbstub_sstep_flags().
>>
> Ack.
> 
> Thanks,
> Shivam

Hi Philippe,

I had received a suggestion on the kernel-side patchset to make dirty
quota a more generic feature and not limit its use to live migration.
Incorporating this request might lead to a significant change in the
dirty quota interface, so I haven't been able to post the next version
of the QEMU patchset yet. I intend to post it once the new proposal
looks good to the KVM reviewers.

Thanks,
Shivam

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-02-13  9:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-20 22:54 [RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus Shivam Kumar
2022-11-20 22:54 ` [RFC PATCH 1/1] " Shivam Kumar
2022-11-21 11:35   ` Philippe Mathieu-Daudé
2022-11-22  4:00     ` Shivam Kumar
2023-02-13  9:17       ` Shivam Kumar
2022-12-06  5:48 ` [RFC PATCH 0/1] QEMU: " Shivam Kumar
2022-12-06 16:00   ` Peter Xu
2022-12-06 17:29     ` Hyman Huang
2022-12-18 19:12       ` Shivam Kumar
2022-12-19 14:19         ` Hyman Huang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.