* [PATCH v7 0/3] support dirty restraint on vCPU
From: huangy81 @ 2021-11-30 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.

Please review, thanks!

v7:
- rebase on master
- polish the comments and error message according to the
  advice given by Markus
- introduce the dirtylimit_enabled function to pre-check whether the
  dirty page limit is enabled before canceling.

v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate can reach the
  quota more quickly.
- fix percentage update in throttle iteration.

v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change dirty rate to dirty page rate

Thanks for the careful reviews made by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove the DirtyRateQuotaVcpu and use its field as an option directly
  2. add comments to show details of what dirtylimit setup does
  3. explain how to use dirtylimit in combination with the existing qmp
     commands "calc-dirty-rate" and "query-dirty-rate" in the documentation.

Thanks for the careful reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle functions of dirtylimit_calc, including
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce a 'quit' field in dirtylimit_calc_state to implement
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return;
- other modifications have been made after code review
  1. introduce 'bmap' and 'nr' fields in dirtylimit_state to record the
     number of running threads forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit threads
     are stopped
  3. do some renaming work
     dirtyrate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks !

Thanks for the accurate and timely advice given by Juan. We would
really appreciate any corrections and suggestions about this
patchset.

Best Regards !

Hyman

v1:
This patchset introduces a mechanism to impose dirty restraint
on vCPUs, aiming to keep each vCPU running within a certain dirty
page rate given by the user. Dirty restraint on vCPUs may be an
alternative method to implement convergence logic for live
migration, which in theory could improve guest memory performance
during migration compared with the traditional method.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle increases the percentage step by step, and seems
  to struggle to find the optimal throttle percentage when the
  dirty page rate is high
- it is hard to predict the remaining time of migration if the
  throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these
side effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the
vCPU dirty page rate periodically via the Dirty Ring mechanism,
as commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does. For each vCPU specified to have dirty restraint imposed,
we throttle it periodically as auto-converge does; after each
round of throttling, we compare the quota dirty page rate with
the current dirty page rate, and if the current rate is not under
the quota, we increase the throttling percentage until it is.
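
In pseudo-C, the feedback loop looks roughly like this (a sketch
only; limit_enabled, measure_dirty_rate, throttle_cpu, quota_rate
and step are illustrative names, not the actual functions of this
series):

    /* illustrative per-vCPU throttle loop */
    while (limit_enabled(cpu)) {
        current_rate = measure_dirty_rate(cpu);   /* via dirty ring */
        if (current_rate > quota_rate) {
            /* above quota: throttle harder, as auto-converge does */
            pct = MIN(pct + step, 99);
        } else {
            /* at or under quota: back off gradually */
            pct = pct > step ? pct - step : 0;
        }
        throttle_cpu(cpu, pct);                   /* kick + sleep */
    }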

This patchset is the basis for implementing a new auto-converge
method for live migration. We introduce two qmp commands to
impose/cancel the dirty restraint on a specified vCPU, so it can
also serve as an independent API for upper-layer applications
such as libvirt, which can use it to implement convergence logic
during live migration, supplemented with the qmp 'calc-dirty-rate'
command or whatever.
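
For instance, an upper-layer application could drive it over qmp
like this (a usage sketch in the style of the qmp-shell examples
later in this thread; calc-dirty-rate/query-dirty-rate are the
existing commands, shown here only to observe the effect):

    (QEMU) calc-dirty-rate calc-time=1
    (QEMU) query-dirty-rate
    (QEMU) set-dirty-limit cpu-index=0 dirty-rate=200
    ... the workload on vcpu 0 converges towards 200 MB/s ...
    (QEMU) cancel-dirty-limit cpu-index=0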

We post this patchset for RFC; any corrections and suggestions
about the implementation, API, throttling algorithm or whatever
are very much appreciated!

Please review, thanks !

Best Regards !

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty page limit on vCPU

 cpus-common.c                 |  48 +++++++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   9 ++
 include/sysemu/cpu-throttle.h |  30 ++++
 include/sysemu/dirtylimit.h   |  44 ++++++
 migration/dirtyrate.c         | 139 +++++++++++++++++--
 migration/dirtyrate.h         |   2 +
 qapi/migration.json           |  43 ++++++
 softmmu/cpu-throttle.c        | 316 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 softmmu/vl.c                  |   1 +
 11 files changed, 631 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

-- 
1.8.3.1




* [PATCH v7 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
From: huangy81 @ 2021-11-30 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Introduce a third dirty tracking method, GLOBAL_DIRTY_LIMIT, used
to calculate the dirty page rate periodically for dirty restraint.

Implement a thread that calculates the dirty page rate
periodically, which will be used for dirty restraint.

Add dirtylimit.h to introduce the utility functions for the dirty
limit implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 include/exec/memory.h       |   5 +-
 include/sysemu/dirtylimit.h |  44 ++++++++++++++
 migration/dirtyrate.c       | 139 ++++++++++++++++++++++++++++++++++++++++----
 migration/dirtyrate.h       |   2 +
 4 files changed, 179 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27..606bec8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT      (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 0000000..49298a2
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,44 @@
+/*
+ * dirty limit helper functions
+ *
+ * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYLIMIT_H
+#define QEMU_DIRTYLIMIT_H
+
+#define DIRTYLIMIT_CALC_PERIOD_TIME_S   15      /* 15s */
+
+/**
+ * dirtylimit_calc_current:
+ *
+ * get current dirty page rate for specified vCPU.
+ */
+int64_t dirtylimit_calc_current(int cpu_index);
+
+/**
+ * dirtylimit_calc:
+ *
+ * start dirty page rate calculation thread.
+ */
+void dirtylimit_calc(void);
+
+/**
+ * dirtylimit_calc_quit:
+ *
+ * quit dirty page rate calculation thread.
+ */
+void dirtylimit_calc_quit(void);
+
+/**
+ * dirtylimit_calc_state_init:
+ *
+ * initialize dirty page rate calculation state.
+ */
+void dirtylimit_calc_state_init(int max_cpus);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..d370a21 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -27,6 +27,7 @@
 #include "qapi/qmp/qdict.h"
 #include "sysemu/kvm.h"
 #include "sysemu/runstate.h"
+#include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 
 /*
@@ -46,6 +47,134 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
                 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
+#define DIRTYLIMIT_CALC_TIME_MS         1000    /* 1000ms */
+
+struct {
+    DirtyRatesData data;
+    int64_t period;
+    bool quit;
+} *dirtylimit_calc_state;
+
+static void dirtylimit_global_dirty_log_start(void)
+{
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
+    qemu_mutex_unlock_iothread();
+}
+
+static void dirtylimit_global_dirty_log_stop(void)
+{
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_sync();
+    memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
+    qemu_mutex_unlock_iothread();
+}
+
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+                                     CPUState *cpu, bool start)
+{
+    if (start) {
+        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+    } else {
+        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+    }
+}
+
+static void dirtylimit_calc_func(void)
+{
+    CPUState *cpu;
+    DirtyPageRecord *dirty_pages;
+    int64_t start_time, end_time, calc_time;
+    DirtyRateVcpu rate;
+    int i = 0;
+
+    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
+        dirtylimit_calc_state->data.nvcpu);
+
+    dirtylimit_global_dirty_log_start();
+
+    CPU_FOREACH(cpu) {
+        record_dirtypages(dirty_pages, cpu, true);
+    }
+
+    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
+    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    calc_time = end_time - start_time;
+
+    dirtylimit_global_dirty_log_stop();
+
+    CPU_FOREACH(cpu) {
+        record_dirtypages(dirty_pages, cpu, false);
+    }
+
+    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
+        uint64_t increased_dirty_pages =
+            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
+        uint64_t memory_size_MB =
+            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
+
+        rate.id = i;
+        rate.dirty_rate  = dirtyrate;
+        dirtylimit_calc_state->data.rates[i] = rate;
+
+        trace_dirtyrate_do_calculate_vcpu(i,
+            dirtylimit_calc_state->data.rates[i].dirty_rate);
+    }
+}
+
+static void *dirtylimit_calc_thread(void *opaque)
+{
+    rcu_register_thread();
+
+    while (!qatomic_read(&dirtylimit_calc_state->quit)) {
+        dirtylimit_calc_func();
+        sleep(dirtylimit_calc_state->period);
+    }
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+int64_t dirtylimit_calc_current(int cpu_index)
+{
+    DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates;
+
+    return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void dirtylimit_calc(void)
+{
+    if (unlikely(qatomic_read(&dirtylimit_calc_state->quit))) {
+        qatomic_set(&dirtylimit_calc_state->quit, 0);
+        QemuThread thread;
+        qemu_thread_create(&thread, "dirtylimit-calc",
+            dirtylimit_calc_thread,
+            NULL, QEMU_THREAD_DETACHED);
+    }
+}
+
+void dirtylimit_calc_quit(void)
+{
+    qatomic_set(&dirtylimit_calc_state->quit, 1);
+}
+
+void dirtylimit_calc_state_init(int max_cpus)
+{
+    dirtylimit_calc_state =
+        g_malloc0(sizeof(*dirtylimit_calc_state));
+
+    dirtylimit_calc_state->data.nvcpu = max_cpus;
+    dirtylimit_calc_state->data.rates =
+        g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+    dirtylimit_calc_state->period =
+        DIRTYLIMIT_CALC_PERIOD_TIME_S;
+
+    dirtylimit_calc_state->quit = true;
+}
+
 static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 {
     int64_t current_time;
@@ -396,16 +525,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
     return true;
 }
 
-static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
-                                     CPUState *cpu, bool start)
-{
-    if (start) {
-        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
-    } else {
-        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
-    }
-}
-
 static void dirtyrate_global_dirty_log_start(void)
 {
     qemu_mutex_lock_iothread();
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 69d4c5b..e96acdc 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -70,6 +70,8 @@ typedef struct VcpuStat {
     DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
 } VcpuStat;
 
+typedef struct VcpuStat DirtyRatesData;
+
 /*
  * Store calculation statistics for each measure.
  */
-- 
1.8.3.1




* [PATCH v7 2/3] cpu-throttle: implement vCPU throttle
From: huangy81 @ 2021-11-30 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Impose dirty restraint on a vCPU by kicking it and making it
sleep, as auto-converge does during migration, but kick only the
specified vCPU instead of all vCPUs of the vm.

Start a thread to track the dirtylimit status and adjust the
throttle percentage dynamically depending on the current and
quota dirty page rates.

Introduce the utility functions in the header for the dirtylimit
implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 include/sysemu/cpu-throttle.h |  30 ++++
 softmmu/cpu-throttle.c        | 316 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 3 files changed, 351 insertions(+)

diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
index d65bdef..334e5e2 100644
--- a/include/sysemu/cpu-throttle.h
+++ b/include/sysemu/cpu-throttle.h
@@ -65,4 +65,34 @@ bool cpu_throttle_active(void);
  */
 int cpu_throttle_get_percentage(void);
 
+/**
+ * dirtylimit_enabled
+ *
+ * Returns: %true if dirty page limit for vCPU is enabled, %false otherwise.
+ */
+bool dirtylimit_enabled(int cpu_index);
+
+/**
+ * dirtylimit_state_init:
+ *
+ * initialize global state for dirtylimit
+ */
+void dirtylimit_state_init(int max_cpus);
+
+/**
+ * dirtylimit_vcpu:
+ *
+ * impose dirtylimit on vcpu until reaching the quota dirtyrate
+ */
+void dirtylimit_vcpu(int cpu_index,
+                     uint64_t quota);
+/**
+ * dirtylimit_cancel_vcpu:
+ *
+ * cancel dirtylimit for the specified vcpu
+ *
+ * Returns: the number of running threads for dirtylimit
+ */
+int dirtylimit_cancel_vcpu(int cpu_index);
+
 #endif /* SYSEMU_CPU_THROTTLE_H */
diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c
index 8c2144a..f199d68 100644
--- a/softmmu/cpu-throttle.c
+++ b/softmmu/cpu-throttle.c
@@ -29,6 +29,8 @@
 #include "qemu/main-loop.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-throttle.h"
+#include "sysemu/dirtylimit.h"
+#include "trace.h"
 
 /* vcpu throttling controls */
 static QEMUTimer *throttle_timer;
@@ -38,6 +40,320 @@ static unsigned int throttle_percentage;
 #define CPU_THROTTLE_PCT_MAX 99
 #define CPU_THROTTLE_TIMESLICE_NS 10000000
 
+#define DIRTYLIMIT_TOLERANCE_RANGE  15      /* 15MB/s */
+
+#define DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK     75
+#define DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK    90
+
+#define DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE     5
+#define DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE    2
+
+typedef enum {
+    RESTRAIN_KEEP,
+    RESTRAIN_RATIO,
+    RESTRAIN_HEAVY,
+    RESTRAIN_SLIGHT,
+} RestrainPolicy;
+
+typedef struct DirtyLimitState {
+    int cpu_index;
+    bool enabled;
+    uint64_t quota;     /* quota dirtyrate MB/s */
+    QemuThread thread;
+    char *name;         /* thread name */
+} DirtyLimitState;
+
+struct {
+    DirtyLimitState *states;
+    int max_cpus;
+    unsigned long *bmap; /* running thread bitmap */
+    unsigned long nr;
+} *dirtylimit_state;
+
+bool dirtylimit_enabled(int cpu_index)
+{
+    return qatomic_read(&dirtylimit_state->states[cpu_index].enabled);
+}
+
+static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota)
+{
+    qatomic_set(&dirtylimit_state->states[cpu_index].quota, quota);
+}
+
+static inline uint64_t dirtylimit_quota(int cpu_index)
+{
+    return qatomic_read(&dirtylimit_state->states[cpu_index].quota);
+}
+
+static int64_t dirtylimit_current(int cpu_index)
+{
+    return dirtylimit_calc_current(cpu_index);
+}
+
+static void dirtylimit_vcpu_thread(CPUState *cpu, run_on_cpu_data data)
+{
+    double pct;
+    double throttle_ratio;
+    int64_t sleeptime_ns, endtime_ns;
+    int *percentage = (int *)data.host_ptr;
+
+    pct = (double)(*percentage) / 100;
+    throttle_ratio = pct / (1 - pct);
+    /* Add 1ns to fix double's rounding error (like 0.9999999...) */
+    sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
+    endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
+    while (sleeptime_ns > 0 && !cpu->stop) {
+        if (sleeptime_ns > SCALE_MS) {
+            qemu_cond_timedwait_iothread(cpu->halt_cond,
+                                         sleeptime_ns / SCALE_MS);
+        } else {
+            qemu_mutex_unlock_iothread();
+            g_usleep(sleeptime_ns / SCALE_US);
+            qemu_mutex_lock_iothread();
+        }
+        sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    }
+    qatomic_set(&cpu->throttle_thread_scheduled, 0);
+
+    free(percentage);
+}
+
+static void dirtylimit_check(int cpu_index,
+                             int percentage)
+{
+    CPUState *cpu;
+    int64_t sleeptime_ns, starttime_ms, currenttime_ms;
+    int *pct_parameter;
+    double pct;
+
+    pct = (double) percentage / 100;
+
+    starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    while (true) {
+        CPU_FOREACH(cpu) {
+            if ((cpu_index == cpu->cpu_index) &&
+                (!qatomic_xchg(&cpu->throttle_thread_scheduled, 1))) {
+                pct_parameter = malloc(sizeof(*pct_parameter));
+                *pct_parameter = percentage;
+                async_run_on_cpu(cpu, dirtylimit_vcpu_thread,
+                                 RUN_ON_CPU_HOST_PTR(pct_parameter));
+                break;
+            }
+        }
+
+        sleeptime_ns = CPU_THROTTLE_TIMESLICE_NS / (1 - pct);
+        g_usleep(sleeptime_ns / SCALE_US);
+
+        currenttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+        if (unlikely((currenttime_ms - starttime_ms) >
+                     (DIRTYLIMIT_CALC_PERIOD_TIME_S * 1000))) {
+            break;
+        }
+    }
+}
+
+static uint64_t dirtylimit_init_pct(uint64_t quota,
+                                    uint64_t current)
+{
+    uint64_t limit_pct = 0;
+
+    if (quota >= current || (current == 0) ||
+        ((current - quota) <= DIRTYLIMIT_TOLERANCE_RANGE)) {
+        limit_pct = 0;
+    } else {
+        limit_pct = (current - quota) * 100 / current;
+
+        limit_pct = MIN(limit_pct,
+            DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
+    }
+
+    return limit_pct;
+}
+
+static RestrainPolicy dirtylimit_policy(unsigned int last_pct,
+                                        uint64_t quota,
+                                        uint64_t current)
+{
+    uint64_t max, min;
+
+    max = MAX(quota, current);
+    min = MIN(quota, current);
+    if ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) {
+        return RESTRAIN_KEEP;
+    }
+    if (last_pct < DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK) {
+        /* last percentage locates in [0, 75)*/
+        return RESTRAIN_RATIO;
+    } else if (last_pct < DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK) {
+        /* last percentage locates in [75, 90)*/
+        return RESTRAIN_HEAVY;
+    } else {
+        /* last percentage locates in [90, 99]*/
+        return RESTRAIN_SLIGHT;
+    }
+}
+
+static uint64_t dirtylimit_pct(unsigned int last_pct,
+                               uint64_t quota,
+                               uint64_t current)
+{
+    uint64_t limit_pct = 0;
+    RestrainPolicy policy;
+    bool mitigate = (quota > current) ? true : false;
+
+    if (mitigate && ((current == 0) ||
+        (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) {
+        return 0;
+    }
+
+    policy = dirtylimit_policy(last_pct, quota, current);
+    switch (policy) {
+    case RESTRAIN_SLIGHT:
+        /* [90, 99] */
+        if (mitigate) {
+            limit_pct =
+                last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
+        } else {
+            limit_pct =
+                last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
+
+            limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX);
+        }
+       break;
+    case RESTRAIN_HEAVY:
+        /* [75, 90) */
+        if (mitigate) {
+            limit_pct =
+                last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
+        } else {
+            limit_pct =
+                last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
+
+            limit_pct = MIN(limit_pct,
+                DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK);
+        }
+       break;
+    case RESTRAIN_RATIO:
+        /* [0, 75) */
+        if (mitigate) {
+            if (last_pct <= (((quota - current) * 100 / quota))) {
+                limit_pct = 0;
+            } else {
+                limit_pct = last_pct -
+                    ((quota - current) * 100 / quota);
+                limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN);
+            }
+        } else {
+            limit_pct = last_pct +
+                ((current - quota) * 100 / current);
+
+            limit_pct = MIN(limit_pct,
+                DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
+        }
+       break;
+    case RESTRAIN_KEEP:
+    default:
+       limit_pct = last_pct;
+       break;
+    }
+
+    return limit_pct;
+}
+
+static void *dirtylimit_thread(void *opaque)
+{
+    int cpu_index = *(int *)opaque;
+    uint64_t quota_dirtyrate, current_dirtyrate;
+    unsigned int last_pct = 0;
+    unsigned int pct = 0;
+
+    rcu_register_thread();
+
+    quota_dirtyrate = dirtylimit_quota(cpu_index);
+    current_dirtyrate = dirtylimit_current(cpu_index);
+
+    pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate);
+
+    do {
+        trace_dirtylimit_impose(cpu_index,
+            quota_dirtyrate, current_dirtyrate, pct);
+
+        last_pct = pct;
+        if (pct == 0) {
+            sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S);
+        } else {
+            dirtylimit_check(cpu_index, pct);
+        }
+
+        quota_dirtyrate = dirtylimit_quota(cpu_index);
+        current_dirtyrate = dirtylimit_current(cpu_index);
+
+        pct = dirtylimit_pct(last_pct, quota_dirtyrate, current_dirtyrate);
+    } while (dirtylimit_enabled(cpu_index));
+
+    rcu_unregister_thread();
+
+    return NULL;
+}
+
+int dirtylimit_cancel_vcpu(int cpu_index)
+{
+    int i;
+    int nr_threads = 0;
+
+    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 0);
+    bitmap_test_and_clear_atomic(dirtylimit_state->bmap, cpu_index, 1);
+
+    for (i = 0; i < dirtylimit_state->nr; i++) {
+        unsigned long temp = dirtylimit_state->bmap[i];
+        nr_threads += ctpopl(temp);
+    }
+
+   return nr_threads;
+}
+
+void dirtylimit_vcpu(int cpu_index,
+                     uint64_t quota)
+{
+    trace_dirtylimit_vcpu(cpu_index, quota);
+
+    dirtylimit_set_quota(cpu_index, quota);
+
+    if (unlikely(!dirtylimit_enabled(cpu_index))) {
+        qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 1);
+        dirtylimit_state->states[cpu_index].name =
+            g_strdup_printf("dirtylimit-%d", cpu_index);
+        qemu_thread_create(&dirtylimit_state->states[cpu_index].thread,
+            dirtylimit_state->states[cpu_index].name,
+            dirtylimit_thread,
+            (void *)&dirtylimit_state->states[cpu_index].cpu_index,
+            QEMU_THREAD_DETACHED);
+        bitmap_set_atomic(dirtylimit_state->bmap, cpu_index, 1);
+    }
+}
+
+void dirtylimit_state_init(int max_cpus)
+{
+    int i;
+
+    dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state));
+
+    dirtylimit_state->states =
+            g_malloc0(sizeof(DirtyLimitState) * max_cpus);
+
+    for (i = 0; i < max_cpus; i++) {
+        dirtylimit_state->states[i].cpu_index = i;
+    }
+
+    dirtylimit_state->max_cpus = max_cpus;
+    dirtylimit_state->bmap = bitmap_new(max_cpus);
+    bitmap_clear(dirtylimit_state->bmap, 0, max_cpus);
+    dirtylimit_state->nr = BITS_TO_LONGS(max_cpus);
+
+    trace_dirtylimit_state_init(max_cpus);
+}
+
 static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 {
     double pct;
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887..a7c9c04 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -31,3 +31,8 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co
 system_wakeup_request(int reason) "reason=%d"
 qemu_system_shutdown_request(int reason) "reason=%d"
 qemu_system_powerdown_request(void) ""
+
+#cpu-throttle.c
+dirtylimit_state_init(int max_cpus) "dirtylimit state init: max cpus %d"
+dirtylimit_impose(int cpu_index, uint64_t quota, uint64_t current, int pct) "CPU[%d] impose dirtylimit: quota %" PRIu64 ", current %" PRIu64 ", percentage %d"
+dirtylimit_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set quota dirtylimit %"PRIu64
-- 
1.8.3.1




* [PATCH v7 3/3] cpus-common: implement dirty page limit on vCPU
From: huangy81 @ 2021-11-30 10:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Implement dirty page rate calculation periodically based on the
dirty ring, and throttle the vCPU until it reaches the quota
dirty page rate given by the user.

Introduce qmp commands set-dirty-limit/cancel-dirty-limit to
set/cancel the dirty page limit on a vCPU.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 cpus-common.c         | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/core/cpu.h |  9 +++++++++
 qapi/migration.json   | 43 +++++++++++++++++++++++++++++++++++++++++++
 softmmu/vl.c          |  1 +
 4 files changed, 101 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..86c7712 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -23,6 +23,11 @@
 #include "hw/core/cpu.h"
 #include "sysemu/cpus.h"
 #include "qemu/lockable.h"
+#include "sysemu/dirtylimit.h"
+#include "sysemu/cpu-throttle.h"
+#include "sysemu/kvm.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-migration.h"
 
 static QemuMutex qemu_cpu_list_lock;
 static QemuCond exclusive_cond;
@@ -352,3 +357,46 @@ void process_queued_cpu_work(CPUState *cpu)
     qemu_mutex_unlock(&cpu->work_mutex);
     qemu_cond_broadcast(&qemu_work_cond);
 }
+
+void qmp_set_dirty_limit(int64_t idx,
+                         uint64_t dirtyrate,
+                         Error **errp)
+{
+    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+        error_setg(errp, "setting a dirty page limit requires KVM with"
+                   " accelerator property 'dirty-ring-size' set");
+        return;
+    }
+
+    dirtylimit_calc();
+    dirtylimit_vcpu(idx, dirtyrate);
+}
+
+void qmp_cancel_dirty_limit(int64_t idx,
+                            Error **errp)
+{
+    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+        error_setg(errp, "KVM with accelerator property 'dirty-ring-size'"
+                   " not set, abort canceling a dirty page limit");
+        return;
+    }
+
+    if (!dirtylimit_enabled(idx)) {
+        error_setg(errp, "dirty page limit for the CPU %ld not set", idx);
+        return;
+    }
+
+    if (unlikely(!dirtylimit_cancel_vcpu(idx))) {
+        dirtylimit_calc_quit();
+    }
+}
+
+void dirtylimit_setup(int max_cpus)
+{
+    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+        return;
+    }
+
+    dirtylimit_calc_state_init(max_cpus);
+    dirtylimit_state_init(max_cpus);
+}
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e948e81..11df012 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -881,6 +881,15 @@ void end_exclusive(void);
  */
 void qemu_init_vcpu(CPUState *cpu);
 
+/**
+ * dirtylimit_setup:
+ *
+ * Initializes the global state of dirtylimit calculation and
+ * dirtylimit itself. This prepares for the vCPU dirtylimit, which
+ * could be triggered at any time during the vm lifecycle.
+ */
+void dirtylimit_setup(int max_cpus);
+
 #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
 #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
 #define SSTEP_NOTIMER 0x4  /* Do not Timers while single stepping */
diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48c..57c9a63 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1850,6 +1850,49 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @set-dirty-limit:
+#
+# Set the upper limit of dirty page rate for a virtual CPU.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of the virtual CPU.
+#
+# @dirty-rate: upper limit for the specified vCPU's dirty page rate (MB/s)
+#
+# Since: 7.0
+#
+# Example:
+#   { "execute": "set-dirty-limit",
+#     "arguments": { "cpu-index": 0,
+#                    "dirty-rate": 200 } }
+#
+##
+{ 'command': 'set-dirty-limit',
+  'data': { 'cpu-index': 'int', 'dirty-rate': 'uint64' } }
+
+##
+# @cancel-dirty-limit:
+#
+# Cancel the dirty page limit for the vCPU which has been set with
+# set-dirty-limit command. Note that this command requires support from
+# dirty ring, same as the "set-dirty-limit" command.
+#
+# @cpu-index: index of the virtual CPU to cancel the dirty page limit
+#
+# Since: 7.0
+#
+# Example:
+#   { "execute": "cancel-dirty-limit",
+#     "arguments": { "cpu-index": 0 } }
+#
+##
+{ 'command': 'cancel-dirty-limit',
+  'data': { 'cpu-index': 'int' } }
+
+##
 # @snapshot-save:
 #
 # Save a VM snapshot
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 620a1f1..0f83ce3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
     qemu_init_displays();
     accel_setup_post(current_machine);
     os_setup_post();
+    dirtylimit_setup(current_machine->smp.max_cpus);
     resume_mux_open();
 }
-- 
1.8.3.1




* Re: [PATCH v7 0/3] support dirty restraint on vCPU
From: Peter Xu @ 2021-11-30 12:57 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> The patch [2/3] has not been touched so far. Any corrections and
> suggetions are welcome. 

I played with it today, but the vcpu didn't get throttled as expected.

What I did was start two workloads at 500mb/s, each pinned on one vcpu
thread:

[root@fedora ~]# pgrep -fa mig_mon
595 ./mig_mon mm_dirty 1000 500 sequential
604 ./mig_mon mm_dirty 1000 500 sequential
[root@fedora ~]# taskset -pc 595
pid 595's current affinity list: 2
[root@fedora ~]# taskset -pc 604
pid 604's current affinity list: 3

Then start throttle with 100mb/s:

(QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
{"return": {}}
(QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
{"return": {}}

I can see the workload dropped a tiny little bit (perhaps 500mb -> 499mb), then
it keeps going..

Further throttle won't work too:

(QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
{"return": {}}

Funnily, the ssh client got slowed down instead... :(

Yong, how did you test it?

-- 
Peter Xu




* Re: [PATCH v7 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
From: Peter Xu @ 2021-11-30 13:04 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Tue, Nov 30, 2021 at 06:28:11PM +0800, huangy81@chinatelecom.cn wrote:
> +static void dirtylimit_calc_func(void)
> +{
> +    CPUState *cpu;
> +    DirtyPageRecord *dirty_pages;
> +    int64_t start_time, end_time, calc_time;
> +    DirtyRateVcpu rate;
> +    int i = 0;
> +
> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
> +        dirtylimit_calc_state->data.nvcpu);
> +
> +    dirtylimit_global_dirty_log_start();
> +
> +    CPU_FOREACH(cpu) {
> +        record_dirtypages(dirty_pages, cpu, true);
> +    }
> +
> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    calc_time = end_time - start_time;
> +
> +    dirtylimit_global_dirty_log_stop();

I haven't looked into the details, but..  I'm wondering whether we should just
keep the dirty ring enabled during the whole process of throttling.

start/stop can be expensive, especially when huge pages are used: starting
dirty tracking will start to split huge pages, while right after the "stop"
all the huge pages will need to be rebuilt again.

David from Google is even proposing a kernel change to eagerly split huge
pages when dirty tracking is enabled.

So I think we can keep the dirty tracking enabled until all the vcpu throttles
are stopped.

> +
> +    CPU_FOREACH(cpu) {
> +        record_dirtypages(dirty_pages, cpu, false);
> +    }
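
Concretely, something along these lines (a sketch only, reusing the
names from this patch; the per-iteration dirty log sync would then
need to move into dirtylimit_calc_func itself):

    static void *dirtylimit_calc_thread(void *opaque)
    {
        rcu_register_thread();

        /* enable dirty tracking once, before the first iteration */
        dirtylimit_global_dirty_log_start();

        while (!qatomic_read(&dirtylimit_calc_state->quit)) {
            dirtylimit_calc_func();    /* no start/stop per iteration */
            sleep(dirtylimit_calc_state->period);
        }

        /* disable it only once all throttling has been cancelled */
        dirtylimit_global_dirty_log_stop();

        rcu_unregister_thread();
        return NULL;
    }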

-- 
Peter Xu




* Re: [PATCH v7 3/3] cpus-common: implement dirty page limit on vCPU
From: Peter Xu @ 2021-11-30 13:21 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, David Hildenbrand, Richard Henderson,
	Markus ArmBruster, qemu-devel, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Tue, Nov 30, 2021 at 06:28:13PM +0800, huangy81@chinatelecom.cn wrote:
>  ##
> +# @set-dirty-limit:
> +#
> +# Set the upper limit of dirty page rate for a virtual CPU.
> +#
> +# Requires KVM with accelerator property "dirty-ring-size" set.
> +# A virtual CPU's dirty page rate is a measure of its memory load.
> +# To observe dirty page rates, use @calc-dirty-rate.
> +#
> +# @cpu-index: index of the virtual CPU.
> +#
> +# @dirty-rate: upper limit for the specified vCPU's dirty page rate (MB/s)
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   { "execute": "set-dirty-limit",
> +#     "arguments": { "cpu-index": 0,
> +#                    "dirty-rate": 200 } }
> +#
> +##
> +{ 'command': 'set-dirty-limit',
> +  'data': { 'cpu-index': 'int', 'dirty-rate': 'uint64' } }
> +
> +##
> +# @cancel-dirty-limit:
> +#
> +# Cancel the dirty page limit for the vCPU which has been set with
> +# set-dirty-limit command. Note that this command requires support from
> +# dirty ring, same as the "set-dirty-limit" command.
> +#
> +# @cpu-index: index of the virtual CPU to cancel the dirty page limit
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   { "execute": "cancel-dirty-limit",
> +#     "arguments": { "cpu-index": 0 } }
> +#
> +##
> +{ 'command': 'cancel-dirty-limit',
> +  'data': { 'cpu-index': 'int' } }

This seems to be overloaded to be a standalone cmd..

How about:

  { "cmd": "vcpu-dirty-limit",
    "arguments": {
      "cpu": $cpu,
      "enable": true/false,
      "dirty-rate": 100,
    }
  }

If "enable"==false, then "dirty-rate" can be ignored and it'll shut down the
throttling on vcpu N.  Then this command will literally merge the two you
proposed.

It'll be nice if we provide yet another command:

  { "cmd": "query-vcpu-dirty-limit",
    "arguments": {
      "*cpu": $cpu,
    }
  }

When $cpu is specified, we return (cpu=$cpu, real_dirty_rate,
target_dirty_rate) for this vcpu.  When $cpu is not specified, we return an
array of that containing all the vcpus.

It'll be nicer to enhance the output of the query command to e.g. have a global
"enabled"=true/false: as long as any vcpu has the throttle enabled, the global
throttle is enabled.  I didn't think more than that, but how's that sound so
far?
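
On the QEMU side, the merged handler could then look roughly like
this (a sketch of the idea only, not code from this series; the
signature follows the usual QAPI convention for an optional
argument and is an assumption):

    void qmp_vcpu_dirty_limit(int64_t cpu, bool enable,
                              bool has_dirty_rate, uint64_t dirty_rate,
                              Error **errp)
    {
        if (enable) {
            if (!has_dirty_rate) {
                error_setg(errp, "dirty-rate is required when enable=true");
                return;
            }
            dirtylimit_vcpu(cpu, dirty_rate);   /* set/update the quota */
        } else {
            dirtylimit_cancel_vcpu(cpu);        /* shut down throttling */
        }
    }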

Thanks,

-- 
Peter Xu




* Re: [PATCH v7 0/3] support dirty restraint on vCPU
From: Hyman Huang @ 2021-11-30 14:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

1.
Start the vm with kernel+initrd.img, with the qemu command line as follows:

[root@Hyman_server1 fast_qemu]# cat vm.sh
#!/bin/bash
/usr/bin/qemu-system-x86_64 \
     -display none -vga none \
     -name guest=simple_vm,debug-threads=on \
     -monitor stdio \
     -machine pc-i440fx-2.12 \
     -accel kvm,dirty-ring-size=65536 -cpu host \
     -kernel /home/work/fast_qemu/vmlinuz-5.13.0-rc4+ \
     -initrd /home/work/fast_qemu/initrd-stress.img \
     -append "noapic edd=off printk.time=1 noreplace-smp 
cgroup_disable=memory pci=noearly console=ttyS0 debug ramsize=1500 
ratio=1 sleep=1" \
     -chardev file,id=charserial0,path=/var/log/vm_console.log \
     -serial chardev:charserial0 \
     -qmp unix:/tmp/qmp-sock,server,nowait \
     -D /var/log/vm.log \
     --trace events=/home/work/fast_qemu/events \
     -m 4096 -smp 2 -device sga

2.
Enable the dirtylimit trace events, which will be output to /var/log/vm.log
[root@Hyman_server1 fast_qemu]# cat /home/work/fast_qemu/events
dirtylimit_state_init
dirtylimit_vcpu
dirtylimit_impose
dirtyrate_do_calculate_vcpu


3.
Connect to the qmp server with the low-level qmp client and run set-dirty-limit

[root@Hyman_server1 my_qemu]# python3.6 ./scripts/qmp/qmp-shell -v -p /tmp/qmp-sock

Welcome to the QMP low-level shell!
Connected to QEMU 6.1.92

(QEMU) set-dirty-limit cpu-index=1 dirty-rate=400

{
    "arguments": {
        "cpu-index": 1,
        "dirty-rate": 400
    },
    "execute": "set-dirty-limit"
}

4.
observe the vcpu current dirty rate and quota dirty rate...

[root@Hyman_server1 ~]# tail -f /var/log/vm.log
dirtylimit_state_init dirtylimit state init: max cpus 2
dirtylimit_vcpu CPU[1] set quota dirtylimit 400
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 0, percentage 0
dirtyrate_do_calculate_vcpu vcpu[0]: 1075 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 1061 MB/s
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 1061, percentage 62
dirtyrate_do_calculate_vcpu vcpu[0]: 1133 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 380 MB/s
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 380, percentage 57
dirtyrate_do_calculate_vcpu vcpu[0]: 1227 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 464 MB/s

We can observe that vcpu-1's dirty rate is about 400MB/s with the dirty
page limit set, while vcpu-0 is not affected.

5.
observe the vm stress info...
[root@Hyman_server1 fast_qemu]# tail -f /var/log/vm_console.log
[    0.838051] Run /init as init process
[    0.839216]   with arguments:
[    0.840153]     /init
[    0.840882]   with environment:
[    0.841884]     HOME=/
[    0.842649]     TERM=linux
[    0.843478]     edd=off
[    0.844233]     ramsize=1500
[    0.845079]     ratio=1
[    0.845829]     sleep=1
/init (00001): INFO: RAM 1500 MiB across 2 CPUs, ratio 1, sleep 1 us
[    1.158011] random: init: uninitialized urandom read (4096 bytes read)
[    1.448205] random: init: uninitialized urandom read (4096 bytes read)
/init (00001): INFO: 1638282593684ms copied 1 GB in 00729ms
/init (00110): INFO: 1638282593964ms copied 1 GB in 00719ms
/init (00001): INFO: 1638282594405ms copied 1 GB in 00719ms
/init (00110): INFO: 1638282594677ms copied 1 GB in 00713ms
/init (00001): INFO: 1638282595093ms copied 1 GB in 00686ms
/init (00110): INFO: 1638282595339ms copied 1 GB in 00662ms
/init (00001): INFO: 1638282595764ms copied 1 GB in 00670m

PS: the kernel and initrd images come from:

kernel image: vmlinuz-5.13.0-rc4+, a normal centos vmlinuz copied from
the /boot directory

initrd.img: initrd-stress.img, which only contains a stress binary
compiled from the qemu source tests/migration/stress.c and run as init
in the vm.

You can view the README.md file of my project
"met" (https://github.com/newfriday/met) to compile the initrd-stress.img. :)

On 11/30/21 20:57, Peter Xu wrote:
> On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> The patch [2/3] has not been touched so far. Any corrections and
>> suggetions are welcome.
> 
> I played with it today, but the vcpu didn't got throttled as expected.
> 
> What I did was starting two workload with 500mb/s, each pinned on one vcpu
> thread:
> 
> [root@fedora ~]# pgrep -fa mig_mon
> 595 ./mig_mon mm_dirty 1000 500 sequential
> 604 ./mig_mon mm_dirty 1000 500 sequential
> [root@fedora ~]# taskset -pc 595
> pid 595's current affinity list: 2
> [root@fedora ~]# taskset -pc 604
> pid 604's current affinity list: 3
> 
> Then start throttle with 100mb/s:
> 
> (QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
> {"return": {}}
> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
> {"return": {}}
> 
> I can see the workload dropped a tiny little bit (perhaps 500mb -> 499mb), then
> it keeps going..
> 
> Further throttle won't work too:
> 
> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
> {"return": {}}
> 
> Funnily, the ssh client got slowed down instead... :(
> 
> Yong, how did you test it?
> 

-- 
Best Regards
Hyman Huang(黄勇)



* Re: [PATCH v7 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
From: Hyman Huang @ 2021-11-30 15:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



On 11/30/21 21:04, Peter Xu wrote:
> On Tue, Nov 30, 2021 at 06:28:11PM +0800, huangy81@chinatelecom.cn wrote:
>> +static void dirtylimit_calc_func(void)
>> +{
>> +    CPUState *cpu;
>> +    DirtyPageRecord *dirty_pages;
>> +    int64_t start_time, end_time, calc_time;
>> +    DirtyRateVcpu rate;
>> +    int i = 0;
>> +
>> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
>> +        dirtylimit_calc_state->data.nvcpu);
>> +
>> +    dirtylimit_global_dirty_log_start();
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, true);
>> +    }
>> +
>> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
>> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    calc_time = end_time - start_time;
>> +
>> +    dirtylimit_global_dirty_log_stop();
> 
> I haven't looked into the details, but..  I'm wondering whether we should just
> keep the dirty ring enabled during the whole process of throttling.
> 
> start/stop can be expensive, especially when huge pages are used, start dirty
> tracking will start to do huge page split. While right after the "stop" all the
> huge pages will need to be rebuild again.
> 
> David from Google is even proposing a kernel change to eagerly splitting huge
> pages when dirty tracking is enabled.
> 
> So I think we can keep the dirty tracking enabled until all the vcpu throttles
> are stopped.
Yes, it's a good idea and I'll try this out in the next version.
> 
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, false);
>> +    }
> 

-- 
Best Regards
Hyman Huang(黄勇)



* Re: [PATCH v7 3/3] cpus-common: implement dirty page limit on vCPU
From: Hyman Huang @ 2021-11-30 15:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, David Hildenbrand, Richard Henderson,
	Markus ArmBruster, qemu-devel, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



On 11/30/21 21:21, Peter Xu wrote:
> On Tue, Nov 30, 2021 at 06:28:13PM +0800, huangy81@chinatelecom.cn wrote:
>>   ##
>> +# @set-dirty-limit:
>> +#
>> +# Set the upper limit of dirty page rate for a virtual CPU.
>> +#
>> +# Requires KVM with accelerator property "dirty-ring-size" set.
>> +# A virtual CPU's dirty page rate is a measure of its memory load.
>> +# To observe dirty page rates, use @calc-dirty-rate.
>> +#
>> +# @cpu-index: index of the virtual CPU.
>> +#
>> +# @dirty-rate: upper limit for the specified vCPU's dirty page rate (MB/s)
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   { "execute": "set-dirty-limit",
>> +#     "arguments": { "cpu-index": 0,
>> +#                    "dirty-rate": 200 } }
>> +#
>> +##
>> +{ 'command': 'set-dirty-limit',
>> +  'data': { 'cpu-index': 'int', 'dirty-rate': 'uint64' } }
>> +
>> +##
>> +# @cancel-dirty-limit:
>> +#
>> +# Cancel the dirty page limit for the vCPU which has been set with
>> +# set-dirty-limit command. Note that this command requires support from
>> +# dirty ring, same as the "set-dirty-limit" command.
>> +#
>> +# @cpu-index: index of the virtual CPU to cancel the dirty page limit
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   { "execute": "cancel-dirty-limit",
>> +#     "arguments": { "cpu-index": 0 } }
>> +#
>> +##
>> +{ 'command': 'cancel-dirty-limit',
>> +  'data': { 'cpu-index': 'int' } }
> 
> This seems to be overloaded to be a standalone cmd..
> 
> How about:
> 
>    { "cmd": "vcpu-dirty-limit",
>      "arguments": {
>        "cpu": $cpu,
>        "enable": true/false,
>        "dirty-rate": 100,
>      }
>    }
> 
> If "enable"==false, then "dirty-rate" can be ignored and it'll shut down the
> throttling on vcpu N.  Then this command will literally merge the two you
> proposed.
> 
> It'll be nice if we provide yet another command:
> 
>    { "cmd": "query-vcpu-dirty-limit",
>      "arguments": {
>        "*cpu": $cpu,
>      }
>    }
> 
> When $cpu is specified, we return (cpu=$cpu, real_dirty_rate,
> target_dirty_rate) for this vcpu.  When $cpu is not specified, we return an
> array of that containing all the vcpus.
> 
> It'll be nicer to enhance the output of the query command to e.g. have a global
> "enabled"=true/false as long as any vcpu has throttle enabled then the global
> throttle is enabled.  I didn't think more than that, but how's that sound so
> far?
Sounds good, it makes the command easier for programmers to use and
understand. I'll try this out in the next version.
> 
> Thanks,
> 

-- 
Best Regards
Hyman Huang(黄勇)



* Re: [PATCH v7 0/3] support dirty restraint on vCPU
From: Hyman Huang @ 2021-11-30 16:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



On 11/30/21 22:57, Hyman Huang wrote:
> 1.
> Start vm with kernel+initrd.img with qemu command line as following:
> 
> [root@Hyman_server1 fast_qemu]# cat vm.sh
> #!/bin/bash
> /usr/bin/qemu-system-x86_64 \
>      -display none -vga none \
>      -name guest=simple_vm,debug-threads=on \
>      -monitor stdio \
>      -machine pc-i440fx-2.12 \
>      -accel kvm,dirty-ring-size=65536 -cpu host \
>      -kernel /home/work/fast_qemu/vmlinuz-5.13.0-rc4+ \
>      -initrd /home/work/fast_qemu/initrd-stress.img \
>      -append "noapic edd=off printk.time=1 noreplace-smp 
> cgroup_disable=memory pci=noearly console=ttyS0 debug ramsize=1500 
> ratio=1 sleep=1" \
>      -chardev file,id=charserial0,path=/var/log/vm_console.log \
>      -serial chardev:charserial0 \
>      -qmp unix:/tmp/qmp-sock,server,nowait \
>      -D /var/log/vm.log \
>      --trace events=/home/work/fast_qemu/events \
>      -m 4096 -smp 2 -device sga
> 
> 2.
> Enable the dirtylimit trace event which will output to /var/log/vm.log
> [root@Hyman_server1 fast_qemu]# cat /home/work/fast_qemu/events
> dirtylimit_state_init
> dirtylimit_vcpu
> dirtylimit_impose
> dirtyrate_do_calculate_vcpu
> 
> 
> 3.
> Connect the qmp server with low level qmp client and set-dirty-limit
> 
> [root@Hyman_server1 my_qemu]# python3.6 ./scripts/qmp/qmp-shell -v -p 
> /tmp/qmp-sock
> 
> Welcome to the QMP low-level shell!
> Connected to QEMU 6.1.92
> 
> (QEMU) set-dirty-limit cpu-index=1 dirty-rate=400
> 
> 
> {
>      "arguments": {
>          "cpu-index": 1,
>          "dirty-rate": 400
>      },
>      "execute": "set-dirty-limit"
> }
> 
> 4.
> observe the vcpu current dirty rate and quota dirty rate...
> 
> [root@Hyman_server1 ~]# tail -f /var/log/vm.log
> dirtylimit_state_init dirtylimit state init: max cpus 2
> dirtylimit_vcpu CPU[1] set quota dirtylimit 400
> dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 0, 
> percentage 0
> dirtyrate_do_calculate_vcpu vcpu[0]: 1075 MB/s
> dirtyrate_do_calculate_vcpu vcpu[1]: 1061 MB/s
> dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 1061, 
> percentage 62
> dirtyrate_do_calculate_vcpu vcpu[0]: 1133 MB/s
> dirtyrate_do_calculate_vcpu vcpu[1]: 380 MB/s
> dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 380, 
> percentage 57
> dirtyrate_do_calculate_vcpu vcpu[0]: 1227 MB/s
> dirtyrate_do_calculate_vcpu vcpu[1]: 464 MB/s
> 
> We can observe that vcpu-1's dirtyrate is about 400MB/s with dirty page 
> limit set and the vcpu-0 is not affected.
> 
> 5.
> observe the vm stress info...
> [root@Hyman_server1 fast_qemu]# tail -f /var/log/vm_console.log
> [    0.838051] Run /init as init process
> [    0.839216]   with arguments:
> [    0.840153]     /init
> [    0.840882]   with environment:
> [    0.841884]     HOME=/
> [    0.842649]     TERM=linux
> [    0.843478]     edd=off
> [    0.844233]     ramsize=1500
> [    0.845079]     ratio=1
> [    0.845829]     sleep=1
> /init (00001): INFO: RAM 1500 MiB across 2 CPUs, ratio 1, sleep 1 us
> [    1.158011] random: init: uninitialized urandom read (4096 bytes read)
> [    1.448205] random: init: uninitialized urandom read (4096 bytes read)
> /init (00001): INFO: 1638282593684ms copied 1 GB in 00729ms
> /init (00110): INFO: 1638282593964ms copied 1 GB in 00719ms
> /init (00001): INFO: 1638282594405ms copied 1 GB in 00719ms
> /init (00110): INFO: 1638282594677ms copied 1 GB in 00713ms
> /init (00001): INFO: 1638282595093ms copied 1 GB in 00686ms
> /init (00110): INFO: 1638282595339ms copied 1 GB in 00662ms
> /init (00001): INFO: 1638282595764ms copied 1 GB in 00670m
> 
> PS: the kernel and initrd images comes from:
> 
> kernel image: vmlinuz-5.13.0-rc4+, normal centos vmlinuz copied from 
> /boot directory
> 
> initrd.img: initrd-stress.img, only contains a stress binary, which 
> compiled from qemu source tests/migration/stress.c and run as init
> in vm.
> 
> you can view README.md file of my project 
> "met"(https://github.com/newfriday/met) to compile the 
> initrd-stress.img. :)
> 
> On 11/30/21 20:57, Peter Xu wrote:
>> On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
>>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>>
>>> The patch [2/3] has not been touched so far. Any corrections and
>>> suggetions are welcome.
>>
>> I played with it today, but the vcpu didn't got throttled as expected.
>>
>> What I did was starting two workload with 500mb/s, each pinned on one 
>> vcpu
>> thread:
>>
>> [root@fedora ~]# pgrep -fa mig_mon
>> 595 ./mig_mon mm_dirty 1000 500 sequential
>> 604 ./mig_mon mm_dirty 1000 500 sequential
>> [root@fedora ~]# taskset -pc 595
>> pid 595's current affinity list: 2
>> [root@fedora ~]# taskset -pc 604
>> pid 604's current affinity list: 3
>>
>> Then start throttle with 100mb/s:
>>
>> (QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
>> {"return": {}}
>> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
>> {"return": {}}
>>
>> I can see the workload dropped a tiny little bit (perhaps 500mb -> 
>> 499mb), then
>> it keeps going..
The test steps I listed above assume that the dirty rate calculated by
dirtylimit_calc_func via the dirty ring is accurate, which differs from
your test policy.

The macro DIRTYLIMIT_CALC_TIME_MS used as the calculation period in
migration/dirtyrate.c has a big effect on the result. So "how we define
the right dirty rate" is worth discussing.

Anyway, one of our targets is to improve memory performance during
migration, so I think the memory write/read speed in the vm is a
convincing metric. I'll test the dirty rate in the way you mentioned and
analyze the result.
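
For reference, with the formula from patch 1/3 the per-vCPU rate for
one sample window works out as follows (illustrative numbers, 4KiB
pages, 1000ms window):

    increased_dirty_pages = 256,000 pages in one window
    memory_size_MB = (256,000 * 4096) >> 20            = 1000 MB
    dirtyrate      = 1000 MB * 1000 / 1000 (ms window) = 1000 MB/s

so a shorter sample window makes bursty workloads read as a higher
rate, while a longer one averages them down.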

>>
>> Further throttle won't work too:
>>
>> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
>> {"return": {}}
>>
>> Funnily, the ssh client got slowed down instead... :(
>>
>> Yong, how did you test it?
>>
> 

-- 
Best Regards
Hyman Huang(黄勇)



* Re: [PATCH v7 3/3] cpus-common: implement dirty page limit on vCPU
From: Markus Armbruster @ 2021-12-02 16:02 UTC (permalink / raw)
  To: huangy81
  Cc: David Hildenbrand, Juan Quintela, Richard Henderson, qemu-devel,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

huangy81@chinatelecom.cn writes:

> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Implement dirty rate calculation periodically based on the
> dirty ring, and throttle the vCPU until it reaches the quota
> dirty page rate given by the user.
>
> Introduce QMP commands set-dirty-limit/cancel-dirty-limit to
> set/cancel the dirty page limit on a vCPU.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> ---
>  cpus-common.c         | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/core/cpu.h |  9 +++++++++
>  qapi/migration.json   | 43 +++++++++++++++++++++++++++++++++++++++++++
>  softmmu/vl.c          |  1 +
>  4 files changed, 101 insertions(+)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 6e73d3e..86c7712 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -23,6 +23,11 @@
>  #include "hw/core/cpu.h"
>  #include "sysemu/cpus.h"
>  #include "qemu/lockable.h"
> +#include "sysemu/dirtylimit.h"
> +#include "sysemu/cpu-throttle.h"
> +#include "sysemu/kvm.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-migration.h"
>  
>  static QemuMutex qemu_cpu_list_lock;
>  static QemuCond exclusive_cond;
> @@ -352,3 +357,46 @@ void process_queued_cpu_work(CPUState *cpu)
>      qemu_mutex_unlock(&cpu->work_mutex);
>      qemu_cond_broadcast(&qemu_work_cond);
>  }
> +
> +void qmp_set_dirty_limit(int64_t idx,
> +                         uint64_t dirtyrate,
> +                         Error **errp)
> +{
> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> +        error_setg(errp, "setting a dirty page limit requires KVM with"
> +                   " accelerator property 'dirty-ring-size' set");
> +        return;
> +    }
> +
> +    dirtylimit_calc();
> +    dirtylimit_vcpu(idx, dirtyrate);
> +}
> +
> +void qmp_cancel_dirty_limit(int64_t idx,
> +                            Error **errp)
> +{
> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> +        error_setg(errp, "KVM with accelerator property 'dirty-ring-size'"
> +                   " not set, abort canceling a dirty page limit");
> +        return;
> +    }

Is this check actually needed?  It's not needed when !dirtylimit_enabled(idx).

> +
> +    if (!dirtylimit_enabled(idx)) {
> +        error_setg(errp, "dirty page limit for the CPU %ld not set", idx);

"for CPU"

> +        return;
> +    }
> +
> +    if (unlikely(!dirtylimit_cancel_vcpu(idx))) {

I don't think unlikely() matters here.

> +        dirtylimit_calc_quit();
> +    }
> +}
> +
> +void dirtylimit_setup(int max_cpus)
> +{
> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> +        return;
> +    }
> +
> +    dirtylimit_calc_state_init(max_cpus);
> +    dirtylimit_state_init(max_cpus);
> +}
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index e948e81..11df012 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -881,6 +881,15 @@ void end_exclusive(void);
>   */
>  void qemu_init_vcpu(CPUState *cpu);
>  
> +/**
> + * dirtylimit_setup:
> + *
> + * Initializes the global state of dirtylimit calculation and
> + * dirtylimit itself. This is prepared for vCPU dirtylimit which
> + * could be triggered during vm lifecycle.
> + */
> +void dirtylimit_setup(int max_cpus);
> +
>  #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
>  #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
>  #define SSTEP_NOTIMER 0x4  /* Do not Timers while single stepping */
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48c..57c9a63 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1850,6 +1850,49 @@
>  { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>  
>  ##
> +# @set-dirty-limit:
> +#
> +# Set the upper limit of dirty page rate for a virtual CPU.
> +#
> +# Requires KVM with accelerator property "dirty-ring-size" set.
> +# A virtual CPU's dirty page rate is a measure of its memory load.
> +# To observe dirty page rates, use @calc-dirty-rate.
> +#
> +# @cpu-index: index of the virtual CPU.
> +#
> +# @dirty-rate: upper limit for the specified vCPU's dirty page rate (MB/s)
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   {"execute": "set-dirty-limit"}
> +#    "arguments": { "cpu-index": 0,
> +#                   "dirty-rate": 200 } }
> +#
> +##
> +{ 'command': 'set-dirty-limit',
> +  'data': { 'cpu-index': 'int', 'dirty-rate': 'uint64' } }
> +
> +##
> +# @cancel-dirty-limit:
> +#
> +# Cancel the dirty page limit for the vCPU, as previously set with
> +# the set-dirty-limit command. Note that this command requires dirty
> +# ring support, same as the "set-dirty-limit" command.
> +#
> +# @cpu-index: index of the virtual CPU to cancel the dirty page limit

I'd go with

   # @cpu-index: index of the virtual CPU.

> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   {"execute": "cancel-dirty-limit"}
> +#    "arguments": { "cpu-index": 0 } }
> +#
> +##
> +{ 'command': 'cancel-dirty-limit',
> +  'data': { 'cpu-index': 'int' } }
> +
> +##
>  # @snapshot-save:
>  #
>  # Save a VM snapshot
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 620a1f1..0f83ce3 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
>      qemu_init_displays();
>      accel_setup_post(current_machine);
>      os_setup_post();
> +    dirtylimit_setup(current_machine->smp.max_cpus);
>      resume_mux_open();
>  }

QAPI schema:
Acked-by: Markus Armbruster <armbru@redhat.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v7 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-02 16:02     ` Markus Armbruster
@ 2021-12-03  1:19       ` Hyman Huang
  0 siblings, 0 replies; 14+ messages in thread
From: Hyman Huang @ 2021-12-03  1:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: David Hildenbrand, Juan Quintela, Richard Henderson, qemu-devel,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé



On 2021/12/3 0:02, Markus Armbruster wrote:
> huangy81@chinatelecom.cn writes:
> 
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Implement dirty rate calculation periodically based on the
>> dirty ring, and throttle the vCPU until it reaches the quota
>> dirty page rate given by the user.
>>
>> Introduce QMP commands set-dirty-limit/cancel-dirty-limit to
>> set/cancel the dirty page limit on a vCPU.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> ---
>>   cpus-common.c         | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/core/cpu.h |  9 +++++++++
>>   qapi/migration.json   | 43 +++++++++++++++++++++++++++++++++++++++++++
>>   softmmu/vl.c          |  1 +
>>   4 files changed, 101 insertions(+)
>>
>> diff --git a/cpus-common.c b/cpus-common.c
>> index 6e73d3e..86c7712 100644
>> --- a/cpus-common.c
>> +++ b/cpus-common.c
>> @@ -23,6 +23,11 @@
>>   #include "hw/core/cpu.h"
>>   #include "sysemu/cpus.h"
>>   #include "qemu/lockable.h"
>> +#include "sysemu/dirtylimit.h"
>> +#include "sysemu/cpu-throttle.h"
>> +#include "sysemu/kvm.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qapi-commands-migration.h"
>>   
>>   static QemuMutex qemu_cpu_list_lock;
>>   static QemuCond exclusive_cond;
>> @@ -352,3 +357,46 @@ void process_queued_cpu_work(CPUState *cpu)
>>       qemu_mutex_unlock(&cpu->work_mutex);
>>       qemu_cond_broadcast(&qemu_work_cond);
>>   }
>> +
>> +void qmp_set_dirty_limit(int64_t idx,
>> +                         uint64_t dirtyrate,
>> +                         Error **errp)
>> +{
>> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>> +        error_setg(errp, "setting a dirty page limit requires KVM with"
>> +                   " accelerator property 'dirty-ring-size' set");
>> +        return;
>> +    }
>> +
>> +    dirtylimit_calc();
>> +    dirtylimit_vcpu(idx, dirtyrate);
>> +}
>> +
>> +void qmp_cancel_dirty_limit(int64_t idx,
>> +                            Error **errp)
>> +{
>> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>> +        error_setg(errp, "KVM with accelerator property 'dirty-ring-size'"
>> +                   " not set, abort canceling a dirty page limit");
>> +        return;
>> +    }
> 
> Is this check actually needed?  It's not needed when !dirtylimit_enabled(idx).
The logic never goes there if the user follows the enable/disable
steps; the check is just in case one calls the QMP
"cancel-dirty-limit" alone.
Anyway, set/cancel have been merged into one command in the latest
version, and we do check before doing anything in both cases.
> 
>> +
>> +    if (!dirtylimit_enabled(idx)) {
>> +        error_setg(errp, "dirty page limit for the CPU %ld not set", idx);
> 
> "for CPU"
> 
>> +        return;
>> +    }
>> +
>> +    if (unlikely(!dirtylimit_cancel_vcpu(idx))) {
> 
> I don't think unlikely() matters here.
OK, I dropped the compiler hint in the latest version.
> 
>> +        dirtylimit_calc_quit();
>> +    }
>> +}
>> +
>> +void dirtylimit_setup(int max_cpus)
>> +{
>> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>> +        return;
>> +    }
>> +
>> +    dirtylimit_calc_state_init(max_cpus);
>> +    dirtylimit_state_init(max_cpus);
>> +}
>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>> index e948e81..11df012 100644
>> --- a/include/hw/core/cpu.h
>> +++ b/include/hw/core/cpu.h
>> @@ -881,6 +881,15 @@ void end_exclusive(void);
>>    */
>>   void qemu_init_vcpu(CPUState *cpu);
>>   
>> +/**
>> + * dirtylimit_setup:
>> + *
>> + * Initializes the global state of dirtylimit calculation and
>> + * dirtylimit itself. This is prepared for vCPU dirtylimit which
>> + * could be triggered during vm lifecycle.
>> + */
>> +void dirtylimit_setup(int max_cpus);
>> +
>>   #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
>>   #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
>>   #define SSTEP_NOTIMER 0x4  /* Do not Timers while single stepping */
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index bbfd48c..57c9a63 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1850,6 +1850,49 @@
>>   { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>>   
>>   ##
>> +# @set-dirty-limit:
>> +#
>> +# Set the upper limit of dirty page rate for a virtual CPU.
>> +#
>> +# Requires KVM with accelerator property "dirty-ring-size" set.
>> +# A virtual CPU's dirty page rate is a measure of its memory load.
>> +# To observe dirty page rates, use @calc-dirty-rate.
>> +#
>> +# @cpu-index: index of the virtual CPU.
>> +#
>> +# @dirty-rate: upper limit for the specified vCPU's dirty page rate (MB/s)
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   { "execute": "set-dirty-limit",
>> +#     "arguments": { "cpu-index": 0,
>> +#                    "dirty-rate": 200 } }
>> +#
>> +##
>> +{ 'command': 'set-dirty-limit',
>> +  'data': { 'cpu-index': 'int', 'dirty-rate': 'uint64' } }
>> +
>> +##
>> +# @cancel-dirty-limit:
>> +#
>> +# Cancel the dirty page limit for the vCPU, as previously set with
>> +# the set-dirty-limit command. Note that this command requires dirty
>> +# ring support, same as the "set-dirty-limit" command.
>> +#
>> +# @cpu-index: index of the virtual CPU to cancel the dirty page limit
> 
> I'd go with
> 
>     # @cpu-index: index of the virtual CPU.
Ok.
> 
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   { "execute": "cancel-dirty-limit",
>> +#     "arguments": { "cpu-index": 0 } }
>> +#
>> +##
>> +{ 'command': 'cancel-dirty-limit',
>> +  'data': { 'cpu-index': 'int' } }
>> +
>> +##
>>   # @snapshot-save:
>>   #
>>   # Save a VM snapshot
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 620a1f1..0f83ce3 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
>>       qemu_init_displays();
>>       accel_setup_post(current_machine);
>>       os_setup_post();
>> +    dirtylimit_setup(current_machine->smp.max_cpus);
>>       resume_mux_open();
>>   }
> 
> QAPI schema:
> Acked-by: Markus Armbruster <armbru@redhat.com>
Thanks very much, Markus, for reviewing the code. :)
> 

-- 
Best regards

Hyman Huang(黄勇)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v7 0/3] support dirty restraint on vCPU
@ 2021-11-29 16:17 huangy81
  0 siblings, 0 replies; 14+ messages in thread
From: huangy81 @ 2021-11-29 16:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Hyman, David Hildenbrand, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.

Please review, thanks!

v7:
- rebase on master
- polish the comments and error message according to the
  advice given by Markus
- introduce dirtylimit_enabled function to pre-check if dirty
  page limit is enabled before canceling.

v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate can reach the
  quota more quickly.
- fix percentage update in throttle iteration.

v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change dirty rate to dirty page rate

Thanks for the careful reviews made by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove the DirtyRateQuotaVcpu and use its field as option directly
  2. add comments to show details of what dirtylimit setup do
  3. explain how to use dirtylimit in combination with existing qmp
     commands "calc-dirty-rate" and "query-dirty-rate" in documentation.

Thanks for the careful reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle functions of dirtylimit_calc,
     including dirtylimit_calc and dirtylimit_calc_quit
  3. introduce 'quit' field in dirtylimit_calc_state to implement the
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return;
- other modifications have been made after code review
  1. introduce 'bmap' and 'nr' fields in dirtylimit_state to record
     the number of running threads forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit
     threads are stopped
  3. do some renaming work
     dirtyrate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks !

Thanks for the accurate and timely advice given by Juan. We would
really appreciate any corrections and suggestions about this
patchset.

Best Regards !

Hyman

v1:
This patchset introduces a mechanism to impose dirty restraint
on vCPUs, aiming to keep each vCPU running within a certain dirty
rate given by the user. Dirty restraint on vCPUs may be an
alternative method to implement convergence logic for live
migration, which in theory could improve guest memory performance
during migration compared with the traditional method.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU will be unnecessarily penalized
- the throttle percentage increases step by step, which seems to
  struggle to find the optimal throttle percentage when the
  dirty rate is high
- it is hard to predict the remaining migration time if the
  throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these
side effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the
vCPU dirty rate via the Dirty Ring mechanism periodically, as
commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does. For a vCPU that has been specified to impose dirty restraint
on, we throttle it periodically as auto-converge does; after each
throttle step, we compare the current dirty rate with the quota
dirty rate, and if the current dirty rate is not under the quota,
we increase the throttling percentage until it is.
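
In pseudocode, the core loop looks roughly like the following
sketch (simplified; the helper names are hypothetical and do not
match the actual patch code):

  /* Sketch of the per-vCPU throttle loop described above; all
   * helper names are hypothetical. */
  void dirtylimit_throttle_loop(int cpu_index, uint64_t quota)
  {
      int pct = 0;
      while (dirtylimit_enabled_for(cpu_index)) {
          /* dirty rate measured via the dirty ring, as in patch 1/3 */
          uint64_t rate = current_dirty_rate(cpu_index);
          if (rate > quota && pct < 99) {
              pct += THROTTLE_STEP;  /* keep tightening until under quota */
          }
          vcpu_throttle_set(cpu_index, pct); /* vCPU sleeps this fraction */
          wait_ms(CALC_PERIOD_MS);           /* next measurement window */
      }
  }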

This patchset is the basis for implementing a new auto-converge
method for live migration. We introduce two QMP commands to
impose/cancel the dirty restraint on a specified vCPU, so it can
also serve as an independent API for upper-layer applications such
as libvirt, which can use it to implement the convergence logic
during live migration, supplemented with the QMP 'calc-dirty-rate'
command or whatever.
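
For instance, a management application might drive it roughly like
this (an illustrative qmp-shell session; query-dirty-rate output is
omitted for brevity):

  (QEMU) calc-dirty-rate calc-time=1
  {"return": {}}
  (QEMU) query-dirty-rate
  ... measured dirty page rate per vCPU ...
  (QEMU) set-dirty-limit cpu-index=0 dirty-rate=100
  {"return": {}}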

We post this patchset as an RFC; any corrections and suggestions
about the implementation, the API, the throttling algorithm, or
anything else are very much appreciated!

Please review, thanks !

Best Regards !

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty page limit on vCPU

 cpus-common.c                 |  48 +++++++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   9 ++
 include/sysemu/cpu-throttle.h |  30 ++++
 include/sysemu/dirtylimit.h   |  44 ++++++
 migration/dirtyrate.c         | 139 +++++++++++++++++--
 migration/dirtyrate.h         |   2 +
 qapi/migration.json           |  43 ++++++
 softmmu/cpu-throttle.c        | 316 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 softmmu/vl.c                  |   1 +
 11 files changed, 631 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-12-03  1:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-30 10:28 [PATCH v7 0/3] support dirty restraint on vCPU huangy81
     [not found] ` <cover.1638267948.git.huangy81@chinatelecom.cn>
2021-11-30 10:28   ` [PATCH v7 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
2021-11-30 13:04     ` Peter Xu
2021-11-30 15:10       ` Hyman Huang
2021-11-30 10:28   ` [PATCH v7 2/3] cpu-throttle: implement vCPU throttle huangy81
2021-11-30 10:28   ` [PATCH v7 3/3] cpus-common: implement dirty page limit on vCPU huangy81
2021-11-30 13:21     ` Peter Xu
2021-11-30 15:25       ` Hyman Huang
2021-12-02 16:02     ` Markus Armbruster
2021-12-03  1:19       ` Hyman Huang
2021-11-30 12:57 ` [PATCH v7 0/3] support dirty restraint " Peter Xu
2021-11-30 14:57   ` Hyman Huang
2021-11-30 16:04     ` Hyman Huang
  -- strict thread matches above, loose matches on Subject: below --
2021-11-29 16:17 huangy81

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.