All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v9 0/3] support dirty restraint on vCPU
@ 2021-12-03  1:39 huangy81
  2021-12-03  1:39 ` [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: huangy81 @ 2021-12-03  1:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

v9:
- rebase on master
- fix the meson directory change, keep it untouched.

v8:
- rebase on master
- polish the error message and remove the "unlikely" compilation syntax
  according to the advice given by Markus.
- keep the dirty tracking enabled during "dirtylimit-calc" lifecycle
  so that the overhead can be reduced according to the advice given by
  Peter. 
- merge the "set/cancel" qmp commands into one named "vcpu-dirty-limit"
  and introduce qmp command "query-vcpu-dirty-limit" to query dirty
  limit information about virtual CPU, according to the advice given by
  Peter.
- check if vcpu index is valid and handle the unplug case before
  enabling, disabling dirty limit for virtual CPU.
- introduce hmp commands so developers can play with them easier, use
  "vcpu_dirty_limit" to enable dirty limit and "info vcpu_dirty_limit"
  to query.

The patch [2/3] has not been touched so far. Any corrections and
suggetions are welcome. 

Please review, thanks!

v7:
- rebase on master
- polish the comments and error message according to the
  advices given by Markus
- introduce dirtylimit_enabled function to pre-check if dirty
  page limit is enabled before canceling.

v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in 
  RESTRAINT_RATIO case so that dirty page rate could reachs the quota
  more quickly.
- fix percentage update in throttle iteration.

v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the defination into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change dirty rate to dirty page rate

Thanks for the carefule reviews made by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove the DirtyRateQuotaVcpu and use its field as option directly
  2. add comments to show details of what dirtylimit setup do
  3. explain how to use dirtylimit in combination with existing qmp
     commands "calc-dirty-rate" and "query-dirty-rate" in documentation.

Thanks for the carefule reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advices given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecyle function of dirtylimit_calc, include
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce 'quit' field in dirtylimit_calc_state to implement the
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since it may not be suitable
  5. put the 'record_dirtypage' function code at the beggining of the
     file
  6. remove the unnecesary return;
- other modifications has been made after code review
  1. introduce 'bmap' and 'nr' field in dirtylimit_state to record the
     number of running thread forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit thread
     are stopped
  3. do some renaming works
     dirtyrate calulation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-drity-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks !

thanks for the accurate and timely advices given by Juan. we really
appreciate it if corrections and suggetions about this patchset are
proposed.

Best Regards !

Hyman

v1:
this patchset introduce a mechanism to impose dirty restraint
on vCPU, aiming to keep the vCPU running in a certain dirtyrate
given by user. dirty restraint on vCPU maybe an alternative
method to implement convergence logic for live migration,
which could improve guest memory performance during migration
compared with traditional method in theory.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects.
-'read processes' on vCPU will be unnecessarily penalized
- throttle increase percentage step by step, which seems
  struggling to find the optimal throttle percentage when
  dirtyrate is high.
- hard to predict the remaining time of migration if the
  throttling percentage reachs 99%

to a certain extent, the dirty restraint machnism can fix these
effects by throttling at vCPU granularity during migration.

the implementation is rather straightforward, we calculate
vCPU dirtyrate via the Dirty Ring mechanism periodically
as the commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does, for vCPU that be specified to impose dirty restraint,
we throttle it periodically as the auto-converge does, once after
throttling, we compare the quota dirtyrate with current dirtyrate,
if current dirtyrate is not under the quota, increase the throttling
percentage until current dirtyrate is under the quota.

this patchset is the basis of implmenting a new auto-converge method
for live migration, we introduce two qmp commands for impose/cancel
the dirty restraint on specified vCPU, so it also can be an independent
api to supply the upper app such as libvirt, which can use it to
implement the convergence logic during live migration, supplemented
with the qmp 'calc-dirty-rate' command or whatever.

we post this patchset for RFC and any corrections and suggetions about
the implementation, api, throttleing algorithm or whatever are very
appreciated!

Please review, thanks !

Best Regards !

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty page limit on vCPU

 cpus-common.c                 | 149 ++++++++++++++++++
 hmp-commands-info.hx          |  13 ++
 hmp-commands.hx               |  16 ++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   9 ++
 include/monitor/hmp.h         |   2 +
 include/sysemu/cpu-throttle.h |  45 ++++++
 include/sysemu/dirtylimit.h   |  44 ++++++
 migration/dirtyrate.c         | 142 +++++++++++++++--
 migration/dirtyrate.h         |   2 +
 qapi/migration.json           |  70 +++++++++
 softmmu/cpu-throttle.c        | 355 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 softmmu/vl.c                  |   1 +
 14 files changed, 847 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-03  1:39 [PATCH v9 0/3] support dirty restraint on vCPU huangy81
@ 2021-12-03  1:39 ` huangy81
  2021-12-06 10:18   ` Peter Xu
  2021-12-03  1:39 ` [PATCH v9 2/3] cpu-throttle: implement vCPU throttle huangy81
  2021-12-03  1:39 ` [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU huangy81
  2 siblings, 1 reply; 21+ messages in thread
From: huangy81 @ 2021-12-03  1:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Introduce the third method GLOBAL_DIRTY_LIMIT of dirty
tracking for calculate dirtyrate periodly for dirty restraint.

Implement thread for calculate dirtyrate periodly, which will
be used for dirty restraint.

Add dirtylimit.h to introduce the util function for dirty
limit implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 include/exec/memory.h       |   5 +-
 include/sysemu/dirtylimit.h |  44 ++++++++++++++
 migration/dirtyrate.c       | 142 ++++++++++++++++++++++++++++++++++++++++----
 migration/dirtyrate.h       |   2 +
 4 files changed, 182 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27..606bec8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT      (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 0000000..49298a2
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,44 @@
+/*
+ * dirty limit helper functions
+ *
+ * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_PERIOD_TIME_S   15      /* 15s */
+
+/**
+ * dirtylimit_calc_current:
+ *
+ * get current dirty page rate for specified vCPU.
+ */
+int64_t dirtylimit_calc_current(int cpu_index);
+
+/**
+ * dirtylimit_calc:
+ *
+ * start dirty page rate calculation thread.
+ */
+void dirtylimit_calc(void);
+
+/**
+ * dirtylimit_calc_quit:
+ *
+ * quit dirty page rate calculation thread.
+ */
+void dirtylimit_calc_quit(void);
+
+/**
+ * dirtylimit_calc_state_init:
+ *
+ * initialize dirty page rate calculation state.
+ */
+void dirtylimit_calc_state_init(int max_cpus);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..5b969be 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -27,6 +27,7 @@
 #include "qapi/qmp/qdict.h"
 #include "sysemu/kvm.h"
 #include "sysemu/runstate.h"
+#include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 
 /*
@@ -46,6 +47,137 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
                 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
+#define DIRTYLIMIT_CALC_TIME_MS         1000    /* 1000ms */
+
+struct {
+    DirtyRatesData data;
+    int64_t period;
+    bool quit;
+} *dirtylimit_calc_state;
+
+static void dirtylimit_global_dirty_log_start(void)
+{
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
+    qemu_mutex_unlock_iothread();
+}
+
+static void dirtylimit_global_dirty_log_stop(void)
+{
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
+    qemu_mutex_unlock_iothread();
+}
+
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+                                     CPUState *cpu, bool start)
+{
+    if (start) {
+        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+    } else {
+        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+    }
+}
+
+static void dirtylimit_calc_func(void)
+{
+    CPUState *cpu;
+    DirtyPageRecord *dirty_pages;
+    int64_t start_time, end_time, calc_time;
+    DirtyRateVcpu rate;
+    int i = 0;
+
+    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
+        dirtylimit_calc_state->data.nvcpu);
+
+    CPU_FOREACH(cpu) {
+        record_dirtypages(dirty_pages, cpu, true);
+    }
+
+    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
+    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    calc_time = end_time - start_time;
+
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_sync();
+    qemu_mutex_unlock_iothread();
+
+    CPU_FOREACH(cpu) {
+        record_dirtypages(dirty_pages, cpu, false);
+    }
+
+    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
+        uint64_t increased_dirty_pages =
+            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
+        uint64_t memory_size_MB =
+            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
+
+        rate.id = i;
+        rate.dirty_rate  = dirtyrate;
+        dirtylimit_calc_state->data.rates[i] = rate;
+
+        trace_dirtyrate_do_calculate_vcpu(i,
+            dirtylimit_calc_state->data.rates[i].dirty_rate);
+    }
+}
+
+static void *dirtylimit_calc_thread(void *opaque)
+{
+    rcu_register_thread();
+
+    dirtylimit_global_dirty_log_start();
+
+    while (!qatomic_read(&dirtylimit_calc_state->quit)) {
+        dirtylimit_calc_func();
+        sleep(dirtylimit_calc_state->period);
+    }
+
+    dirtylimit_global_dirty_log_stop();
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+int64_t dirtylimit_calc_current(int cpu_index)
+{
+    DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates;
+
+    return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void dirtylimit_calc(void)
+{
+    if (unlikely(qatomic_read(&dirtylimit_calc_state->quit))) {
+        qatomic_set(&dirtylimit_calc_state->quit, 0);
+        QemuThread thread;
+        qemu_thread_create(&thread, "dirtylimit-calc",
+            dirtylimit_calc_thread,
+            NULL, QEMU_THREAD_DETACHED);
+    }
+}
+
+void dirtylimit_calc_quit(void)
+{
+    qatomic_set(&dirtylimit_calc_state->quit, 1);
+}
+
+void dirtylimit_calc_state_init(int max_cpus)
+{
+    dirtylimit_calc_state =
+        g_malloc0(sizeof(*dirtylimit_calc_state));
+
+    dirtylimit_calc_state->data.nvcpu = max_cpus;
+    dirtylimit_calc_state->data.rates =
+        g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+    dirtylimit_calc_state->period =
+        DIRTYLIMIT_CALC_PERIOD_TIME_S;
+
+    dirtylimit_calc_state->quit = true;
+}
+
 static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 {
     int64_t current_time;
@@ -396,16 +528,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
     return true;
 }
 
-static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
-                                     CPUState *cpu, bool start)
-{
-    if (start) {
-        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
-    } else {
-        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
-    }
-}
-
 static void dirtyrate_global_dirty_log_start(void)
 {
     qemu_mutex_lock_iothread();
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 69d4c5b..e96acdc 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -70,6 +70,8 @@ typedef struct VcpuStat {
     DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
 } VcpuStat;
 
+typedef struct VcpuStat DirtyRatesData;
+
 /*
  * Store calculation statistics for each measure.
  */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v9 2/3] cpu-throttle: implement vCPU throttle
  2021-12-03  1:39 [PATCH v9 0/3] support dirty restraint on vCPU huangy81
  2021-12-03  1:39 ` [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
@ 2021-12-03  1:39 ` huangy81
  2021-12-06 10:10   ` Peter Xu
  2021-12-03  1:39 ` [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU huangy81
  2 siblings, 1 reply; 21+ messages in thread
From: huangy81 @ 2021-12-03  1:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Impose dirty restraint on vCPU by kicking it and sleep
as the auto-converge does during migration, but just
kick the specified vCPU instead, not all vCPUs of vm.

Start a thread to track the dirtylimit status and adjust
the throttle pencentage dynamically depend on current
and quota dirtyrate.

Introduce the util function in the header for dirtylimit
implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 include/sysemu/cpu-throttle.h |  45 ++++++
 qapi/migration.json           |  22 +++
 softmmu/cpu-throttle.c        | 355 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 4 files changed, 427 insertions(+)

diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
index d65bdef..962990b 100644
--- a/include/sysemu/cpu-throttle.h
+++ b/include/sysemu/cpu-throttle.h
@@ -65,4 +65,49 @@ bool cpu_throttle_active(void);
  */
 int cpu_throttle_get_percentage(void);
 
+/**
+ * dirtylimit_enabled
+ *
+ * Returns: %true if dirty page limit for vCPU is enabled, %false otherwise.
+ */
+bool dirtylimit_enabled(int cpu_index);
+
+/**
+ * dirtylimit_is_vcpu_index_valid
+ *
+ * Returns: %true if cpu index valid, %false otherwise.
+ */
+bool dirtylimit_is_vcpu_index_valid(int cpu_index);
+
+/**
+ * dirtylimit_state_init:
+ *
+ * initialize golobal state for dirtylimit
+ */
+void dirtylimit_state_init(int max_cpus);
+
+/**
+ * dirtylimit_vcpu:
+ *
+ * impose dirtylimit on vcpu util reaching the quota dirtyrate
+ */
+void dirtylimit_vcpu(int cpu_index,
+                     uint64_t quota);
+
+/**
+ * dirtylimit_query_vcpu:
+ *
+ * Returns: dirty page limit information of specified virtual CPU.
+ */
+struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index);
+
+/**
+ * dirtylimit_cancel_vcpu:
+ *
+ * cancel dirtylimit for the specified vcpu
+ *
+ * Returns: the number of running threads for dirtylimit
+ */
+int dirtylimit_cancel_vcpu(int cpu_index);
+
 #endif /* SYSEMU_CPU_THROTTLE_H */
diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48c..3da8fdf 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1850,6 +1850,28 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of virtual CPU.
+#
+# @cpu-index: index of virtual CPU.
+#
+# @enable: true to enable, false to disable.
+#
+# @limit-rate: upper limit of dirty page rate for virtual CPU.
+#
+# @current-rate: current dirty page rate for virtual CPU.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+            'enable': 'bool',
+            'limit-rate': 'int64',
+            'current-rate': 'int64' } }
+
+##
 # @snapshot-save:
 #
 # Save a VM snapshot
diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c
index 8c2144a..ca0f440 100644
--- a/softmmu/cpu-throttle.c
+++ b/softmmu/cpu-throttle.c
@@ -29,6 +29,9 @@
 #include "qemu/main-loop.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-throttle.h"
+#include "sysemu/dirtylimit.h"
+#include "qapi/qapi-commands-migration.h"
+#include "trace.h"
 
 /* vcpu throttling controls */
 static QEMUTimer *throttle_timer;
@@ -38,6 +41,358 @@ static unsigned int throttle_percentage;
 #define CPU_THROTTLE_PCT_MAX 99
 #define CPU_THROTTLE_TIMESLICE_NS 10000000
 
+#define DIRTYLIMIT_TOLERANCE_RANGE  15      /* 15MB/s */
+
+#define DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK     75
+#define DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK    90
+
+#define DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE     5
+#define DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE    2
+
+typedef enum {
+    RESTRAIN_KEEP,
+    RESTRAIN_RATIO,
+    RESTRAIN_HEAVY,
+    RESTRAIN_SLIGHT,
+} RestrainPolicy;
+
+typedef struct DirtyLimitState {
+    int cpu_index;
+    bool enabled;
+    uint64_t quota;     /* quota dirtyrate MB/s */
+    QemuThread thread;
+    char *name;         /* thread name */
+} DirtyLimitState;
+
+struct {
+    DirtyLimitState *states;
+    int max_cpus;
+    unsigned long *bmap; /* running thread bitmap */
+    unsigned long nr;
+} *dirtylimit_state;
+
+bool dirtylimit_enabled(int cpu_index)
+{
+    return qatomic_read(&dirtylimit_state->states[cpu_index].enabled);
+}
+
+static bool dirtylimit_is_vcpu_unplug(int cpu_index)
+{
+    CPUState *cpu;
+    CPU_FOREACH(cpu) {
+        if (cpu->cpu_index == cpu_index) {
+            break;
+        }
+    }
+
+    return cpu->unplug;
+}
+
+bool dirtylimit_is_vcpu_index_valid(int cpu_index)
+{
+    if (cpu_index < 0 ||
+        cpu_index >= qatomic_read(&dirtylimit_state->max_cpus) ||
+        dirtylimit_is_vcpu_unplug(cpu_index)) {
+        return false;
+    }
+
+    return true;
+}
+
+static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota)
+{
+    qatomic_set(&dirtylimit_state->states[cpu_index].quota, quota);
+}
+
+static inline uint64_t dirtylimit_quota(int cpu_index)
+{
+    return qatomic_read(&dirtylimit_state->states[cpu_index].quota);
+}
+
+static int64_t dirtylimit_current(int cpu_index)
+{
+    return dirtylimit_calc_current(cpu_index);
+}
+
+static void dirtylimit_vcpu_thread(CPUState *cpu, run_on_cpu_data data)
+{
+    double pct;
+    double throttle_ratio;
+    int64_t sleeptime_ns, endtime_ns;
+    int *percentage = (int *)data.host_ptr;
+
+    pct = (double)(*percentage) / 100;
+    throttle_ratio = pct / (1 - pct);
+    /* Add 1ns to fix double's rounding error (like 0.9999999...) */
+    sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1);
+    endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
+    while (sleeptime_ns > 0 && !cpu->stop) {
+        if (sleeptime_ns > SCALE_MS) {
+            qemu_cond_timedwait_iothread(cpu->halt_cond,
+                                         sleeptime_ns / SCALE_MS);
+        } else {
+            qemu_mutex_unlock_iothread();
+            g_usleep(sleeptime_ns / SCALE_US);
+            qemu_mutex_lock_iothread();
+        }
+        sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    }
+    qatomic_set(&cpu->throttle_thread_scheduled, 0);
+
+    free(percentage);
+}
+
+static void dirtylimit_check(int cpu_index,
+                             int percentage)
+{
+    CPUState *cpu;
+    int64_t sleeptime_ns, starttime_ms, currenttime_ms;
+    int *pct_parameter;
+    double pct;
+
+    pct = (double) percentage / 100;
+
+    starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    while (true) {
+        CPU_FOREACH(cpu) {
+            if ((cpu_index == cpu->cpu_index) &&
+                (!qatomic_xchg(&cpu->throttle_thread_scheduled, 1))) {
+                pct_parameter = malloc(sizeof(*pct_parameter));
+                *pct_parameter = percentage;
+                async_run_on_cpu(cpu, dirtylimit_vcpu_thread,
+                                 RUN_ON_CPU_HOST_PTR(pct_parameter));
+                break;
+            }
+        }
+
+        sleeptime_ns = CPU_THROTTLE_TIMESLICE_NS / (1 - pct);
+        g_usleep(sleeptime_ns / SCALE_US);
+
+        currenttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+        if (unlikely((currenttime_ms - starttime_ms) >
+                     (DIRTYLIMIT_CALC_PERIOD_TIME_S * 1000))) {
+            break;
+        }
+    }
+}
+
+static uint64_t dirtylimit_init_pct(uint64_t quota,
+                                    uint64_t current)
+{
+    uint64_t limit_pct = 0;
+
+    if (quota >= current || (current == 0) ||
+        ((current - quota) <= DIRTYLIMIT_TOLERANCE_RANGE)) {
+        limit_pct = 0;
+    } else {
+        limit_pct = (current - quota) * 100 / current;
+
+        limit_pct = MIN(limit_pct,
+            DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
+    }
+
+    return limit_pct;
+}
+
+static RestrainPolicy dirtylimit_policy(unsigned int last_pct,
+                                        uint64_t quota,
+                                        uint64_t current)
+{
+    uint64_t max, min;
+
+    max = MAX(quota, current);
+    min = MIN(quota, current);
+    if ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) {
+        return RESTRAIN_KEEP;
+    }
+    if (last_pct < DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK) {
+        /* last percentage locates in [0, 75)*/
+        return RESTRAIN_RATIO;
+    } else if (last_pct < DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK) {
+        /* last percentage locates in [75, 90)*/
+        return RESTRAIN_HEAVY;
+    } else {
+        /* last percentage locates in [90, 99]*/
+        return RESTRAIN_SLIGHT;
+    }
+}
+
+static uint64_t dirtylimit_pct(unsigned int last_pct,
+                               uint64_t quota,
+                               uint64_t current)
+{
+    uint64_t limit_pct = 0;
+    RestrainPolicy policy;
+    bool mitigate = (quota > current) ? true : false;
+
+    if (mitigate && ((current == 0) ||
+        (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) {
+        return 0;
+    }
+
+    policy = dirtylimit_policy(last_pct, quota, current);
+    switch (policy) {
+    case RESTRAIN_SLIGHT:
+        /* [90, 99] */
+        if (mitigate) {
+            limit_pct =
+                last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
+        } else {
+            limit_pct =
+                last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
+
+            limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX);
+        }
+       break;
+    case RESTRAIN_HEAVY:
+        /* [75, 90) */
+        if (mitigate) {
+            limit_pct =
+                last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
+        } else {
+            limit_pct =
+                last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
+
+            limit_pct = MIN(limit_pct,
+                DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK);
+        }
+       break;
+    case RESTRAIN_RATIO:
+        /* [0, 75) */
+        if (mitigate) {
+            if (last_pct <= (((quota - current) * 100 / quota))) {
+                limit_pct = 0;
+            } else {
+                limit_pct = last_pct -
+                    ((quota - current) * 100 / quota);
+                limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN);
+            }
+        } else {
+            limit_pct = last_pct +
+                ((current - quota) * 100 / current);
+
+            limit_pct = MIN(limit_pct,
+                DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
+        }
+       break;
+    case RESTRAIN_KEEP:
+    default:
+       limit_pct = last_pct;
+       break;
+    }
+
+    return limit_pct;
+}
+
+static void *dirtylimit_thread(void *opaque)
+{
+    int cpu_index = *(int *)opaque;
+    uint64_t quota_dirtyrate, current_dirtyrate;
+    unsigned int last_pct = 0;
+    unsigned int pct = 0;
+
+    rcu_register_thread();
+
+    quota_dirtyrate = dirtylimit_quota(cpu_index);
+    current_dirtyrate = dirtylimit_current(cpu_index);
+
+    pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate);
+
+    do {
+        trace_dirtylimit_impose(cpu_index,
+            quota_dirtyrate, current_dirtyrate, pct);
+
+        last_pct = pct;
+        if (pct == 0) {
+            sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S);
+        } else {
+            dirtylimit_check(cpu_index, pct);
+        }
+
+        quota_dirtyrate = dirtylimit_quota(cpu_index);
+        current_dirtyrate = dirtylimit_current(cpu_index);
+
+        pct = dirtylimit_pct(last_pct, quota_dirtyrate, current_dirtyrate);
+    } while (dirtylimit_enabled(cpu_index));
+
+    rcu_unregister_thread();
+
+    return NULL;
+}
+
+int dirtylimit_cancel_vcpu(int cpu_index)
+{
+    int i;
+    int nr_threads = 0;
+
+    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 0);
+    dirtylimit_set_quota(cpu_index, 0);
+
+    bitmap_test_and_clear_atomic(dirtylimit_state->bmap, cpu_index, 1);
+
+    for (i = 0; i < dirtylimit_state->nr; i++) {
+        unsigned long temp = dirtylimit_state->bmap[i];
+        nr_threads += ctpopl(temp);
+    }
+
+   return nr_threads;
+}
+
+void dirtylimit_vcpu(int cpu_index,
+                     uint64_t quota)
+{
+    trace_dirtylimit_vcpu(cpu_index, quota);
+
+    dirtylimit_set_quota(cpu_index, quota);
+
+    if (unlikely(!dirtylimit_enabled(cpu_index))) {
+        qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 1);
+        dirtylimit_state->states[cpu_index].name =
+            g_strdup_printf("dirtylimit-%d", cpu_index);
+        qemu_thread_create(&dirtylimit_state->states[cpu_index].thread,
+            dirtylimit_state->states[cpu_index].name,
+            dirtylimit_thread,
+            (void *)&dirtylimit_state->states[cpu_index].cpu_index,
+            QEMU_THREAD_DETACHED);
+        bitmap_set_atomic(dirtylimit_state->bmap, cpu_index, 1);
+    }
+}
+
+struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
+{
+    DirtyLimitInfo *info = NULL;
+
+    info = g_malloc0(sizeof(*info));
+    info->cpu_index = cpu_index;
+    info->enable = dirtylimit_enabled(cpu_index);
+    info->limit_rate= dirtylimit_quota(cpu_index);;
+    info->current_rate = dirtylimit_current(cpu_index);
+
+    return info;
+}
+
+void dirtylimit_state_init(int max_cpus)
+{
+    int i;
+
+    dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state));
+
+    dirtylimit_state->states =
+            g_malloc0(sizeof(DirtyLimitState) * max_cpus);
+
+    for (i = 0; i < max_cpus; i++) {
+        dirtylimit_state->states[i].cpu_index = i;
+    }
+
+    dirtylimit_state->max_cpus = max_cpus;
+    dirtylimit_state->bmap = bitmap_new(max_cpus);
+    bitmap_clear(dirtylimit_state->bmap, 0, max_cpus);
+    dirtylimit_state->nr = BITS_TO_LONGS(max_cpus);
+
+    trace_dirtylimit_state_init(max_cpus);
+}
+
 static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 {
     double pct;
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887..a7c9c04 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -31,3 +31,8 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co
 system_wakeup_request(int reason) "reason=%d"
 qemu_system_shutdown_request(int reason) "reason=%d"
 qemu_system_powerdown_request(void) ""
+
+#cpu-throttle.c
+dirtylimit_state_init(int max_cpus) "dirtylimit state init: max cpus %d"
+dirtylimit_impose(int cpu_index, uint64_t quota, uint64_t current, int pct) "CPU[%d] impose dirtylimit: quota %" PRIu64 ", current %" PRIu64 ", percentage %d"
+dirtylimit_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set quota dirtylimit %"PRIu64
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-03  1:39 [PATCH v9 0/3] support dirty restraint on vCPU huangy81
  2021-12-03  1:39 ` [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
  2021-12-03  1:39 ` [PATCH v9 2/3] cpu-throttle: implement vCPU throttle huangy81
@ 2021-12-03  1:39 ` huangy81
  2021-12-03 12:34   ` Markus Armbruster
                     ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: huangy81 @ 2021-12-03  1:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Hyman, Juan Quintela, Richard Henderson,
	Markus ArmBruster, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Implement dirtyrate calculation periodically basing on
dirty-ring and throttle vCPU until it reachs the quota
dirty page rate given by user.

Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
"info vcpu_dirty_limit" so developers can play with them easier.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 cpus-common.c         | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp-commands-info.hx  |  13 +++++
 hmp-commands.hx       |  16 ++++++
 include/hw/core/cpu.h |   9 +++
 include/monitor/hmp.h |   2 +
 qapi/migration.json   |  48 ++++++++++++++++
 softmmu/vl.c          |   1 +
 7 files changed, 238 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..04b9bc9 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -23,6 +23,14 @@
 #include "hw/core/cpu.h"
 #include "sysemu/cpus.h"
 #include "qemu/lockable.h"
+#include "sysemu/dirtylimit.h"
+#include "sysemu/cpu-throttle.h"
+#include "sysemu/kvm.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-migration.h"
 
 static QemuMutex qemu_cpu_list_lock;
 static QemuCond exclusive_cond;
@@ -352,3 +360,144 @@ void process_queued_cpu_work(CPUState *cpu)
     qemu_mutex_unlock(&cpu->work_mutex);
     qemu_cond_broadcast(&qemu_work_cond);
 }
+
+void qmp_vcpu_dirty_limit(int64_t cpu_index,
+                          bool enable,
+                          uint64_t dirty_rate,
+                          Error **errp)
+{
+    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+        error_setg(errp, "dirty page limit feature requires KVM with"
+                   " accelerator property 'dirty-ring-size' set'");
+        return;
+    }
+
+    if (!dirtylimit_is_vcpu_index_valid(cpu_index)) {
+        error_setg(errp, "cpu index out of range");
+        return;
+    }
+
+    if (enable) {
+        dirtylimit_calc();
+        dirtylimit_vcpu(cpu_index, dirty_rate);
+    } else {
+        if (!dirtylimit_enabled(cpu_index)) {
+            error_setg(errp, "dirty page limit for CPU %ld not set",
+                       cpu_index);
+            return;
+        }
+
+        if (!dirtylimit_cancel_vcpu(cpu_index)) {
+            dirtylimit_calc_quit();
+        }
+    }
+}
+
+void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+    int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
+    int64_t dirty_rate = qdict_get_try_int(qdict, "dirty_rate", -1);
+    bool enable = qdict_get_bool(qdict, "enable");
+    Error *err = NULL;
+
+    if (enable && dirty_rate < 0) {
+        monitor_printf(mon, "Enabling dirty limit requires dirty_rate set!\n");
+        return;
+    }
+
+    if (!dirtylimit_is_vcpu_index_valid(cpu_index)) {
+        monitor_printf(mon, "Incorrect cpu index specified!\n");
+        return;
+    }
+
+    qmp_vcpu_dirty_limit(cpu_index, enable, dirty_rate, &err);
+
+    if (err) {
+        hmp_handle_error(mon, err);
+        return;
+    }
+
+    monitor_printf(mon, "Enabling dirty page rate limit with %"PRIi64
+                   " MB/s\n", dirty_rate);
+
+    monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query "
+                   "dirty limit for virtual CPU]\n");
+}
+
+struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(bool has_cpu_index,
+                                                      int64_t cpu_index,
+                                                      Error **errp)
+{
+    DirtyLimitInfo *info = NULL;
+    DirtyLimitInfoList *head = NULL, **tail = &head;
+
+    if (has_cpu_index &&
+        (!dirtylimit_is_vcpu_index_valid(cpu_index))) {
+        error_setg(errp, "cpu index out of range");
+        return NULL;
+    }
+
+    if (has_cpu_index) {
+        info = dirtylimit_query_vcpu(cpu_index);
+        QAPI_LIST_APPEND(tail, info);
+    } else {
+        CPUState *cpu;
+        CPU_FOREACH(cpu) {
+            if (!cpu->unplug) {
+                info = dirtylimit_query_vcpu(cpu->cpu_index);
+                QAPI_LIST_APPEND(tail, info);
+            }
+        }
+    }
+
+    return head;
+}
+
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+    DirtyLimitInfoList *limit, *head, *info = NULL;
+    int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
+    Error *err = NULL;
+
+    if (cpu_index >=0 &&
+        !dirtylimit_is_vcpu_index_valid(cpu_index)) {
+        monitor_printf(mon, "cpu index out of range\n");
+        return;
+    }
+
+    if (cpu_index < 0) {
+        info = qmp_query_vcpu_dirty_limit(false, -1, &err);
+    } else {
+        info = qmp_query_vcpu_dirty_limit(true, cpu_index, &err);
+    }
+
+    if (err) {
+        hmp_handle_error(mon, err);
+        return;
+    }
+
+    head = info;
+    for (limit = head; limit != NULL; limit = limit->next) {
+        monitor_printf(mon, "vcpu[%"PRIi64"], Enable: %s",
+                       limit->value->cpu_index,
+                       limit->value->enable ? "true" : "false");
+        if (limit->value->enable) {
+            monitor_printf(mon, ", limit rate %"PRIi64 " (MB/s),"
+                           " current rate %"PRIi64 " (MB/s)\n",
+                           limit->value->limit_rate,
+                           limit->value->current_rate);
+        } else {
+            monitor_printf(mon, "\n");
+        }
+    }
+}
+
+void dirtylimit_setup(int max_cpus)
+{
+    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+        return;
+    }
+
+    dirtylimit_calc_state_init(max_cpus);
+    dirtylimit_state_init(max_cpus);
+}
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 407a1da..aff28d9 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -863,6 +863,19 @@ SRST
     Display the vcpu dirty rate information.
 ERST
 
+    {
+        .name       = "vcpu_dirty_limit",
+        .args_type  = "cpu_index:l?",
+        .params     = "cpu_index",
+        .help       = "show dirty page limit information",
+        .cmd        = hmp_info_vcpu_dirty_limit,
+    },
+
+SRST
+  ``info vcpu_dirty_limit``
+    Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
     {
         .name       = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 70a9136..1839fa4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1744,3 +1744,19 @@ ERST
                       "\n\t\t\t -b to specify dirty bitmap as method of calculation)",
         .cmd        = hmp_calc_dirty_rate,
     },
+
+SRST
+``vcpu_dirty_limit``
+  Start dirty page rate limit on a virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+    {
+        .name       = "vcpu_dirty_limit",
+        .args_type  = "cpu_index:l,enable:b,dirty_rate:l?",
+        .params     = "cpu_index on|off [dirty_rate]",
+        .help       = "enable, disable dirty page rate limit on a virtual CPU"
+                      "(dirty_rate should be specified dirty_rate if enable)",
+        .cmd        = hmp_vcpu_dirty_limit,
+    },
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e948e81..11df012 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -881,6 +881,15 @@ void end_exclusive(void);
  */
 void qemu_init_vcpu(CPUState *cpu);
 
+/**
+ * dirtylimit_setup:
+ *
+ * Initializes the global state of dirtylimit calculation and
+ * dirtylimit itself. This is prepared for vCPU dirtylimit which
+ * could be triggered during vm lifecycle.
+ */
+void dirtylimit_setup(int max_cpus);
+
 #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
 #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
 #define SSTEP_NOTIMER 0x4  /* Do not Timers while single stepping */
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..04879a2 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,8 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
                                     HumanReadableText *(*qmp_handler)(Error **));
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 3da8fdf..dc15b3f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1872,6 +1872,54 @@
             'current-rate': 'int64' } }
 
 ##
+# @vcpu-dirty-limit:
+#
+# Set or cancel the upper limit of dirty page rate for a virtual CPU.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of virtual CPU.
+#
+# @enable: true to enable, false to disable.
+#
+# @dirty-rate: upper limit of dirty page rate for virtual CPU.
+#
+# Since: 7.0
+#
+# Example:
+#   {"execute": "vcpu-dirty-limit"}
+#    "arguments": { "cpu-index": 0,
+#                   "enable": true,
+#                   "dirty-rate": 200 } }
+#
+##
+{ 'command': 'vcpu-dirty-limit',
+  'data': { 'cpu-index': 'int',
+            'enable': 'bool',
+            'dirty-rate': 'uint64'} }
+
+##
+# @query-vcpu-dirty-limit:
+#
+# Returns information about the virtual CPU dirty limit status.
+#
+# @cpu-index: index of the virtual CPU to query, if not specified, all
+#             virtual CPUs will be queried.
+#
+# Since: 7.0
+#
+# Example:
+#   {"execute": "query-vcpu-dirty-limit"}
+#    "arguments": { "cpu-index": 0 } }
+#
+##
+{ 'command': 'query-vcpu-dirty-limit',
+  'data': { '*cpu-index': 'int' },
+  'returns': [ 'DirtyLimitInfo' ] }
+
+##
 # @snapshot-save:
 #
 # Save a VM snapshot
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 620a1f1..0f83ce3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
     qemu_init_displays();
     accel_setup_post(current_machine);
     os_setup_post();
+    dirtylimit_setup(current_machine->smp.max_cpus);
     resume_mux_open();
 }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-03  1:39 ` [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU huangy81
@ 2021-12-03 12:34   ` Markus Armbruster
  2021-12-04 12:00     ` Hyman Huang
  2021-12-06  8:36   ` Peter Xu
  2021-12-06  8:39   ` Peter Xu
  2 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2021-12-03 12:34 UTC (permalink / raw)
  To: huangy81
  Cc: David Hildenbrand, Juan Quintela, Richard Henderson, qemu-devel,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

huangy81@chinatelecom.cn writes:

> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Implement dirtyrate calculation periodically basing on
> dirty-ring and throttle vCPU until it reachs the quota
> dirty page rate given by user.
>
> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
> to enable, disable, query dirty page limit for virtual CPU.
>
> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
> "info vcpu_dirty_limit" so developers can play with them easier.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

[...]

I see you replaced the interface.  Back to square one...

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 3da8fdf..dc15b3f 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1872,6 +1872,54 @@
>              'current-rate': 'int64' } }
>  
>  ##
> +# @vcpu-dirty-limit:
> +#
> +# Set or cancel the upper limit of dirty page rate for a virtual CPU.
> +#
> +# Requires KVM with accelerator property "dirty-ring-size" set.
> +# A virtual CPU's dirty page rate is a measure of its memory load.
> +# To observe dirty page rates, use @calc-dirty-rate.
> +#
> +# @cpu-index: index of virtual CPU.
> +#
> +# @enable: true to enable, false to disable.
> +#
> +# @dirty-rate: upper limit of dirty page rate for virtual CPU.
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   {"execute": "vcpu-dirty-limit"}
> +#    "arguments": { "cpu-index": 0,
> +#                   "enable": true,
> +#                   "dirty-rate": 200 } }
> +#
> +##
> +{ 'command': 'vcpu-dirty-limit',
> +  'data': { 'cpu-index': 'int',
> +            'enable': 'bool',
> +            'dirty-rate': 'uint64'} }

When @enable is false, @dirty-rate makes no sense and is not used (I
checked the code), but users have to specify it anyway.  That's bad
design.

Better: drop @enable, make @dirty-rate optional, present means enable,
absent means disable.

> +
> +##
> +# @query-vcpu-dirty-limit:
> +#
> +# Returns information about the virtual CPU dirty limit status.
> +#
> +# @cpu-index: index of the virtual CPU to query, if not specified, all
> +#             virtual CPUs will be queried.
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   {"execute": "query-vcpu-dirty-limit"}
> +#    "arguments": { "cpu-index": 0 } }
> +#
> +##
> +{ 'command': 'query-vcpu-dirty-limit',
> +  'data': { '*cpu-index': 'int' },
> +  'returns': [ 'DirtyLimitInfo' ] }

Why would anyone ever want to specify @cpu-index?  Output isn't that
large even if you have a few hundred CPUs.

Let's keep things simple and drop the parameter.

> +
> +##
>  # @snapshot-save:
>  #
>  # Save a VM snapshot
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 620a1f1..0f83ce3 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
>      qemu_init_displays();
>      accel_setup_post(current_machine);
>      os_setup_post();
> +    dirtylimit_setup(current_machine->smp.max_cpus);
>      resume_mux_open();
>  }



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-03 12:34   ` Markus Armbruster
@ 2021-12-04 12:00     ` Hyman Huang
  2021-12-06  8:28       ` Peter Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Hyman Huang @ 2021-12-04 12:00 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: David Hildenbrand, Juan Quintela, Richard Henderson, qemu-devel,
	Peter Xu, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé



在 2021/12/3 20:34, Markus Armbruster 写道:
> huangy81@chinatelecom.cn writes:
> 
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Implement dirtyrate calculation periodically basing on
>> dirty-ring and throttle vCPU until it reachs the quota
>> dirty page rate given by user.
>>
>> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
>> to enable, disable, query dirty page limit for virtual CPU.
>>
>> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
>> "info vcpu_dirty_limit" so developers can play with them easier.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> [...]
> 
> I see you replaced the interface.  Back to square one...
> 
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 3da8fdf..dc15b3f 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1872,6 +1872,54 @@
>>               'current-rate': 'int64' } }
>>   
>>   ##
>> +# @vcpu-dirty-limit:
>> +#
>> +# Set or cancel the upper limit of dirty page rate for a virtual CPU.
>> +#
>> +# Requires KVM with accelerator property "dirty-ring-size" set.
>> +# A virtual CPU's dirty page rate is a measure of its memory load.
>> +# To observe dirty page rates, use @calc-dirty-rate.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @enable: true to enable, false to disable.
>> +#
>> +# @dirty-rate: upper limit of dirty page rate for virtual CPU.
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   {"execute": "vcpu-dirty-limit"}
>> +#    "arguments": { "cpu-index": 0,
>> +#                   "enable": true,
>> +#                   "dirty-rate": 200 } }
>> +#
>> +##
>> +{ 'command': 'vcpu-dirty-limit',
>> +  'data': { 'cpu-index': 'int',
>> +            'enable': 'bool',
>> +            'dirty-rate': 'uint64'} }
> 
> When @enable is false, @dirty-rate makes no sense and is not used (I
> checked the code), but users have to specify it anyway.  That's bad
> design.
> 
> Better: drop @enable, make @dirty-rate optional, present means enable,
> absent means disable.
Uh, if we drop @enable, enabling dirty limit should be like:
vcpu-dirty-limit cpu-index=0 dirty-rate=1000

And disabling dirty limit like:
vcpu-dirty-limit cpu-index=0

For disabling case, there is no hint of disabling in command 
"vcpu-dirty-limit".

How about make @dirty-rate optional, when enable dirty limit, it should
present, ignored otherwise?

> 
>> +
>> +##
>> +# @query-vcpu-dirty-limit:
>> +#
>> +# Returns information about the virtual CPU dirty limit status.
>> +#
>> +# @cpu-index: index of the virtual CPU to query, if not specified, all
>> +#             virtual CPUs will be queried.
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   {"execute": "query-vcpu-dirty-limit"}
>> +#    "arguments": { "cpu-index": 0 } }
>> +#
>> +##
>> +{ 'command': 'query-vcpu-dirty-limit',
>> +  'data': { '*cpu-index': 'int' },
>> +  'returns': [ 'DirtyLimitInfo' ] }
> 
> Why would anyone ever want to specify @cpu-index?  Output isn't that
> large even if you have a few hundred CPUs.
> 
> Let's keep things simple and drop the parameter.
Ok, this make things simple.
> 
>> +
>> +##
>>   # @snapshot-save:
>>   #
>>   # Save a VM snapshot
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 620a1f1..0f83ce3 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
>>       qemu_init_displays();
>>       accel_setup_post(current_machine);
>>       os_setup_post();
>> +    dirtylimit_setup(current_machine->smp.max_cpus);
>>       resume_mux_open();
>>   }
> 

-- 
Best regard

Hyman Huang(黄勇)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-04 12:00     ` Hyman Huang
@ 2021-12-06  8:28       ` Peter Xu
  2021-12-06 14:56         ` Hyman
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-06  8:28 UTC (permalink / raw)
  To: Hyman Huang
  Cc: Juan Quintela, qemu-devel, David Hildenbrand, Richard Henderson,
	Markus Armbruster, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Sat, Dec 04, 2021 at 08:00:19PM +0800, Hyman Huang wrote:
> 
> 
> 在 2021/12/3 20:34, Markus Armbruster 写道:
> > huangy81@chinatelecom.cn writes:
> > 
> > > From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> > > 
> > > Implement dirtyrate calculation periodically basing on
> > > dirty-ring and throttle vCPU until it reachs the quota
> > > dirty page rate given by user.
> > > 
> > > Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
> > > to enable, disable, query dirty page limit for virtual CPU.
> > > 
> > > Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
> > > "info vcpu_dirty_limit" so developers can play with them easier.
> > > 
> > > Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> > 
> > [...]
> > 
> > I see you replaced the interface.  Back to square one...
> > 
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 3da8fdf..dc15b3f 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1872,6 +1872,54 @@
> > >               'current-rate': 'int64' } }
> > >   ##
> > > +# @vcpu-dirty-limit:
> > > +#
> > > +# Set or cancel the upper limit of dirty page rate for a virtual CPU.
> > > +#
> > > +# Requires KVM with accelerator property "dirty-ring-size" set.
> > > +# A virtual CPU's dirty page rate is a measure of its memory load.
> > > +# To observe dirty page rates, use @calc-dirty-rate.
> > > +#
> > > +# @cpu-index: index of virtual CPU.
> > > +#
> > > +# @enable: true to enable, false to disable.
> > > +#
> > > +# @dirty-rate: upper limit of dirty page rate for virtual CPU.
> > > +#
> > > +# Since: 7.0
> > > +#
> > > +# Example:
> > > +#   {"execute": "vcpu-dirty-limit"}
> > > +#    "arguments": { "cpu-index": 0,
> > > +#                   "enable": true,
> > > +#                   "dirty-rate": 200 } }
> > > +#
> > > +##
> > > +{ 'command': 'vcpu-dirty-limit',
> > > +  'data': { 'cpu-index': 'int',
> > > +            'enable': 'bool',
> > > +            'dirty-rate': 'uint64'} }
> > 
> > When @enable is false, @dirty-rate makes no sense and is not used (I
> > checked the code), but users have to specify it anyway.  That's bad
> > design.
> > 
> > Better: drop @enable, make @dirty-rate optional, present means enable,
> > absent means disable.
> Uh, if we drop @enable, enabling dirty limit should be like:
> vcpu-dirty-limit cpu-index=0 dirty-rate=1000
> 
> And disabling dirty limit like:
> vcpu-dirty-limit cpu-index=0
> 
> For disabling case, there is no hint of disabling in command
> "vcpu-dirty-limit".
> 
> How about make @dirty-rate optional, when enable dirty limit, it should
> present, ignored otherwise?

Sounds good, I think we can make both "enable" and "dirty-rate" optional.

To turn it on we either use "enable=true,dirty-rate=XXX" or "dirty-rate=XXX".

To turn it off we use "enable=false".

> 
> > 
> > > +
> > > +##
> > > +# @query-vcpu-dirty-limit:
> > > +#
> > > +# Returns information about the virtual CPU dirty limit status.
> > > +#
> > > +# @cpu-index: index of the virtual CPU to query, if not specified, all
> > > +#             virtual CPUs will be queried.
> > > +#
> > > +# Since: 7.0
> > > +#
> > > +# Example:
> > > +#   {"execute": "query-vcpu-dirty-limit"}
> > > +#    "arguments": { "cpu-index": 0 } }
> > > +#
> > > +##
> > > +{ 'command': 'query-vcpu-dirty-limit',
> > > +  'data': { '*cpu-index': 'int' },
> > > +  'returns': [ 'DirtyLimitInfo' ] }
> > 
> > Why would anyone ever want to specify @cpu-index?  Output isn't that
> > large even if you have a few hundred CPUs.
> > 
> > Let's keep things simple and drop the parameter.
> Ok, this make things simple.

I found that it'll be challenging for any human being to identify "whether
he/she has turned throttle off for all vcpus"..  I think that could be useful
when we finally decided to cancel current migration.

I thought about adding a "global=on/off" flag, but instead can we just return
the vcpu info for the ones that enabled the per-vcpu throttling?  For anyone
who wants to read all vcpu dirty information he/she can use calc-dirty-rate.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-03  1:39 ` [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU huangy81
  2021-12-03 12:34   ` Markus Armbruster
@ 2021-12-06  8:36   ` Peter Xu
  2021-12-06 15:19     ` Hyman
  2021-12-06  8:39   ` Peter Xu
  2 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-06  8:36 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Fri, Dec 03, 2021 at 09:39:47AM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> Implement dirtyrate calculation periodically basing on
> dirty-ring and throttle vCPU until it reachs the quota
> dirty page rate given by user.
> 
> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
> to enable, disable, query dirty page limit for virtual CPU.
> 
> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
> "info vcpu_dirty_limit" so developers can play with them easier.

Thanks.  Even if I start to use qmp-shell more recently but still in some case
where only hmp is specified this could still be handy.

> +void qmp_vcpu_dirty_limit(int64_t cpu_index,
> +                          bool enable,
> +                          uint64_t dirty_rate,
> +                          Error **errp)
> +{
> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> +        error_setg(errp, "dirty page limit feature requires KVM with"
> +                   " accelerator property 'dirty-ring-size' set'");
> +        return;
> +    }
> +
> +    if (!dirtylimit_is_vcpu_index_valid(cpu_index)) {
> +        error_setg(errp, "cpu index out of range");
> +        return;
> +    }
> +
> +    if (enable) {
> +        dirtylimit_calc();
> +        dirtylimit_vcpu(cpu_index, dirty_rate);
> +    } else {
> +        if (!dirtylimit_enabled(cpu_index)) {
> +            error_setg(errp, "dirty page limit for CPU %ld not set",
> +                       cpu_index);
> +            return;
> +        }

We don't need to fail the user for enable=off even if vcpu is not throttled,
imho.

> +
> +        if (!dirtylimit_cancel_vcpu(cpu_index)) {
> +            dirtylimit_calc_quit();
> +        }
> +    }
> +}

[...]

> +struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(bool has_cpu_index,
> +                                                      int64_t cpu_index,
> +                                                      Error **errp)
> +{
> +    DirtyLimitInfo *info = NULL;
> +    DirtyLimitInfoList *head = NULL, **tail = &head;
> +
> +    if (has_cpu_index &&
> +        (!dirtylimit_is_vcpu_index_valid(cpu_index))) {
> +        error_setg(errp, "cpu index out of range");
> +        return NULL;
> +    }
> +
> +    if (has_cpu_index) {
> +        info = dirtylimit_query_vcpu(cpu_index);
> +        QAPI_LIST_APPEND(tail, info);
> +    } else {
> +        CPUState *cpu;
> +        CPU_FOREACH(cpu) {
> +            if (!cpu->unplug) {
> +                info = dirtylimit_query_vcpu(cpu->cpu_index);
> +                QAPI_LIST_APPEND(tail, info);
> +            }

There're special handling for unplug in a few places.  Could you explain why?
E.g. if the vcpu is unplugged then dirty rate is zero, then it seems fine to
even keep it there?

> +        }
> +    }
> +
> +    return head;
> +}

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-03  1:39 ` [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU huangy81
  2021-12-03 12:34   ` Markus Armbruster
  2021-12-06  8:36   ` Peter Xu
@ 2021-12-06  8:39   ` Peter Xu
  2021-12-06 15:22     ` Hyman
  2 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-06  8:39 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Fri, Dec 03, 2021 at 09:39:47AM +0800, huangy81@chinatelecom.cn wrote:
> +void dirtylimit_setup(int max_cpus)
> +{
> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> +        return;
> +    }
> +
> +    dirtylimit_calc_state_init(max_cpus);
> +    dirtylimit_state_init(max_cpus);
> +}

[...]

> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 620a1f1..0f83ce3 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
>      qemu_init_displays();
>      accel_setup_post(current_machine);
>      os_setup_post();
> +    dirtylimit_setup(current_machine->smp.max_cpus);
>      resume_mux_open();

Can we do the init only when someone enables it?  We could also do proper
free() for the structs when it's globally turned off.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 2/3] cpu-throttle: implement vCPU throttle
  2021-12-03  1:39 ` [PATCH v9 2/3] cpu-throttle: implement vCPU throttle huangy81
@ 2021-12-06 10:10   ` Peter Xu
  2021-12-08 15:36     ` Hyman
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-06 10:10 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Fri, Dec 03, 2021 at 09:39:46AM +0800, huangy81@chinatelecom.cn wrote:
> +static uint64_t dirtylimit_pct(unsigned int last_pct,
> +                               uint64_t quota,
> +                               uint64_t current)
> +{
> +    uint64_t limit_pct = 0;
> +    RestrainPolicy policy;
> +    bool mitigate = (quota > current) ? true : false;
> +
> +    if (mitigate && ((current == 0) ||
> +        (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) {
> +        return 0;
> +    }
> +
> +    policy = dirtylimit_policy(last_pct, quota, current);
> +    switch (policy) {
> +    case RESTRAIN_SLIGHT:
> +        /* [90, 99] */
> +        if (mitigate) {
> +            limit_pct =
> +                last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
> +        } else {
> +            limit_pct =
> +                last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
> +
> +            limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX);
> +        }
> +       break;
> +    case RESTRAIN_HEAVY:
> +        /* [75, 90) */
> +        if (mitigate) {
> +            limit_pct =
> +                last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
> +        } else {
> +            limit_pct =
> +                last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
> +
> +            limit_pct = MIN(limit_pct,
> +                DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK);
> +        }
> +       break;
> +    case RESTRAIN_RATIO:
> +        /* [0, 75) */
> +        if (mitigate) {
> +            if (last_pct <= (((quota - current) * 100 / quota))) {
> +                limit_pct = 0;
> +            } else {
> +                limit_pct = last_pct -
> +                    ((quota - current) * 100 / quota);
> +                limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN);
> +            }
> +        } else {
> +            limit_pct = last_pct +
> +                ((current - quota) * 100 / current);
> +
> +            limit_pct = MIN(limit_pct,
> +                DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
> +        }
> +       break;
> +    case RESTRAIN_KEEP:
> +    default:
> +       limit_pct = last_pct;
> +       break;
> +    }
> +
> +    return limit_pct;
> +}
> +
> +static void *dirtylimit_thread(void *opaque)
> +{
> +    int cpu_index = *(int *)opaque;
> +    uint64_t quota_dirtyrate, current_dirtyrate;
> +    unsigned int last_pct = 0;
> +    unsigned int pct = 0;
> +
> +    rcu_register_thread();
> +
> +    quota_dirtyrate = dirtylimit_quota(cpu_index);
> +    current_dirtyrate = dirtylimit_current(cpu_index);
> +
> +    pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate);
> +
> +    do {
> +        trace_dirtylimit_impose(cpu_index,
> +            quota_dirtyrate, current_dirtyrate, pct);
> +
> +        last_pct = pct;
> +        if (pct == 0) {
> +            sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S);
> +        } else {
> +            dirtylimit_check(cpu_index, pct);
> +        }
> +
> +        quota_dirtyrate = dirtylimit_quota(cpu_index);
> +        current_dirtyrate = dirtylimit_current(cpu_index);
> +
> +        pct = dirtylimit_pct(last_pct, quota_dirtyrate, current_dirtyrate);

So what I had in mind is we can start with an extremely simple version of
negative feedback system.  Say, firstly each vcpu will have a simple number to
sleep for some interval (this is ugly code, but just show what I meant..):

===============
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eecd8031cf..c320fd190f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2932,6 +2932,8 @@ int kvm_cpu_exec(CPUState *cpu)
             trace_kvm_dirty_ring_full(cpu->cpu_index);
             qemu_mutex_lock_iothread();
             kvm_dirty_ring_reap(kvm_state);
+            if (dirtylimit_enabled(cpu->cpu_index) && cpu->throttle_us_per_full)
+                usleep(cpu->throttle_us_per_full);
             qemu_mutex_unlock_iothread();
             ret = 0;
             break;
===============

I think this will have finer granularity when throttle (for 4096 ring size,
that's per-16MB operation) than current way where we inject per-vcpu async task
to sleep, like auto-converge.

Then we have the "black box" to tune this value with below input/output:

  - Input: dirty rate information, same as current algo

  - Output: increase/decrease of per-vcpu throttle_us_per_full above, and
    that's all

We can do the sampling per-second, then we keep doing it: we can have 1 thread
doing per-second task collecting dirty rate information for all the vcpus, then
tune that throttle_us_per_full for each of them.

The simplest linear algorithm would be as simple as (for each vcpu):

  if (quota < current)
    throttle_us_per_full += SOMETHING;
    if (throttle_us_per_full > MAX)
      throttle_us_per_full = MAX;
  else
    throttle_us_per_full -= SOMETHING;
    if (throttle_us_per_full < 0)
      throttle_us_per_full = 0;

I think your algorithm is fine, but thoroughly review every single bit of it in
one shot will be challenging, and it's also hard to prove every bit of the
algorithm is helpful, as there're a lot of hand-made macros and state changes.

I actually tested the current algorithm of yours, the dirty rate fluctuates a
bit (when I specified 200MB/s, it can go into either a few tens of MB/s or
300MB/s, normally less), neither does it respond fast (the initial throtle from
500MB/s -> 200MB/s should need 1 minute or something), so it seems not ideal
anyway. In that case I prefer we start with simple.

So IMHO we can start with this simple scheme first then it'll start working
with much less line of codes, afaict.  With that scheme ready in the 1st or
initial patches, it'll be easier to either apply any better algorithm
(e.g. your current one, if you're confident with that) or other things then
it'll be much easier to review too if you could consider split your patch like
that.

Normally per my knowledge for the need on migration, we could consider add an
integral algorithm into this linear algorithm that I said above, and it should
help us reach a very stable and constant state of throttling already.  But
we'll need to try it out, as I never tried.

What do you think?

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-03  1:39 ` [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
@ 2021-12-06 10:18   ` Peter Xu
  2021-12-06 15:36     ` Hyman
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-06 10:18 UTC (permalink / raw)
  To: huangy81
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Fri, Dec 03, 2021 at 09:39:45AM +0800, huangy81@chinatelecom.cn wrote:
> +static void dirtylimit_calc_func(void)
> +{
> +    CPUState *cpu;
> +    DirtyPageRecord *dirty_pages;
> +    int64_t start_time, end_time, calc_time;
> +    DirtyRateVcpu rate;
> +    int i = 0;
> +
> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
> +        dirtylimit_calc_state->data.nvcpu);
> +
> +    CPU_FOREACH(cpu) {
> +        record_dirtypages(dirty_pages, cpu, true);
> +    }
> +
> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    calc_time = end_time - start_time;
> +
> +    qemu_mutex_lock_iothread();
> +    memory_global_dirty_log_sync();
> +    qemu_mutex_unlock_iothread();
> +
> +    CPU_FOREACH(cpu) {
> +        record_dirtypages(dirty_pages, cpu, false);
> +    }
> +
> +    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
> +        uint64_t increased_dirty_pages =
> +            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
> +        uint64_t memory_size_MB =
> +            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
> +        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
> +
> +        rate.id = i;
> +        rate.dirty_rate  = dirtyrate;
> +        dirtylimit_calc_state->data.rates[i] = rate;
> +
> +        trace_dirtyrate_do_calculate_vcpu(i,
> +            dirtylimit_calc_state->data.rates[i].dirty_rate);
> +    }
> +}

This looks so like the calc-dirty-rate code already.

I think adding a new resion (GLOBAL_DIRTY_LIMIT) is fine, however still, any
chance to merge the code?

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-06  8:28       ` Peter Xu
@ 2021-12-06 14:56         ` Hyman
  2021-12-07  2:24           ` Peter Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Hyman @ 2021-12-06 14:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, qemu-devel, David Hildenbrand, Richard Henderson,
	Markus Armbruster, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé



在 2021/12/6 16:28, Peter Xu 写道:
> On Sat, Dec 04, 2021 at 08:00:19PM +0800, Hyman Huang wrote:
>>
>>
>> 在 2021/12/3 20:34, Markus Armbruster 写道:
>>> huangy81@chinatelecom.cn writes:
>>>
>>>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>>>
>>>> Implement dirtyrate calculation periodically basing on
>>>> dirty-ring and throttle vCPU until it reachs the quota
>>>> dirty page rate given by user.
>>>>
>>>> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
>>>> to enable, disable, query dirty page limit for virtual CPU.
>>>>
>>>> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
>>>> "info vcpu_dirty_limit" so developers can play with them easier.
>>>>
>>>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>>
>>> [...]
>>>
>>> I see you replaced the interface.  Back to square one...
>>>
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 3da8fdf..dc15b3f 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -1872,6 +1872,54 @@
>>>>                'current-rate': 'int64' } }
>>>>    ##
>>>> +# @vcpu-dirty-limit:
>>>> +#
>>>> +# Set or cancel the upper limit of dirty page rate for a virtual CPU.
>>>> +#
>>>> +# Requires KVM with accelerator property "dirty-ring-size" set.
>>>> +# A virtual CPU's dirty page rate is a measure of its memory load.
>>>> +# To observe dirty page rates, use @calc-dirty-rate.
>>>> +#
>>>> +# @cpu-index: index of virtual CPU.
>>>> +#
>>>> +# @enable: true to enable, false to disable.
>>>> +#
>>>> +# @dirty-rate: upper limit of dirty page rate for virtual CPU.
>>>> +#
>>>> +# Since: 7.0
>>>> +#
>>>> +# Example:
>>>> +#   {"execute": "vcpu-dirty-limit"}
>>>> +#    "arguments": { "cpu-index": 0,
>>>> +#                   "enable": true,
>>>> +#                   "dirty-rate": 200 } }
>>>> +#
>>>> +##
>>>> +{ 'command': 'vcpu-dirty-limit',
>>>> +  'data': { 'cpu-index': 'int',
>>>> +            'enable': 'bool',
>>>> +            'dirty-rate': 'uint64'} }
>>>
>>> When @enable is false, @dirty-rate makes no sense and is not used (I
>>> checked the code), but users have to specify it anyway.  That's bad
>>> design.
>>>
>>> Better: drop @enable, make @dirty-rate optional, present means enable,
>>> absent means disable.
>> Uh, if we drop @enable, enabling dirty limit should be like:
>> vcpu-dirty-limit cpu-index=0 dirty-rate=1000
>>
>> And disabling dirty limit like:
>> vcpu-dirty-limit cpu-index=0
>>
>> For disabling case, there is no hint of disabling in command
>> "vcpu-dirty-limit".
>>
>> How about make @dirty-rate optional, when enable dirty limit, it should
>> present, ignored otherwise?
> 
> Sounds good, I think we can make both "enable" and "dirty-rate" optional.
> 
> To turn it on we either use "enable=true,dirty-rate=XXX" or "dirty-rate=XXX" >
> To turn it off we use "enable=false".
Indeed, this make things more convenient.
>  >>
>>>
>>>> +
>>>> +##
>>>> +# @query-vcpu-dirty-limit:
>>>> +#
>>>> +# Returns information about the virtual CPU dirty limit status.
>>>> +#
>>>> +# @cpu-index: index of the virtual CPU to query, if not specified, all
>>>> +#             virtual CPUs will be queried.
>>>> +#
>>>> +# Since: 7.0
>>>> +#
>>>> +# Example:
>>>> +#   {"execute": "query-vcpu-dirty-limit"}
>>>> +#    "arguments": { "cpu-index": 0 } }
>>>> +#
>>>> +##
>>>> +{ 'command': 'query-vcpu-dirty-limit',
>>>> +  'data': { '*cpu-index': 'int' },
>>>> +  'returns': [ 'DirtyLimitInfo' ] }
>>>
>>> Why would anyone ever want to specify @cpu-index?  Output isn't that
>>> large even if you have a few hundred CPUs.
>>>
>>> Let's keep things simple and drop the parameter.
>> Ok, this make things simple.
> 
> I found that it'll be challenging for any human being to identify "whether
> he/she has turned throttle off for all vcpus"..  I think that could be useful
> when we finally decided to cancel current migration.
That's question, how about adding an optional argument "global" and 
making "cpu-index", "enable", "dirty-rate" all optional in 
"vcpu-dirty-limit", keeping the "cpu-index" and "global" options 
mutually exclusive?
{ 'command': 'vcpu-dirty-limit',
   'data': { '*cpu-index': 'int',
             '*global': 'bool'
             '*enable': 'bool',
             '*dirty-rate': 'uint64'} }
In the case of enabling all vcpu throttle:
Either use "global=true,enable=true,dirty-rate=XXX" or 
"global=true,dirty-rate=XXX"

In the case of disabling all vcpu throttle:
use "global=true,enable=false,dirty-rate=XXX"

In other case, we pass the same option just like what we did for 
specified vcpu throttle before.
> 
> I thought about adding a "global=on/off" flag, but instead can we just return
> the vcpu info for the ones that enabled the per-vcpu throttling?  For anyone
> who wants to read all vcpu dirty information he/she can use calc-dirty-rate.
> 
Ok, I'll pick up this advice next version.
> Thanks,
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-06  8:36   ` Peter Xu
@ 2021-12-06 15:19     ` Hyman
  2021-12-07  2:57       ` Peter Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Hyman @ 2021-12-06 15:19 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/6 16:36, Peter Xu 写道:
> On Fri, Dec 03, 2021 at 09:39:47AM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Implement dirtyrate calculation periodically basing on
>> dirty-ring and throttle vCPU until it reachs the quota
>> dirty page rate given by user.
>>
>> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
>> to enable, disable, query dirty page limit for virtual CPU.
>>
>> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
>> "info vcpu_dirty_limit" so developers can play with them easier.
> 
> Thanks.  Even if I start to use qmp-shell more recently but still in some case
> where only hmp is specified this could still be handy.
> 
>> +void qmp_vcpu_dirty_limit(int64_t cpu_index,
>> +                          bool enable,
>> +                          uint64_t dirty_rate,
>> +                          Error **errp)
>> +{
>> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>> +        error_setg(errp, "dirty page limit feature requires KVM with"
>> +                   " accelerator property 'dirty-ring-size' set'");
>> +        return;
>> +    }
>> +
>> +    if (!dirtylimit_is_vcpu_index_valid(cpu_index)) {
>> +        error_setg(errp, "cpu index out of range");
>> +        return;
>> +    }
>> +
>> +    if (enable) {
>> +        dirtylimit_calc();
>> +        dirtylimit_vcpu(cpu_index, dirty_rate);
>> +    } else {
>> +        if (!dirtylimit_enabled(cpu_index)) {
>> +            error_setg(errp, "dirty page limit for CPU %ld not set",
>> +                       cpu_index);
>> +            return;
>> +        }
> 
> We don't need to fail the user for enable=off even if vcpu is not throttled,
> imho.
Ok.
> 
>> +
>> +        if (!dirtylimit_cancel_vcpu(cpu_index)) {
>> +            dirtylimit_calc_quit();
>> +        }
>> +    }
>> +}
> 
> [...]
> 
>> +struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(bool has_cpu_index,
>> +                                                      int64_t cpu_index,
>> +                                                      Error **errp)
>> +{
>> +    DirtyLimitInfo *info = NULL;
>> +    DirtyLimitInfoList *head = NULL, **tail = &head;
>> +
>> +    if (has_cpu_index &&
>> +        (!dirtylimit_is_vcpu_index_valid(cpu_index))) {
>> +        error_setg(errp, "cpu index out of range");
>> +        return NULL;
>> +    }
>> +
>> +    if (has_cpu_index) {
>> +        info = dirtylimit_query_vcpu(cpu_index);
>> +        QAPI_LIST_APPEND(tail, info);
>> +    } else {
>> +        CPUState *cpu;
>> +        CPU_FOREACH(cpu) {
>> +            if (!cpu->unplug) {
>> +                info = dirtylimit_query_vcpu(cpu->cpu_index);
>> +                QAPI_LIST_APPEND(tail, info);
>> +            }
> 
> There're special handling for unplug in a few places.  Could you explain why?
> E.g. if the vcpu is unplugged then dirty rate is zero, then it seems fine to
> even keep it there?
> The dirty limit logic only allow plugged vcpu to be enabled throttle, so 
that the "dirtylimit-{cpu-index}" thread don't need to be forked and we 
can save the overhead. So in query logic we just filter the unplugged vcpu.

Another reason is that i thought it could make user confused when we 
return the unplugged vcpu dirtylimit info. Uh, in most time of vm 
lifecycle, hotplugging vcpu may never happen.
>> +        }
>> +    }
>> +
>> +    return head;
>> +}
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-06  8:39   ` Peter Xu
@ 2021-12-06 15:22     ` Hyman
  0 siblings, 0 replies; 21+ messages in thread
From: Hyman @ 2021-12-06 15:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/6 16:39, Peter Xu 写道:
> On Fri, Dec 03, 2021 at 09:39:47AM +0800, huangy81@chinatelecom.cn wrote:
>> +void dirtylimit_setup(int max_cpus)
>> +{
>> +    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>> +        return;
>> +    }
>> +
>> +    dirtylimit_calc_state_init(max_cpus);
>> +    dirtylimit_state_init(max_cpus);
>> +}
> 
> [...]
> 
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 620a1f1..0f83ce3 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp)
>>       qemu_init_displays();
>>       accel_setup_post(current_machine);
>>       os_setup_post();
>> +    dirtylimit_setup(current_machine->smp.max_cpus);
>>       resume_mux_open();
> 
> Can we do the init only when someone enables it?  We could also do proper
> free() for the structs when it's globally turned off.
Yes, i'll try this next version
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-06 10:18   ` Peter Xu
@ 2021-12-06 15:36     ` Hyman
  0 siblings, 0 replies; 21+ messages in thread
From: Hyman @ 2021-12-06 15:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/6 18:18, Peter Xu 写道:
> On Fri, Dec 03, 2021 at 09:39:45AM +0800, huangy81@chinatelecom.cn wrote:
>> +static void dirtylimit_calc_func(void)
>> +{
>> +    CPUState *cpu;
>> +    DirtyPageRecord *dirty_pages;
>> +    int64_t start_time, end_time, calc_time;
>> +    DirtyRateVcpu rate;
>> +    int i = 0;
>> +
>> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
>> +        dirtylimit_calc_state->data.nvcpu);
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, true);
>> +    }
>> +
>> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
>> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    calc_time = end_time - start_time;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_sync();
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, false);
>> +    }
>> +
>> +    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
>> +        uint64_t increased_dirty_pages =
>> +            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
>> +        uint64_t memory_size_MB =
>> +            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
>> +        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
>> +
>> +        rate.id = i;
>> +        rate.dirty_rate  = dirtyrate;
>> +        dirtylimit_calc_state->data.rates[i] = rate;
>> +
>> +        trace_dirtyrate_do_calculate_vcpu(i,
>> +            dirtylimit_calc_state->data.rates[i].dirty_rate);
>> +    }
>> +}
> 
> This looks so like the calc-dirty-rate code already.
> 
> I think adding a new resion (GLOBAL_DIRTY_LIMIT) is fine, however still, any
Ok.
> chance to merge the code?
I'm not sure about merging but i'll try it. :)
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-06 14:56         ` Hyman
@ 2021-12-07  2:24           ` Peter Xu
  2021-12-07  4:32             ` Hyman
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-07  2:24 UTC (permalink / raw)
  To: Hyman
  Cc: Juan Quintela, qemu-devel, David Hildenbrand, Richard Henderson,
	Markus Armbruster, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Mon, Dec 06, 2021 at 10:56:00PM +0800, Hyman wrote:
> > I found that it'll be challenging for any human being to identify "whether
> > he/she has turned throttle off for all vcpus"..  I think that could be useful
> > when we finally decided to cancel current migration.
> That's question, how about adding an optional argument "global" and making
> "cpu-index", "enable", "dirty-rate" all optional in "vcpu-dirty-limit",
> keeping the "cpu-index" and "global" options mutually exclusive?
> { 'command': 'vcpu-dirty-limit',
>   'data': { '*cpu-index': 'int',
>             '*global': 'bool'
>             '*enable': 'bool',
>             '*dirty-rate': 'uint64'} }
> In the case of enabling all vcpu throttle:
> Either use "global=true,enable=true,dirty-rate=XXX" or
> "global=true,dirty-rate=XXX"
> 
> In the case of disabling all vcpu throttle:
> use "global=true,enable=false,dirty-rate=XXX"
> 
> In other case, we pass the same option just like what we did for specified
> vcpu throttle before.

Could we merge "cpu-index" and "global" somehow?  They're mutual exclusive.

For example, merge them into one "vcpu" parameter, "vcpu=all" means global,
"vcpu=1" means vcpu 1.  But then we'll need to make it a string.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-06 15:19     ` Hyman
@ 2021-12-07  2:57       ` Peter Xu
  2021-12-07  4:38         ` Hyman
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Xu @ 2021-12-07  2:57 UTC (permalink / raw)
  To: Hyman
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Mon, Dec 06, 2021 at 11:19:21PM +0800, Hyman wrote:
> > > +    if (has_cpu_index) {
> > > +        info = dirtylimit_query_vcpu(cpu_index);
> > > +        QAPI_LIST_APPEND(tail, info);
> > > +    } else {
> > > +        CPUState *cpu;
> > > +        CPU_FOREACH(cpu) {
> > > +            if (!cpu->unplug) {
> > > +                info = dirtylimit_query_vcpu(cpu->cpu_index);
> > > +                QAPI_LIST_APPEND(tail, info);
> > > +            }
> > 
> > There're special handling for unplug in a few places.  Could you explain why?
> > E.g. if the vcpu is unplugged then dirty rate is zero, then it seems fine to
> > even keep it there?
> > The dirty limit logic only allow plugged vcpu to be enabled throttle, so
> that the "dirtylimit-{cpu-index}" thread don't need to be forked and we can
> save the overhead. So in query logic we just filter the unplugged vcpu.

I've commented similarly in the other thread - please consider not using NVCPU
threads only for vcpu throttling, irrelevant of vcpu hot plug/unplug.

Per-vcpu throttle is totally not a cpu intensive workload, 1 thread should be
enough globally, imho.

A guest with hundreds of vcpus are becoming more common, we shouldn't waste OS
thread resources just for this.

> 
> Another reason is that i thought it could make user confused when we return
> the unplugged vcpu dirtylimit info. Uh, in most time of vm lifecycle,
> hotplugging vcpu may never happen.

I just think if plug/unplug does not affect the throttle logic then we should
treat them the same, it avoids unnecessary special care on those vcpus too.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-07  2:24           ` Peter Xu
@ 2021-12-07  4:32             ` Hyman
  0 siblings, 0 replies; 21+ messages in thread
From: Hyman @ 2021-12-07  4:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, qemu-devel, David Hildenbrand, Richard Henderson,
	Markus Armbruster, Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé



在 2021/12/7 10:24, Peter Xu 写道:
> On Mon, Dec 06, 2021 at 10:56:00PM +0800, Hyman wrote:
>>> I found that it'll be challenging for any human being to identify "whether
>>> he/she has turned throttle off for all vcpus"..  I think that could be useful
>>> when we finally decided to cancel current migration.
>> That's question, how about adding an optional argument "global" and making
>> "cpu-index", "enable", "dirty-rate" all optional in "vcpu-dirty-limit",
>> keeping the "cpu-index" and "global" options mutually exclusive?
>> { 'command': 'vcpu-dirty-limit',
>>    'data': { '*cpu-index': 'int',
>>              '*global': 'bool'
>>              '*enable': 'bool',
>>              '*dirty-rate': 'uint64'} }
>> In the case of enabling all vcpu throttle:
>> Either use "global=true,enable=true,dirty-rate=XXX" or
>> "global=true,dirty-rate=XXX"
>>
>> In the case of disabling all vcpu throttle:
>> use "global=true,enable=false,dirty-rate=XXX"
>>
>> In other case, we pass the same option just like what we did for specified
>> vcpu throttle before.
> 
> Could we merge "cpu-index" and "global" somehow?  They're mutual exclusive. >
> For example, merge them into one "vcpu" parameter, "vcpu=all" means global,
> "vcpu=1" means vcpu 1.  But then we'll need to make it a string.
>Ok, sound good


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU
  2021-12-07  2:57       ` Peter Xu
@ 2021-12-07  4:38         ` Hyman
  0 siblings, 0 replies; 21+ messages in thread
From: Hyman @ 2021-12-07  4:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/7 10:57, Peter Xu 写道:
> On Mon, Dec 06, 2021 at 11:19:21PM +0800, Hyman wrote:
>>>> +    if (has_cpu_index) {
>>>> +        info = dirtylimit_query_vcpu(cpu_index);
>>>> +        QAPI_LIST_APPEND(tail, info);
>>>> +    } else {
>>>> +        CPUState *cpu;
>>>> +        CPU_FOREACH(cpu) {
>>>> +            if (!cpu->unplug) {
>>>> +                info = dirtylimit_query_vcpu(cpu->cpu_index);
>>>> +                QAPI_LIST_APPEND(tail, info);
>>>> +            }
>>>
>>> There're special handling for unplug in a few places.  Could you explain why?
>>> E.g. if the vcpu is unplugged then dirty rate is zero, then it seems fine to
>>> even keep it there?
>>> The dirty limit logic only allow plugged vcpu to be enabled throttle, so
>> that the "dirtylimit-{cpu-index}" thread don't need to be forked and we can
>> save the overhead. So in query logic we just filter the unplugged vcpu.
> 
> I've commented similarly in the other thread - please consider not using NVCPU
> threads only for vcpu throttling, irrelevant of vcpu hot plug/unplug.
> 
> Per-vcpu throttle is totally not a cpu intensive workload, 1 thread should be
> enough globally, imho.
> 
> A guest with hundreds of vcpus are becoming more common, we shouldn't waste OS
> thread resources just for this.
> 
Ok, i'll try this out next version
>>
>> Another reason is that i thought it could make user confused when we return
>> the unplugged vcpu dirtylimit info. Uh, in most time of vm lifecycle,
>> hotplugging vcpu may never happen.
> 
> I just think if plug/unplug does not affect the throttle logic then we should
> treat them the same, it avoids unnecessary special care on those vcpus too.
> 
Indeed, i'm struggling too :), i'll remove the plug/unplug logic the 
next version.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 2/3] cpu-throttle: implement vCPU throttle
  2021-12-06 10:10   ` Peter Xu
@ 2021-12-08 15:36     ` Hyman
  2021-12-08 15:50       ` Hyman
  0 siblings, 1 reply; 21+ messages in thread
From: Hyman @ 2021-12-08 15:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/6 18:10, Peter Xu 写道:
> On Fri, Dec 03, 2021 at 09:39:46AM +0800, huangy81@chinatelecom.cn wrote:
>> +static uint64_t dirtylimit_pct(unsigned int last_pct,
>> +                               uint64_t quota,
>> +                               uint64_t current)
>> +{
>> +    uint64_t limit_pct = 0;
>> +    RestrainPolicy policy;
>> +    bool mitigate = (quota > current) ? true : false;
>> +
>> +    if (mitigate && ((current == 0) ||
>> +        (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) {
>> +        return 0;
>> +    }
>> +
>> +    policy = dirtylimit_policy(last_pct, quota, current);
>> +    switch (policy) {
>> +    case RESTRAIN_SLIGHT:
>> +        /* [90, 99] */
>> +        if (mitigate) {
>> +            limit_pct =
>> +                last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
>> +        } else {
>> +            limit_pct =
>> +                last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
>> +
>> +            limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX);
>> +        }
>> +       break;
>> +    case RESTRAIN_HEAVY:
>> +        /* [75, 90) */
>> +        if (mitigate) {
>> +            limit_pct =
>> +                last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
>> +        } else {
>> +            limit_pct =
>> +                last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
>> +
>> +            limit_pct = MIN(limit_pct,
>> +                DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK);
>> +        }
>> +       break;
>> +    case RESTRAIN_RATIO:
>> +        /* [0, 75) */
>> +        if (mitigate) {
>> +            if (last_pct <= (((quota - current) * 100 / quota))) {
>> +                limit_pct = 0;
>> +            } else {
>> +                limit_pct = last_pct -
>> +                    ((quota - current) * 100 / quota);
>> +                limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN);
>> +            }
>> +        } else {
>> +            limit_pct = last_pct +
>> +                ((current - quota) * 100 / current);
>> +
>> +            limit_pct = MIN(limit_pct,
>> +                DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
>> +        }
>> +       break;
>> +    case RESTRAIN_KEEP:
>> +    default:
>> +       limit_pct = last_pct;
>> +       break;
>> +    }
>> +
>> +    return limit_pct;
>> +}
>> +
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> +    int cpu_index = *(int *)opaque;
>> +    uint64_t quota_dirtyrate, current_dirtyrate;
>> +    unsigned int last_pct = 0;
>> +    unsigned int pct = 0;
>> +
>> +    rcu_register_thread();
>> +
>> +    quota_dirtyrate = dirtylimit_quota(cpu_index);
>> +    current_dirtyrate = dirtylimit_current(cpu_index);
>> +
>> +    pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate);
>> +
>> +    do {
>> +        trace_dirtylimit_impose(cpu_index,
>> +            quota_dirtyrate, current_dirtyrate, pct);
>> +
>> +        last_pct = pct;
>> +        if (pct == 0) {
>> +            sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S);
>> +        } else {
>> +            dirtylimit_check(cpu_index, pct);
>> +        }
>> +
>> +        quota_dirtyrate = dirtylimit_quota(cpu_index);
>> +        current_dirtyrate = dirtylimit_current(cpu_index);
>> +
>> +        pct = dirtylimit_pct(last_pct, quota_dirtyrate, current_dirtyrate);
> 
> So what I had in mind is we can start with an extremely simple version of
> negative feedback system.  Say, firstly each vcpu will have a simple number to
> sleep for some interval (this is ugly code, but just show what I meant..):
> 
> ===============
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index eecd8031cf..c320fd190f 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2932,6 +2932,8 @@ int kvm_cpu_exec(CPUState *cpu)
>               trace_kvm_dirty_ring_full(cpu->cpu_index);
>               qemu_mutex_lock_iothread();
>               kvm_dirty_ring_reap(kvm_state);
> +            if (dirtylimit_enabled(cpu->cpu_index) && cpu->throttle_us_per_full)
> +                usleep(cpu->throttle_us_per_full);
>               qemu_mutex_unlock_iothread();
>               ret = 0;
>               break;
> ===============
> 
> I think this will have finer granularity when throttle (for 4096 ring size,
> that's per-16MB operation) than current way where we inject per-vcpu async task
> to sleep, like auto-converge.
> 
> Then we have the "black box" to tune this value with below input/output:
> 
>    - Input: dirty rate information, same as current algo
> 
>    - Output: increase/decrease of per-vcpu throttle_us_per_full above, and
>      that's all
> 
> We can do the sampling per-second, then we keep doing it: we can have 1 thread
> doing per-second task collecting dirty rate information for all the vcpus, then
> tune that throttle_us_per_full for each of them.
> 
> The simplest linear algorithm would be as simple as (for each vcpu):
> 
>    if (quota < current)
>      throttle_us_per_full += SOMETHING;
>      if (throttle_us_per_full > MAX)
>        throttle_us_per_full = MAX;
>    else
>      throttle_us_per_full -= SOMETHING;
>      if (throttle_us_per_full < 0)
>        throttle_us_per_full = 0;
> 
> I think your algorithm is fine, but thoroughly review every single bit of it in
> one shot will be challenging, and it's also hard to prove every bit of the
> algorithm is helpful, as there're a lot of hand-made macros and state changes.
> 
> I actually tested the current algorithm of yours, the dirty rate fluctuates a
> bit (when I specified 200MB/s, it can go into either a few tens of MB/s or
> 300MB/s, normally less), neither does it respond fast (the initial throtle from
> 500MB/s -> 200MB/s should need 1 minute or something), so it seems not ideal
> anyway. In that case I prefer we start with simple.
> 
> So IMHO we can start with this simple scheme first then it'll start working
> with much less line of codes, afaict.  With that scheme ready in the 1st or
> initial patches, it'll be easier to either apply any better algorithm
> (e.g. your current one, if you're confident with that) or other things then
> it'll be much easier to review too if you could consider split your patch like
> that.
> 
> Normally per my knowledge for the need on migration, we could consider add an
> integral algorithm into this linear algorithm that I said above, and it should
> help us reach a very stable and constant state of throttling already.  But
> we'll need to try it out, as I never tried.
> 
> What do you think?
> 
I absolutely agree with your point, negative feedback system is also 
what i thought in the first place, and theoretically may be the most 
appropriate algo to control the vcpu in a stable dirty page rate from my 
point of view, but at the very beginning i'm not sure the new algo of 
throttling can be accepted, so i adopted the exiting auto-converge algo 
in qemu... :). One of my purposes of posting this patchset is for the 
sake of RFC, and thanks Peter very much for giving the advice.

I'll try it out and see the results. If things go well, the negative 
feedback system to control the dirty page rate for a vcpu will be 
introduced next version.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v9 2/3] cpu-throttle: implement vCPU throttle
  2021-12-08 15:36     ` Hyman
@ 2021-12-08 15:50       ` Hyman
  0 siblings, 0 replies; 21+ messages in thread
From: Hyman @ 2021-12-08 15:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Markus ArmBruster, David Hildenbrand,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/8 23:36, Hyman 写道:
> 
> 
> 在 2021/12/6 18:10, Peter Xu 写道:
>> On Fri, Dec 03, 2021 at 09:39:46AM +0800, huangy81@chinatelecom.cn wrote:
>>> +static uint64_t dirtylimit_pct(unsigned int last_pct,
>>> +                               uint64_t quota,
>>> +                               uint64_t current)
>>> +{
>>> +    uint64_t limit_pct = 0;
>>> +    RestrainPolicy policy;
>>> +    bool mitigate = (quota > current) ? true : false;
>>> +
>>> +    if (mitigate && ((current == 0) ||
>>> +        (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    policy = dirtylimit_policy(last_pct, quota, current);
>>> +    switch (policy) {
>>> +    case RESTRAIN_SLIGHT:
>>> +        /* [90, 99] */
>>> +        if (mitigate) {
>>> +            limit_pct =
>>> +                last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
>>> +        } else {
>>> +            limit_pct =
>>> +                last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE;
>>> +
>>> +            limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX);
>>> +        }
>>> +       break;
>>> +    case RESTRAIN_HEAVY:
>>> +        /* [75, 90) */
>>> +        if (mitigate) {
>>> +            limit_pct =
>>> +                last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
>>> +        } else {
>>> +            limit_pct =
>>> +                last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE;
>>> +
>>> +            limit_pct = MIN(limit_pct,
>>> +                DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK);
>>> +        }
>>> +       break;
>>> +    case RESTRAIN_RATIO:
>>> +        /* [0, 75) */
>>> +        if (mitigate) {
>>> +            if (last_pct <= (((quota - current) * 100 / quota))) {
>>> +                limit_pct = 0;
>>> +            } else {
>>> +                limit_pct = last_pct -
>>> +                    ((quota - current) * 100 / quota);
>>> +                limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN);
>>> +            }
>>> +        } else {
>>> +            limit_pct = last_pct +
>>> +                ((current - quota) * 100 / current);
>>> +
>>> +            limit_pct = MIN(limit_pct,
>>> +                DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK);
>>> +        }
>>> +       break;
>>> +    case RESTRAIN_KEEP:
>>> +    default:
>>> +       limit_pct = last_pct;
>>> +       break;
>>> +    }
>>> +
>>> +    return limit_pct;
>>> +}
>>> +
>>> +static void *dirtylimit_thread(void *opaque)
>>> +{
>>> +    int cpu_index = *(int *)opaque;
>>> +    uint64_t quota_dirtyrate, current_dirtyrate;
>>> +    unsigned int last_pct = 0;
>>> +    unsigned int pct = 0;
>>> +
>>> +    rcu_register_thread();
>>> +
>>> +    quota_dirtyrate = dirtylimit_quota(cpu_index);
>>> +    current_dirtyrate = dirtylimit_current(cpu_index);
>>> +
>>> +    pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate);
>>> +
>>> +    do {
>>> +        trace_dirtylimit_impose(cpu_index,
>>> +            quota_dirtyrate, current_dirtyrate, pct);
>>> +
>>> +        last_pct = pct;
>>> +        if (pct == 0) {
>>> +            sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S);
>>> +        } else {
>>> +            dirtylimit_check(cpu_index, pct);
>>> +        }
>>> +
>>> +        quota_dirtyrate = dirtylimit_quota(cpu_index);
>>> +        current_dirtyrate = dirtylimit_current(cpu_index);
>>> +
>>> +        pct = dirtylimit_pct(last_pct, quota_dirtyrate, 
>>> current_dirtyrate);
>>
>> So what I had in mind is we can start with an extremely simple version of
>> negative feedback system.  Say, firstly each vcpu will have a simple 
>> number to
>> sleep for some interval (this is ugly code, but just show what I 
>> meant..):
>>
>> ===============
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index eecd8031cf..c320fd190f 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -2932,6 +2932,8 @@ int kvm_cpu_exec(CPUState *cpu)
>>               trace_kvm_dirty_ring_full(cpu->cpu_index);
>>               qemu_mutex_lock_iothread();
>>               kvm_dirty_ring_reap(kvm_state);
>> +            if (dirtylimit_enabled(cpu->cpu_index) && 
>> cpu->throttle_us_per_full)
>> +                usleep(cpu->throttle_us_per_full);
>>               qemu_mutex_unlock_iothread();
>>               ret = 0;
>>               break;
>> ===============
>>
>> I think this will have finer granularity when throttle (for 4096 ring 
>> size,
>> that's per-16MB operation) than current way where we inject per-vcpu 
>> async task
>> to sleep, like auto-converge.
>>
>> Then we have the "black box" to tune this value with below input/output:
>>
>>    - Input: dirty rate information, same as current algo
>>
>>    - Output: increase/decrease of per-vcpu throttle_us_per_full above, 
>> and
>>      that's all
>>
>> We can do the sampling per-second, then we keep doing it: we can have 
>> 1 thread
>> doing per-second task collecting dirty rate information for all the 
>> vcpus, then
>> tune that throttle_us_per_full for each of them.
>>
>> The simplest linear algorithm would be as simple as (for each vcpu):
>>
>>    if (quota < current)
>>      throttle_us_per_full += SOMETHING;
>>      if (throttle_us_per_full > MAX)
>>        throttle_us_per_full = MAX;
>>    else
>>      throttle_us_per_full -= SOMETHING;
>>      if (throttle_us_per_full < 0)
>>        throttle_us_per_full = 0;
>>
>> I think your algorithm is fine, but thoroughly review every single bit 
>> of it in
>> one shot will be challenging, and it's also hard to prove every bit of 
>> the
>> algorithm is helpful, as there're a lot of hand-made macros and state 
>> changes.
>>
>> I actually tested the current algorithm of yours, the dirty rate 
>> fluctuates a
>> bit (when I specified 200MB/s, it can go into either a few tens of 
>> MB/s or
>> 300MB/s, normally less), neither does it respond fast (the initial 
>> throtle from
>> 500MB/s -> 200MB/s should need 1 minute or something), so it seems not 
>> ideal
>> anyway. In that case I prefer we start with simple.
>>
>> So IMHO we can start with this simple scheme first then it'll start 
>> working
>> with much less line of codes, afaict.  With that scheme ready in the 
>> 1st or
>> initial patches, it'll be easier to either apply any better algorithm
>> (e.g. your current one, if you're confident with that) or other things 
>> then
>> it'll be much easier to review too if you could consider split your 
>> patch like
>> that.
>>
>> Normally per my knowledge for the need on migration, we could consider 
>> add an
>> integral algorithm into this linear algorithm that I said above, and 
>> it should
>> help us reach a very stable and constant state of throttling already.  
>> But
>> we'll need to try it out, as I never tried.
>>
>> What do you think?
>>
> I absolutely agree with your point, negative feedback system is also 
> what i thought in the first place, and theoretically may be the most 
> appropriate algo to control the vcpu in a stable dirty page rate from my 
> point of view, but at the very beginning i'm not sure the new algo of 
> throttling can be accepted, so i adopted the exiting auto-converge algo 
> in qemu... :). One of my purposes of posting this patchset is for the 
> sake of RFC, and thanks Peter very much for giving the advice.
> 
> I'll try it out and see the results. If things go well, the negative 
> feedback system to control the dirty page rate for a vcpu will be 
> introduced next version.

uh... "method" may be a better word to express what i mean instead of 
"algo" in my reply above, and the real "algo" implemented in "black box".


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-12-08 15:54 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-03  1:39 [PATCH v9 0/3] support dirty restraint on vCPU huangy81
2021-12-03  1:39 ` [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
2021-12-06 10:18   ` Peter Xu
2021-12-06 15:36     ` Hyman
2021-12-03  1:39 ` [PATCH v9 2/3] cpu-throttle: implement vCPU throttle huangy81
2021-12-06 10:10   ` Peter Xu
2021-12-08 15:36     ` Hyman
2021-12-08 15:50       ` Hyman
2021-12-03  1:39 ` [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU huangy81
2021-12-03 12:34   ` Markus Armbruster
2021-12-04 12:00     ` Hyman Huang
2021-12-06  8:28       ` Peter Xu
2021-12-06 14:56         ` Hyman
2021-12-07  2:24           ` Peter Xu
2021-12-07  4:32             ` Hyman
2021-12-06  8:36   ` Peter Xu
2021-12-06 15:19     ` Hyman
2021-12-07  2:57       ` Peter Xu
2021-12-07  4:38         ` Hyman
2021-12-06  8:39   ` Peter Xu
2021-12-06 15:22     ` Hyman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.