* [PATCH v10 0/3] support dirty restraint on vCPU
@ 2021-12-14 11:07 huangy81
  2021-12-14 11:07 ` [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: huangy81 @ 2021-12-14 11:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
	Richard Henderson, Markus Armbruster, Peter Xu,
	Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

v10:
- rebase on master
- make the following modifications on patch [1/3]:
  1. Make the "dirtylimit-calc" thread joinable and join it after
     quitting.

  2. Add a finalize function to free dirtylimit_calc_state.

  3. Do some code cleanup.

- make the following modifications on patch [2/3]:
  1. Remove the original implementation of the throttle according to
     Peter's advice.

  2. Introduce a negative feedback system and implement the throttle
     for all vCPUs in one thread named "dirtylimit".

  3. Simplify the algorithm when calculating throttle_us_per_full:
     increase/decrease it linearly when there is a wide gap between
     the quota and the current dirty page rate, and by a fixed time
     slice when the gap is narrow. This makes the throttle respond
     faster and reach the quota smoothly.

  4. Introduce an unfit_cnt in the algorithm to make sure the
     throttle really takes effect.

  5. Cap the maximum sleep time at 99 times "ring_full_time_us".

  6. Make the "dirtylimit" thread joinable and join it after quitting.
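As a rough illustration of the rule in point 3 above, here is a minimal, self-contained sketch; the helper name and all constants are invented for illustration and are not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative constants, not the values used by the patch. */
#define TOLERANCE_MB_S   25    /* acceptable |current - quota| deviation */
#define WIDE_GAP_MB_S   100    /* beyond this, adjust linearly */
#define FIXED_STEP_US    50    /* small fixed slice near the quota */

/*
 * Tune the per-ring-full sleep time toward the quota:
 * inside the tolerance range, keep it; far from the quota, scale it
 * proportionally; close to the quota, nudge it by a fixed slice.
 */
static int64_t tune_sleep_us(int64_t sleep_us, int64_t quota_mbps,
                             int64_t current_mbps)
{
    int64_t gap = current_mbps - quota_mbps;

    if (llabs(gap) <= TOLERANCE_MB_S) {
        return sleep_us;                       /* converged, keep as is */
    }
    if (llabs(gap) > WIDE_GAP_MB_S) {
        return sleep_us * current_mbps / quota_mbps;  /* linear scaling */
    }
    return gap > 0 ? sleep_us + FIXED_STEP_US  /* too fast: sleep longer */
                   : sleep_us - FIXED_STEP_US; /* too slow: sleep less */
}
```

Calling such a function once per calculation period drives the sleep time up when the vCPU dirties faster than the quota and back down otherwise, which is the negative feedback described above.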
                                                                                                                                                                                                                                             
- make the following modifications on patch [3/3]:
  1. Remove the unplugged-CPU handling logic.

  2. Make "query-vcpu-dirty-limit" return dirty limit information
     only for vCPUs that have the dirty limit enabled.

  3. Remove the "dirtylimit_setup" function.

  4. Trigger the dirty limit and initialize the global state only
     when someone enables it, and finalize the state after the last
     dirty limit is canceled.

  5. Redefine the qmp commands vcpu-dirty-limit/query-vcpu-dirty-limit:
     a single command "vcpu-dirty-limit" both enables and disables the
     limit; it affects only the specified vCPU when "cpu-index" is
     given, otherwise all vCPUs are affected.

  6. Redefine the hmp command vcpu_dirty_limit/info vcpu_dirty_limit
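For illustration, a hypothetical QMP exchange for the redefined commands in point 5. The request argument names are a guess, since this cover letter does not spell them out; the reply fields follow the DirtyLimitInfo struct added in patch [2/3]:

```json
{ "execute": "vcpu-dirty-limit",
  "arguments": { "cpu-index": 0, "enable": true, "dirty-rate": 500 } }

{ "execute": "query-vcpu-dirty-limit" }
{ "return": [ { "cpu-index": 0, "limit-rate": 500, "current-rate": 415 } ] }
```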

- other points about the code review:
  1. "merge the code of calculating dirty page rate"
     I think it may not be suitable to touch 'calc-dirty-rate',
     because 'calc-dirty-rate' stops syncing the dirty log once the
     dirty rate has been calculated, while 'dirtylimit-calc' keeps it
     enabled until the last dirty limit is canceled. If we merged
     GLOBAL_DIRTY_LIMIT into GLOBAL_DIRTY_DIRTY_RATE, the two would
     interfere with each other.

  2. The new throttle algorithm suggested by Peter responds faster
     and consumes less CPU than the old one; that is impressive
     progress.

     One point worth discussing: the new throttle logic is "passive".
     A vCPU sleeps only after its dirty ring is full, unlike
     "auto-converge", which kicks the vCPU on a fixed time slice. If
     the vCPU is memory-write intensive and the ring size is large,
     it keeps producing dirty memory while the ring fills up and the
     throttle works less well; in other words, the throttle depends
     on the dirty ring size.

     I tested the new algorithm in two cases:

     case 1: dirty-ring-size: 4096, dirty rate: 1170MB/s
     result: minimum quota dirty rate is 25MB/s or even less,
             minimum vCPU utilization is 6%

     case 2: dirty-ring-size: 65536, dirty rate: 1170MB/s
     result: minimum quota dirty rate is 256MB/s,
             minimum vCPU utilization is 24%

     I post this just for discussion. I think it is not a big deal,
     because if we set dirty-ring-size to the maximum value (65536),
     we assume the server's bandwidth is capable of handling it.

  3. I hard-coded the acceptable deviation to 25MB/s, see the macro
     DIRTYLIMIT_TOLERANCE_RANGE. I am still deciding whether to make
     it configurable.
   
  4. Another point is unplugged-CPU handling: the current algorithm
     affects unplugged vCPUs. If we set a dirty limit on one, we fork
     the two threads "dirtylimit" and "dirtylimit-calc" but they do
     nothing; once the vCPU is hot-plugged, the dirty limit takes
     effect. I think this logic is fine, but there may be different
     opinions.
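The ring-size dependence in point 2 above follows from back-of-the-envelope arithmetic: between two sleeps the vCPU dirties up to one ring's worth of pages, so the per-wakeup dirtying granularity grows with the ring. A sketch, assuming 4KiB pages (the helper below is illustrative, not patch code):

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096  /* assumption: 4KiB target pages */

/* MB dirtied by one KVM_EXIT_DIRTY_RING_FULL cycle. */
static uint64_t mb_per_ring_full(uint64_t ring_entries)
{
    return (ring_entries * ASSUMED_PAGE_SIZE) >> 20;
}
```

If the maximum sleep still allows roughly one ring-drain per second, the reachable floor is about mb_per_ring_full() MB/s: ~16 for a 4096-entry ring and ~256 for 65536 entries, which lines up with the 25MB/s and 256MB/s measurements in point 2.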

- to let developers play with it more easily, here is an hmp usage example:
  (qemu) vcpu_dirty_limit -g on -1 500
  [Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
  
  (qemu) info vcpu_dirty_limit 
  vcpu[0], limit rate 500 (MB/s), current rate 415 (MB/s)
  vcpu[1], limit rate 500 (MB/s), current rate 496 (MB/s)
  vcpu[2], limit rate 500 (MB/s), current rate 0 (MB/s)
  vcpu[3], limit rate 500 (MB/s), current rate 0 (MB/s)
  (qemu) vcpu_dirty_limit -g off
  [Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
  
  (qemu) info vcpu_dirty_limit 
  Dirty page limit not enabled!
  
  (qemu) vcpu_dirty_limit on 0 300
  [Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
  
  (qemu) vcpu_dirty_limit on 1 500
  [Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
  
  (qemu) info vcpu_dirty_limit 
  vcpu[0], limit rate 300 (MB/s), current rate 342 (MB/s)
  vcpu[1], limit rate 500 (MB/s), current rate 485 (MB/s)
  
  (qemu) vcpu_dirty_limit off 0
  [Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
  
  (qemu) info vcpu_dirty_limit 
  vcpu[1], limit rate 500 (MB/s), current rate 528 (MB/s)
  
  (qemu) vcpu_dirty_limit off 1
  [Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
  
  (qemu) info vcpu_dirty_limit 
  Dirty page limit not enabled!

Thanks very much for the instructive algorithm suggestions from Peter
and for the comments and code review from Markus.

Please review, thanks!

v9:
- rebase on master
- fix the meson directory change, keep it untouched.

v8:
- rebase on master
- polish the error message and remove the "unlikely" compilation syntax
  according to the advice given by Markus.
- keep the dirty tracking enabled during "dirtylimit-calc" lifecycle
  so that the overhead can be reduced according to the advice given by
  Peter. 
- merge the "set/cancel" qmp commands into one named "vcpu-dirty-limit"
  and introduce qmp command "query-vcpu-dirty-limit" to query dirty
  limit information about virtual CPU, according to the advice given by
  Peter.
- check that the vcpu index is valid and handle the unplugged case
  before enabling/disabling the dirty limit for a virtual CPU.
- introduce hmp commands so developers can play with them more
  easily: use "vcpu_dirty_limit" to enable the dirty limit and
  "info vcpu_dirty_limit" to query it.

The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.

Please review, thanks!

v7:
- rebase on master
- polish the comments and error messages according to the
  advice given by Markus
- introduce dirtylimit_enabled function to pre-check if dirty
  page limit is enabled before canceling.

v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate can reach the
  quota more quickly.
- fix percentage update in throttle iteration.

v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change dirty rate to dirty page rate

Thanks for the careful reviews made by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove the DirtyRateQuotaVcpu and use its field as option directly
  2. add comments to show details of what dirtylimit setup do
  3. explain how to use dirtylimit in combination with existing qmp
     commands "calc-dirty-rate" and "query-dirty-rate" in documentation.

Thanks for the careful reviews made by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle functions of dirtylimit_calc,
     including dirtylimit_calc and dirtylimit_calc_quit
  3. introduce a 'quit' field in dirtylimit_calc_state to implement
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return;
- other modifications have been made after code review
  1. introduce 'bmap' and 'nr' field in dirtylimit_state to record the
     number of running thread forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit thread
     are stopped
  3. do some renaming works
     dirty rate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks!

Thanks for the accurate and timely advice given by Juan. We would
really appreciate any corrections and suggestions about this
patchset.

Best regards!

Hyman

v1:
This patchset introduces a mechanism to impose a dirty restraint on
vCPUs, aiming to keep each vCPU running within a certain dirty rate
given by the user. A per-vCPU dirty restraint may be an alternative
way to implement the convergence logic for live migration, which in
theory could improve guest memory performance during migration
compared with the traditional method.

For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects.
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle increases the percentage step by step, which seems
  to struggle to find the optimal throttle percentage when the
  dirty rate is high
- it is hard to predict the remaining migration time when the
  throttle percentage reaches 99%

To a certain extent, the dirty restraint mechanism can fix these
effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the vCPU
dirty rate periodically via the dirty ring mechanism, as commit
0e21bf246 "implement dirty-ring dirtyrate calculation" does. For a
vCPU that has a dirty restraint imposed, we throttle it periodically
as auto-converge does; after each throttle period we compare the
quota dirty rate with the current dirty rate, and if the current
rate is not under the quota, we increase the throttle percentage
until it is.

This patchset is the basis for implementing a new auto-converge
method for live migration. We introduce two qmp commands to
impose/cancel the dirty restraint on a specified vCPU, so it can
also serve as an independent API for upper layers such as libvirt,
which could use it to implement the convergence logic during live
migration, supplemented by the qmp 'calc-dirty-rate' command or
whatever.

We post this patchset as an RFC; any corrections and suggestions
about the implementation, API, throttling algorithm or anything
else are very much appreciated!

Please review, thanks!

Best regards!

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement virtual CPU throttle
  cpus-common: implement dirty page limit on virtual CPU

 accel/kvm/kvm-all.c           |  11 ++
 cpus-common.c                 | 155 ++++++++++++++++++
 hmp-commands-info.hx          |  13 ++
 hmp-commands.hx               |  17 ++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   6 +
 include/monitor/hmp.h         |   2 +
 include/sysemu/cpu-throttle.h |  77 +++++++++
 include/sysemu/dirtylimit.h   |  51 ++++++
 include/sysemu/kvm.h          |   2 +
 migration/dirtyrate.c         | 160 ++++++++++++++++--
 migration/dirtyrate.h         |   2 +
 qapi/migration.json           |  63 +++++++
 softmmu/cpu-throttle.c        | 371 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   6 +
 15 files changed, 930 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

-- 
1.8.3.1




* [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-14 11:07 [PATCH v10 0/3] support dirty restraint on vCPU huangy81
@ 2021-12-14 11:07 ` huangy81
  2021-12-23 11:12   ` Peter Xu
  2021-12-14 11:07 ` [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle huangy81
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: huangy81 @ 2021-12-14 11:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
	Richard Henderson, Markus Armbruster, Peter Xu,
	Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Introduce GLOBAL_DIRTY_LIMIT, a third consumer of dirty tracking,
to calculate the dirty rate periodically for the dirty restraint.

Implement a thread that calculates the dirty rate periodically; it
will be used for the dirty page limit.

Add dirtylimit.h to introduce the utility functions for the dirty
limit implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 include/exec/memory.h       |   5 +-
 include/sysemu/dirtylimit.h |  51 ++++++++++++++
 migration/dirtyrate.c       | 160 +++++++++++++++++++++++++++++++++++++++++---
 migration/dirtyrate.h       |   2 +
 4 files changed, 207 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27..606bec8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT      (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 0000000..34e48f8
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,51 @@
+/*
+ * dirty limit helper functions
+ *
+ * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS         1000    /* 1000ms */
+
+/**
+ * dirtylimit_calc_current
+ *
+ * get current dirty page rate for specified virtual CPU.
+ */
+int64_t dirtylimit_calc_current(int cpu_index);
+
+/**
+ * dirtylimit_calc_start
+ *
+ * start dirty page rate calculation thread.
+ */
+void dirtylimit_calc_start(void);
+
+/**
+ * dirtylimit_calc_quit
+ *
+ * quit dirty page rate calculation thread.
+ */
+void dirtylimit_calc_quit(void);
+
+/**
+ * dirtylimit_calc_state_init
+ *
+ * initialize dirty page rate calculation state.
+ */
+void dirtylimit_calc_state_init(int max_cpus);
+
+/**
+ * dirtylimit_calc_state_finalize
+ *
+ * finalize dirty page rate calculation state.
+ */
+void dirtylimit_calc_state_finalize(void);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..e8d4e4a 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -27,6 +27,7 @@
 #include "qapi/qmp/qdict.h"
 #include "sysemu/kvm.h"
 #include "sysemu/runstate.h"
+#include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 
 /*
@@ -46,6 +47,155 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
                 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
+struct {
+    DirtyRatesData data;
+    bool quit;
+    QemuThread thread;
+} *dirtylimit_calc_state;
+
+static void dirtylimit_global_dirty_log_start(void)
+{
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
+    qemu_mutex_unlock_iothread();
+}
+
+static void dirtylimit_global_dirty_log_stop(void)
+{
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
+    qemu_mutex_unlock_iothread();
+}
+
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+                                     CPUState *cpu, bool start)
+{
+    if (start) {
+        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+    } else {
+        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+    }
+}
+
+static void dirtylimit_calc_func(void)
+{
+    CPUState *cpu;
+    DirtyPageRecord *dirty_pages;
+    int64_t start_time, end_time, calc_time;
+    DirtyRateVcpu rate;
+    int i = 0;
+
+    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
+        dirtylimit_calc_state->data.nvcpu);
+
+    CPU_FOREACH(cpu) {
+        record_dirtypages(dirty_pages, cpu, true);
+    }
+
+    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
+    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    calc_time = end_time - start_time;
+
+    qemu_mutex_lock_iothread();
+    memory_global_dirty_log_sync();
+    qemu_mutex_unlock_iothread();
+
+    CPU_FOREACH(cpu) {
+        record_dirtypages(dirty_pages, cpu, false);
+    }
+
+    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
+        uint64_t increased_dirty_pages =
+            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
+        uint64_t memory_size_MB =
+            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
+
+        rate.id = i;
+        rate.dirty_rate  = dirtyrate;
+        dirtylimit_calc_state->data.rates[i] = rate;
+
+        trace_dirtyrate_do_calculate_vcpu(i,
+            dirtylimit_calc_state->data.rates[i].dirty_rate);
+    }
+
+    g_free(dirty_pages);
+}
+
+static void *dirtylimit_calc_thread(void *opaque)
+{
+    rcu_register_thread();
+
+    dirtylimit_global_dirty_log_start();
+
+    while (!qatomic_read(&dirtylimit_calc_state->quit)) {
+        dirtylimit_calc_func();
+    }
+
+    dirtylimit_global_dirty_log_stop();
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+int64_t dirtylimit_calc_current(int cpu_index)
+{
+    DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates;
+
+    return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void dirtylimit_calc_start(void)
+{
+    if (!qatomic_read(&dirtylimit_calc_state->quit)) {
+        goto end;
+    }
+
+    qatomic_set(&dirtylimit_calc_state->quit, 0);
+    qemu_thread_create(&dirtylimit_calc_state->thread,
+                       "dirtylimit-calc",
+                       dirtylimit_calc_thread,
+                       NULL,
+                       QEMU_THREAD_JOINABLE);
+end:
+    return;
+}
+
+void dirtylimit_calc_quit(void)
+{
+    qatomic_set(&dirtylimit_calc_state->quit, 1);
+
+    if (qemu_mutex_iothread_locked()) {
+        qemu_mutex_unlock_iothread();
+        qemu_thread_join(&dirtylimit_calc_state->thread);
+        qemu_mutex_lock_iothread();
+    } else {
+        qemu_thread_join(&dirtylimit_calc_state->thread);
+    }
+}
+
+void dirtylimit_calc_state_init(int max_cpus)
+{
+    dirtylimit_calc_state =
+        g_malloc0(sizeof(*dirtylimit_calc_state));
+
+    dirtylimit_calc_state->data.nvcpu = max_cpus;
+    dirtylimit_calc_state->data.rates =
+        g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+    dirtylimit_calc_state->quit = true;
+}
+
+void dirtylimit_calc_state_finalize(void)
+{
+    g_free(dirtylimit_calc_state->data.rates);
+    dirtylimit_calc_state->data.rates = NULL;
+
+    g_free(dirtylimit_calc_state);
+    dirtylimit_calc_state = NULL;
+}
+
 static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 {
     int64_t current_time;
@@ -396,16 +546,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
     return true;
 }
 
-static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
-                                     CPUState *cpu, bool start)
-{
-    if (start) {
-        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
-    } else {
-        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
-    }
-}
-
 static void dirtyrate_global_dirty_log_start(void)
 {
     qemu_mutex_lock_iothread();
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 69d4c5b..e96acdc 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -70,6 +70,8 @@ typedef struct VcpuStat {
     DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
 } VcpuStat;
 
+typedef struct VcpuStat DirtyRatesData;
+
 /*
  * Store calculation statistics for each measure.
  */
-- 
1.8.3.1




* [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2021-12-14 11:07 [PATCH v10 0/3] support dirty restraint on vCPU huangy81
  2021-12-14 11:07 ` [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
@ 2021-12-14 11:07 ` huangy81
  2021-12-15  7:15   ` Markus Armbruster
  2021-12-24  5:12   ` Peter Xu
  2021-12-14 11:07 ` [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU huangy81
  2021-12-24  5:17 ` [PATCH v10 0/3] support dirty restraint on vCPU Peter Xu
  3 siblings, 2 replies; 30+ messages in thread
From: huangy81 @ 2021-12-14 11:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
	Richard Henderson, Markus Armbruster, Peter Xu,
	Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Set up a negative feedback system for the case where a vCPU thread
handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing the
throttle_us_per_full field in struct CPUState. Sleep
throttle_us_per_full microseconds to throttle the vCPU if dirtylimit
is enabled.

Start a thread to track current dirty page rates and tune
throttle_us_per_full dynamically until the current dirty page rate
reaches the quota.

Introduce the utility functions in the header for the dirtylimit
implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 accel/kvm/kvm-all.c           |  11 ++
 include/hw/core/cpu.h         |   6 +
 include/sysemu/cpu-throttle.h |  77 +++++++++
 include/sysemu/kvm.h          |   2 +
 qapi/migration.json           |  19 +++
 softmmu/cpu-throttle.c        | 371 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   6 +
 7 files changed, 492 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eecd803..cba5fed 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/cpu-throttle.h"
 
 #include "hw/boards.h"
 
@@ -2303,6 +2304,11 @@ bool kvm_dirty_ring_enabled(void)
     return kvm_state->kvm_dirty_ring_size ? true : false;
 }
 
+uint32_t kvm_dirty_ring_size(void)
+{
+    return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2933,6 +2939,11 @@ int kvm_cpu_exec(CPUState *cpu)
             qemu_mutex_lock_iothread();
             kvm_dirty_ring_reap(kvm_state);
             qemu_mutex_unlock_iothread();
+            if (dirtylimit_in_service() &&
+                dirtylimit_is_enabled(cpu->cpu_index) &&
+                cpu->throttle_us_per_full) {
+                usleep(cpu->throttle_us_per_full);
+            }
             ret = 0;
             break;
         case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e948e81..be80fe2 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -411,6 +411,12 @@ struct CPUState {
      */
     bool throttle_thread_scheduled;
 
+    /*
+     * Sleep throttle_us_per_full microseconds once the dirty ring
+     * is full and the dirty page limit is enabled.
+     */
+    int64_t throttle_us_per_full;
+
     bool ignore_memory_transaction_failures;
 
     struct hax_vcpu_state *hax_vcpu;
diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
index d65bdef..d4973a5 100644
--- a/include/sysemu/cpu-throttle.h
+++ b/include/sysemu/cpu-throttle.h
@@ -65,4 +65,81 @@ bool cpu_throttle_active(void);
  */
 int cpu_throttle_get_percentage(void);
 
+/**
+ * dirtylimit_is_enabled
+ *
+ * Returns: %true if dirty page rate limit on specified virtual CPU is enabled,
+ *          %false otherwise.
+ */
+bool dirtylimit_is_enabled(int cpu_index);
+
+/**
+ * dirtylimit_in_service
+ *
+ * Returns: %true if dirty page rate limit thread is running, %false otherwise.
+ */
+bool dirtylimit_in_service(void);
+
+/**
+ * dirtylimit_stop
+ *
+ * stop dirty page rate limit thread.
+ */
+void dirtylimit_stop(void);
+
+/**
+ * dirtylimit_is_vcpu_index_valid
+ *
+ * Returns: %true if cpu index valid, %false otherwise.
+ */
+bool dirtylimit_is_vcpu_index_valid(int cpu_index);
+
+/**
+ * dirtylimit_state_init
+ *
+ * initialize global state for dirty page rate limit.
+ */
+void dirtylimit_state_init(int max_cpus);
+
+/**
+ * dirtylimit_state_finalize
+ *
+ * finalize global state for dirty page rate limit.
+ */
+void dirtylimit_state_finalize(void);
+
+/**
+ * dirtylimit_vcpu
+ *
+ * setup dirty page rate limit on specified virtual CPU with quota.
+ */
+void dirtylimit_vcpu(int cpu_index, uint64_t quota);
+
+/**
+ * dirtylimit_all
+ *
+ * setup dirty page rate limit on all virtual CPU with quota.
+ */
+void dirtylimit_all(uint64_t quota);
+
+/**
+ * dirtylimit_query_all
+ *
+ * Returns: dirty page limit information of all virtual CPU enabled.
+ */
+struct DirtyLimitInfoList *dirtylimit_query_all(void);
+
+/**
+ * dirtylimit_cancel_vcpu
+ *
+ * cancel dirtylimit for the specified virtual CPU.
+ */
+void dirtylimit_cancel_vcpu(int cpu_index);
+
+/**
+ * dirtylimit_cancel_all
+ *
+ * cancel dirtylimit for all virtual CPU enabled.
+ */
+void dirtylimit_cancel_all(void);
 #endif /* SYSEMU_CPU_THROTTLE_H */
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 7b22aeb..e5a9a28 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -548,4 +548,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48c..ac5fa56 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1850,6 +1850,25 @@
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
 ##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate for a virtual CPU.
+#
+# @current-rate: current dirty page rate for a virtual CPU.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+            'limit-rate': 'int64',
+            'current-rate': 'int64' } }
+
+##
 # @snapshot-save:
 #
 # Save a VM snapshot
diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c
index 8c2144a..4472ab3 100644
--- a/softmmu/cpu-throttle.c
+++ b/softmmu/cpu-throttle.c
@@ -29,6 +29,10 @@
 #include "qemu/main-loop.h"
 #include "sysemu/cpus.h"
 #include "sysemu/cpu-throttle.h"
+#include "sysemu/dirtylimit.h"
+#include "sysemu/kvm.h"
+#include "qapi/qapi-commands-migration.h"
+#include "trace.h"
 
 /* vcpu throttling controls */
 static QEMUTimer *throttle_timer;
@@ -38,6 +42,373 @@ static unsigned int throttle_percentage;
 #define CPU_THROTTLE_PCT_MAX 99
 #define CPU_THROTTLE_TIMESLICE_NS 10000000
 
+#define DIRTYLIMIT_TOLERANCE_RANGE  25      /* 25MB/s */
+#define DIRTYLIMIT_THROTTLE_PCT_WATERMARK   50
+
+typedef struct DirtyLimitState {
+    int cpu_index;
+    bool enabled;
+    uint64_t quota;     /* quota dirtyrate MB/s */
+    int unfit_cnt;
+} DirtyLimitState;
+
+struct {
+    DirtyLimitState *states;
+    int max_cpus;
+    unsigned long *bmap; /* running thread bitmap */
+    unsigned long nr;
+    QemuThread thread;
+} *dirtylimit_state;
+
+static bool dirtylimit_quit = true;
+
+bool dirtylimit_is_enabled(int cpu_index)
+{
+    return qatomic_read(&dirtylimit_state->states[cpu_index].enabled);
+}
+
+static inline void dirtylimit_enable(int cpu_index)
+{
+    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 1);
+}
+
+static inline void dirtylimit_disable(int cpu_index)
+{
+    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 0);
+}
+
+bool dirtylimit_in_service(void)
+{
+    return !qatomic_read(&dirtylimit_quit);
+}
+
+void dirtylimit_stop(void)
+{
+    qatomic_set(&dirtylimit_quit, 1);
+    if (qemu_mutex_iothread_locked()) {
+        qemu_mutex_unlock_iothread();
+        qemu_thread_join(&dirtylimit_state->thread);
+        qemu_mutex_lock_iothread();
+    } else {
+        qemu_thread_join(&dirtylimit_state->thread);
+    }
+}
+
+static void dirtylimit_start(void)
+{
+    qatomic_set(&dirtylimit_quit, 0);
+}
+
+bool dirtylimit_is_vcpu_index_valid(int cpu_index)
+{
+    return !(cpu_index < 0 ||
+             cpu_index >= dirtylimit_state->max_cpus);
+}
+
+static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota)
+{
+    dirtylimit_state->states[cpu_index].quota = quota;
+}
+
+static inline uint64_t dirtylimit_quota(int cpu_index)
+{
+    return dirtylimit_state->states[cpu_index].quota;
+}
+
+static int64_t dirtylimit_current(int cpu_index)
+{
+    return dirtylimit_calc_current(cpu_index);
+}
+
+static inline int dirtylimit_unfit_cnt(int cpu_index)
+{
+    return dirtylimit_state->states[cpu_index].unfit_cnt;
+}
+
+static inline int dirtylimit_unfit_cnt_inc(int cpu_index)
+{
+    return ++dirtylimit_state->states[cpu_index].unfit_cnt;
+}
+
+static inline void dirtylimit_set_unfit_cnt(int cpu_index, int count)
+{
+    dirtylimit_state->states[cpu_index].unfit_cnt = count;
+}
+
+static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
+{
+    static uint64_t max_dirtyrate;
+    uint32_t dirty_ring_size = kvm_dirty_ring_size();
+    uint64_t dirty_ring_size_memory_MB =
+        dirty_ring_size * TARGET_PAGE_SIZE >> 20;
+
+    if (max_dirtyrate < dirtyrate) {
+        max_dirtyrate = dirtyrate;
+    }
+
+    return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
+}
+
+static inline bool dirtylimit_hit(uint64_t quota,
+                                  uint64_t current)
+{
+    uint64_t min, max;
+
+    min = MIN(quota, current);
+    max = MAX(quota, current);
+
+    return (max - min) <= DIRTYLIMIT_TOLERANCE_RANGE;
+}
+
+static inline bool dirtylimit_turbo(uint64_t quota,
+                                    uint64_t current)
+{
+    uint64_t min, max, pct;
+
+    min = MIN(quota, current);
+    max = MAX(quota, current);
+
+    pct = (max - min) * 100 / max;
+
+    return pct > DIRTYLIMIT_THROTTLE_PCT_WATERMARK;
+}
+
+static void dirtylimit_throttle_init(CPUState *cpu,
+                                     uint64_t quota,
+                                     uint64_t current)
+{
+    uint64_t pct = 0;
+    int64_t throttle_us;
+
+    if (quota >= current || (current == 0)) {
+        cpu->throttle_us_per_full = 0;
+    } else {
+        pct = (current - quota) * 100 / current;
+        pct = MIN(pct, DIRTYLIMIT_THROTTLE_PCT_WATERMARK);
+
+        throttle_us = dirtylimit_dirty_ring_full_time(current) /
+                      (1 - (double)pct / 100);
+        cpu->throttle_us_per_full = throttle_us;
+    }
+}
+
+static void dirtylimit_throttle(CPUState *cpu)
+{
+    int64_t ring_full_time_us = 0;
+    uint64_t quota = 0;
+    uint64_t current = 0;
+    uint64_t sleep_pct = 0;
+    uint64_t throttle_us = 0;
+
+    quota = dirtylimit_quota(cpu->cpu_index);
+    current = dirtylimit_current(cpu->cpu_index);
+
+    if (current == 0 &&
+        dirtylimit_unfit_cnt(cpu->cpu_index) == 0) {
+        cpu->throttle_us_per_full = 0;
+        goto end;
+    } else if (cpu->throttle_us_per_full == 0) {
+        dirtylimit_throttle_init(cpu, quota, current);
+        goto end;
+    } else if (dirtylimit_hit(quota, current)) {
+        goto end;
+    } else if (dirtylimit_unfit_cnt_inc(cpu->cpu_index) < 2) {
+        goto end;
+    }
+
+    dirtylimit_set_unfit_cnt(cpu->cpu_index, 0);
+
+    ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
+    if (dirtylimit_turbo(quota, current)) {
+        if (quota < current) {
+            sleep_pct = (current - quota) * 100 / current;
+            throttle_us =
+                ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
+            cpu->throttle_us_per_full += throttle_us;
+        } else {
+            sleep_pct = (quota - current) * 100 / quota;
+            throttle_us =
+                ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
+            cpu->throttle_us_per_full -= throttle_us;
+        }
+
+        trace_dirtylimit_throttle_pct(cpu->cpu_index,
+                                      sleep_pct,
+                                      throttle_us);
+    } else {
+        if (quota < current) {
+            cpu->throttle_us_per_full += ring_full_time_us / 10;
+        } else {
+            cpu->throttle_us_per_full -= ring_full_time_us / 10;
+        }
+    }
+
+    cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
+        ring_full_time_us * CPU_THROTTLE_PCT_MAX);
+
+    cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
+
+end:
+    trace_dirtylimit_throttle(cpu->cpu_index,
+                              quota, current,
+                              cpu->throttle_us_per_full);
+    return;
+}
+
+static void *dirtylimit_thread(void *opaque)
+{
+    CPUState *cpu;
+
+    rcu_register_thread();
+
+    while (dirtylimit_in_service()) {
+        sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
+        CPU_FOREACH(cpu) {
+            if (!dirtylimit_is_enabled(cpu->cpu_index)) {
+                continue;
+            }
+            dirtylimit_throttle(cpu);
+        }
+    }
+
+    rcu_unregister_thread();
+
+    return NULL;
+}
+
+static struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
+{
+    DirtyLimitInfo *info = NULL;
+
+    info = g_malloc0(sizeof(*info));
+    info->cpu_index = cpu_index;
+    info->limit_rate = dirtylimit_quota(cpu_index);
+    info->current_rate = dirtylimit_current(cpu_index);
+
+    return info;
+}
+
+struct DirtyLimitInfoList *dirtylimit_query_all(void)
+{
+    int i, index;
+    DirtyLimitInfo *info = NULL;
+    DirtyLimitInfoList *head = NULL, **tail = &head;
+
+    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
+        index = dirtylimit_state->states[i].cpu_index;
+        if (dirtylimit_is_enabled(index)) {
+            info = dirtylimit_query_vcpu(index);
+            QAPI_LIST_APPEND(tail, info);
+        }
+    }
+
+    return head;
+}
+
+static int dirtylimit_nvcpus(void)
+{
+    int i;
+    int nr = 0;
+    for (i = 0; i < dirtylimit_state->nr; i++) {
+        unsigned long temp = dirtylimit_state->bmap[i];
+        nr += ctpopl(temp);
+    }
+
+    return nr;
+}
+
+void dirtylimit_cancel_vcpu(int cpu_index)
+{
+    if (!dirtylimit_is_enabled(cpu_index)) {
+        return;
+    }
+
+    dirtylimit_set_quota(cpu_index, 0);
+    dirtylimit_disable(cpu_index);
+    bitmap_test_and_clear_atomic(dirtylimit_state->bmap, cpu_index, 1);
+
+    if (dirtylimit_nvcpus() == 0) {
+        dirtylimit_stop();
+    }
+}
+
+void dirtylimit_cancel_all(void)
+{
+    int i, index;
+
+    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
+        index = dirtylimit_state->states[i].cpu_index;
+        if (dirtylimit_is_enabled(index)) {
+            dirtylimit_cancel_vcpu(index);
+        }
+    }
+}
+
+void dirtylimit_vcpu(int cpu_index, uint64_t quota)
+{
+    trace_dirtylimit_vcpu(cpu_index, quota);
+
+    dirtylimit_set_quota(cpu_index, quota);
+    dirtylimit_enable(cpu_index);
+    bitmap_set_atomic(dirtylimit_state->bmap, cpu_index, 1);
+
+    if (dirtylimit_in_service()) {
+        goto end;
+    }
+
+    dirtylimit_start();
+    qemu_thread_create(&dirtylimit_state->thread,
+                       "dirtylimit",
+                       dirtylimit_thread,
+                       NULL,
+                       QEMU_THREAD_JOINABLE);
+end:
+    return;
+}
+
+void dirtylimit_all(uint64_t quota)
+{
+    int i, index;
+
+    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
+        index = dirtylimit_state->states[i].cpu_index;
+        dirtylimit_vcpu(index, quota);
+    }
+}
+
+void dirtylimit_state_init(int max_cpus)
+{
+    int i;
+
+    dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state));
+
+    dirtylimit_state->states =
+            g_malloc0(sizeof(DirtyLimitState) * max_cpus);
+
+    for (i = 0; i < max_cpus; i++) {
+        dirtylimit_state->states[i].cpu_index = i;
+    }
+
+    dirtylimit_state->max_cpus = max_cpus;
+    dirtylimit_state->bmap = bitmap_new(max_cpus);
+    bitmap_clear(dirtylimit_state->bmap, 0, max_cpus);
+    dirtylimit_state->nr = BITS_TO_LONGS(max_cpus);
+
+    trace_dirtylimit_state_init(max_cpus);
+}
+
+void dirtylimit_state_finalize(void)
+{
+    free(dirtylimit_state->states);
+    dirtylimit_state->states = NULL;
+
+    free(dirtylimit_state->bmap);
+    dirtylimit_state->bmap = NULL;
+
+    free(dirtylimit_state);
+    dirtylimit_state = NULL;
+}
+
 static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 {
     double pct;
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887..f19acf8 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -31,3 +31,9 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co
 system_wakeup_request(int reason) "reason=%d"
 qemu_system_shutdown_request(int reason) "reason=%d"
 qemu_system_powerdown_request(void) ""
+
+#cpu-throttle.c
+dirtylimit_state_init(int max_cpus) "dirtylimit state init: max cpus %d"
+dirtylimit_throttle(int cpu_index, uint64_t quota, uint64_t current, int64_t time_us) "CPU[%d] throttle: quota %" PRIu64 ", current %" PRIu64 ", throttle %"PRIi64 " us"
+dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
+dirtylimit_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-14 11:07 [PATCH v10 0/3] support dirty restraint on vCPU huangy81
  2021-12-14 11:07 ` [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
  2021-12-14 11:07 ` [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle huangy81
@ 2021-12-14 11:07 ` huangy81
  2021-12-15  7:37   ` Markus Armbruster
  2021-12-24  5:14   ` Peter Xu
  2021-12-24  5:17 ` [PATCH v10 0/3] support dirty restraint on vCPU Peter Xu
  3 siblings, 2 replies; 30+ messages in thread
From: huangy81 @ 2021-12-14 11:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
	Richard Henderson, Markus ArmBruster, Peter Xu,
	Dr. David Alan Gilbert, Paolo Bonzini,
	Philippe Mathieu-Daudé

From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

Implement dirty page rate calculation periodically based on
dirty-ring and throttle the virtual CPU until it reaches the quota
dirty page rate given by the user.

Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce the corresponding hmp commands "vcpu_dirty_limit",
"info vcpu_dirty_limit" so developers can play with them more easily.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
 cpus-common.c         | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp-commands-info.hx  |  13 +++++
 hmp-commands.hx       |  17 ++++++
 include/monitor/hmp.h |   2 +
 qapi/migration.json   |  44 ++++++++++++++
 5 files changed, 231 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..37c3584 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -23,6 +23,15 @@
 #include "hw/core/cpu.h"
 #include "sysemu/cpus.h"
 #include "qemu/lockable.h"
+#include "sysemu/dirtylimit.h"
+#include "sysemu/cpu-throttle.h"
+#include "sysemu/kvm.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-migration.h"
+#include "hw/boards.h"
 
 static QemuMutex qemu_cpu_list_lock;
 static QemuCond exclusive_cond;
@@ -352,3 +361,149 @@ void process_queued_cpu_work(CPUState *cpu)
     qemu_mutex_unlock(&cpu->work_mutex);
     qemu_cond_broadcast(&qemu_work_cond);
 }
+
+void qmp_vcpu_dirty_limit(bool enable,
+                          bool has_cpu_index,
+                          uint64_t cpu_index,
+                          bool has_dirty_rate,
+                          uint64_t dirty_rate,
+                          Error **errp)
+{
+    static bool initialized;
+
+    if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+        if (enable) {
+            error_setg(errp, "dirty page limit feature requires KVM with"
+                       " accelerator property 'dirty-ring-size' set");
+        }
+        return;
+    }
+
+    if (!enable && !dirtylimit_in_service()) {
+        return;
+    }
+
+    if (!initialized) {
+        MachineState *ms = MACHINE(qdev_get_machine());
+        dirtylimit_calc_state_init(ms->smp.max_cpus);
+        dirtylimit_state_init(ms->smp.max_cpus);
+        initialized = true;
+    }
+
+    if (enable && !has_dirty_rate) {
+        error_setg(errp, "enable dirty page limit feature requires"
+                   " 'dirty-rate' set");
+        return;
+    }
+
+    if (has_cpu_index && !dirtylimit_is_vcpu_index_valid(cpu_index)) {
+        error_setg(errp, "incorrect cpu index specified");
+        return;
+    }
+
+    if (enable) {
+        dirtylimit_calc_start();
+        if (has_cpu_index) {
+            dirtylimit_vcpu(cpu_index, dirty_rate);
+        } else {
+            dirtylimit_all(dirty_rate);
+        }
+    } else {
+        if (has_cpu_index) {
+            dirtylimit_cancel_vcpu(cpu_index);
+        } else {
+            dirtylimit_cancel_all();
+        }
+
+        if (!dirtylimit_in_service()) {
+            dirtylimit_calc_quit();
+            dirtylimit_state_finalize();
+            dirtylimit_calc_state_finalize();
+            initialized = false;
+        }
+    }
+}
+
+void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+    bool global = qdict_get_try_bool(qdict, "global", false);
+    bool enable = qdict_get_bool(qdict, "enable");
+    int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
+    int64_t dirty_rate = qdict_get_try_int(qdict, "dirty_rate", -1);
+    Error *err = NULL;
+
+    if (enable && dirty_rate < 0) {
+        monitor_printf(mon, "Dirty page limit requires dirty_rate set!\n");
+        return;
+    }
+
+    if (enable && !global && cpu_index < 0) {
+        monitor_printf(mon, "Dirty page limit requires cpu_index set!\n");
+        return;
+    }
+
+    if (global && cpu_index != -1) {
+        monitor_printf(mon, "Either global option or cpu_index can be set!\n");
+        return;
+    }
+
+    if (global) {
+        if (enable) {
+            qmp_vcpu_dirty_limit(true, false, -1, true, dirty_rate, &err);
+        } else {
+            qmp_vcpu_dirty_limit(false, false, -1, false, -1, &err);
+        }
+    } else {
+        if (enable) {
+            qmp_vcpu_dirty_limit(true, true, cpu_index, true, dirty_rate, &err);
+        } else {
+            qmp_vcpu_dirty_limit(false, true, cpu_index, false, -1, &err);
+        }
+    }
+
+    if (err) {
+        hmp_handle_error(mon, err);
+        return;
+    }
+
+    monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query "
+                   "dirty limit for virtual CPU]\n");
+}
+
+struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)
+{
+    if (!dirtylimit_in_service()) {
+        error_setg(errp, "dirty page limit not enabled");
+        return NULL;
+    }
+
+    return dirtylimit_query_all();
+}
+
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+    DirtyLimitInfoList *limit, *head, *info = NULL;
+    Error *err = NULL;
+
+    if (!dirtylimit_in_service()) {
+        monitor_printf(mon, "Dirty page limit not enabled!\n");
+        return;
+    }
+
+    info = qmp_query_vcpu_dirty_limit(&err);
+    if (err) {
+        hmp_handle_error(mon, err);
+        return;
+    }
+
+    head = info;
+    for (limit = head; limit != NULL; limit = limit->next) {
+        monitor_printf(mon, "vcpu[%"PRIi64"], limit rate %"PRIi64 " (MB/s),"
+                            " current rate %"PRIi64 " (MB/s)\n",
+                            limit->value->cpu_index,
+                            limit->value->limit_rate,
+                            limit->value->current_rate);
+    }
+
+    qapi_free_DirtyLimitInfoList(info);
+}
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 407a1da..5dd3001 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -863,6 +863,19 @@ SRST
     Display the vcpu dirty rate information.
 ERST
 
+    {
+        .name       = "vcpu_dirty_limit",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show dirty page limit information of all vCPU",
+        .cmd        = hmp_info_vcpu_dirty_limit,
+    },
+
+SRST
+  ``info vcpu_dirty_limit``
+    Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
     {
         .name       = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 70a9136..ef0f7cc 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1744,3 +1744,20 @@ ERST
                       "\n\t\t\t -b to specify dirty bitmap as method of calculation)",
         .cmd        = hmp_calc_dirty_rate,
     },
+
+SRST
+``vcpu_dirty_limit``
+  Start or stop the dirty page rate limit on a virtual CPU. The dirty limit
+  status of all virtual CPUs can be observed with the ``info vcpu_dirty_limit``
+  command.
+ERST
+
+    {
+        .name       = "vcpu_dirty_limit",
+        .args_type  = "global:-g,enable:b,cpu_index:l?,dirty_rate:l?",
+        .params     = "[-g] on|off [cpu_index] [dirty_rate]",
+        .help       = "turn on/off dirty page rate limit"
+                      "\n\t\t (use -g to affect all vCPUs; cpu_index must not be set"
+                      "\n\t\t together with -g. dirty_rate must be specified when enabling)",
+        .cmd        = hmp_vcpu_dirty_limit,
+    },
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..04879a2 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,8 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
                                     HumanReadableText *(*qmp_handler)(Error **));
 
diff --git a/qapi/migration.json b/qapi/migration.json
index ac5fa56..7d8da4f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1869,6 +1869,50 @@
             'current-rate': 'int64' } }
 
 ##
+# @vcpu-dirty-limit:
+#
+# Set or cancel the upper limit of dirty page rate for a virtual CPU.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @enable: true to enable, false to disable.
+#
+# @cpu-index: index of virtual CPU, default is all.
+#
+# @dirty-rate: upper limit of dirty page rate for the virtual CPU;
+#              when cancelling the dirty limit, this field is ignored.
+#
+# Since: 7.0
+#
+# Example:
+#   { "execute": "vcpu-dirty-limit",
+#     "arguments": { "enable": true,
+#                    "cpu-index": 1,
+#                    "dirty-rate": 200 } }
+#
+##
+{ 'command': 'vcpu-dirty-limit',
+  'data': { 'enable': 'bool',
+            '*cpu-index': 'uint64',
+            '*dirty-rate': 'uint64'} }
+
+##
+# @query-vcpu-dirty-limit:
+#
+# Returns information about the dirty page rate limit of each enabled virtual CPU.
+#
+# Since: 7.0
+#
+# Example:
+#   {"execute": "query-vcpu-dirty-limit"}
+#
+##
+{ 'command': 'query-vcpu-dirty-limit',
+  'returns': [ 'DirtyLimitInfo' ] }
+
+##
 # @snapshot-save:
 #
 # Save a VM snapshot
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2021-12-14 11:07 ` [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle huangy81
@ 2021-12-15  7:15   ` Markus Armbruster
  2021-12-15  7:40     ` Hyman Huang
  2021-12-24  5:12   ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2021-12-15  7:15 UTC (permalink / raw)
  To: huangy81
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

huangy81@chinatelecom.cn writes:

> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Setup a negative feedback system when vCPU thread
> handling KVM_EXIT_DIRTY_RING_FULL exit by introducing
> throttle_us_per_full field in struct CPUState. Sleep
> throttle_us_per_full microseconds to throttle vCPU
> if dirtylimit is enabled.
>
> Start a thread to track current dirty page rates and
> tune the throttle_us_per_full dynamically untill current
> dirty page rate reach the quota.
>
> Introduce the util function in the header for dirtylimit
> implementation.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48c..ac5fa56 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1850,6 +1850,25 @@
>  { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>  
>  ##
> +# @DirtyLimitInfo:
> +#
> +# Dirty page rate limit information of virtual CPU.
> +#
> +# @cpu-index: index of virtual CPU.
> +#
> +# @limit-rate: upper limit of dirty page rate for virtual CPU.

If I understand your code correctly, zero means unlimited.  This is
undocumented.

In review of v9, I asked to "make @dirty-rate optional, present means
enable, absent means disable."  Any particular reason for not doing it
that way?

> +#
> +# @current-rate: current dirty page rate for virtual CPU.
> +#
> +# Since: 7.0
> +#
> +##
> +{ 'struct': 'DirtyLimitInfo',
> +  'data': { 'cpu-index': 'int',
> +            'limit-rate': 'int64',
> +            'current-rate': 'int64' } }
> +
> +##
>  # @snapshot-save:
>  #
>  # Save a VM snapshot

[...]



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-14 11:07 ` [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU huangy81
@ 2021-12-15  7:37   ` Markus Armbruster
  2021-12-15  7:56     ` Hyman Huang
  2021-12-24  5:14   ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2021-12-15  7:37 UTC (permalink / raw)
  To: huangy81
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

huangy81@chinatelecom.cn writes:

> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Implement dirtyrate calculation periodically basing on
> dirty-ring and throttle virtual CPU until it reachs the quota
> dirty page rate given by user.
>
> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
> to enable, disable, query dirty page limit for virtual CPU.
>
> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
> "info vcpu_dirty_limit" so developers can play with them easier.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

[...]

> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 96d0148..04879a2 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -131,6 +131,8 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
>  void hmp_replay_seek(Monitor *mon, const QDict *qdict);
>  void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
>  void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
> +void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
> +void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
>  void hmp_human_readable_text_helper(Monitor *mon,
>                                      HumanReadableText *(*qmp_handler)(Error **));
>  
> diff --git a/qapi/migration.json b/qapi/migration.json
> index ac5fa56..7d8da4f 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1869,6 +1869,50 @@
>              'current-rate': 'int64' } }
>  
>  ##
> +# @vcpu-dirty-limit:
> +#
> +# Set or cancel the upper limit of dirty page rate for a virtual CPU.
> +#
> +# Requires KVM with accelerator property "dirty-ring-size" set.
> +# A virtual CPU's dirty page rate is a measure of its memory load.
> +# To observe dirty page rates, use @calc-dirty-rate.
> +#
> +# @enable: true to enable, false to disable.
> +#
> +# @cpu-index: index of virtual CPU, default is all.

Suggest

    # @cpu-index: The vCPU to act upon (all by default).

(I'm stealing this from trace.json)

> +#
> +# @dirty-rate: upper limit of dirty page rate for virtual CPU, to
> +#              cancel dirty limit, this field will be ignored.
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   {"execute": "vcpu-dirty-limit"}
> +#    "arguments": { "enable": true,
> +#                   "cpu-index": 1,
> +#                   "dirty-rate": 200 } }
> +#
> +##
> +{ 'command': 'vcpu-dirty-limit',
> +  'data': { 'enable': 'bool',
> +            '*cpu-index': 'uint64',
> +            '*dirty-rate': 'uint64'} }

Drop @enable, please.

If @dirty-rate is present, set the limit to its value.

If it's absent, cancel the limit.

> +
> +##
> +# @query-vcpu-dirty-limit:
> +#
> +# Returns information about all virtual CPU dirty limit if enabled.
> +#
> +# Since: 7.0
> +#
> +# Example:
> +#   {"execute": "query-vcpu-dirty-limit"}
> +#
> +##
> +{ 'command': 'query-vcpu-dirty-limit',
> +  'returns': [ 'DirtyLimitInfo' ] }
> +
> +##
>  # @snapshot-save:
>  #
>  # Save a VM snapshot



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2021-12-15  7:15   ` Markus Armbruster
@ 2021-12-15  7:40     ` Hyman Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Hyman Huang @ 2021-12-15  7:40 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/15 15:15, Markus Armbruster 写道:
> huangy81@chinatelecom.cn writes:
> 
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Setup a negative feedback system when vCPU thread
>> handling KVM_EXIT_DIRTY_RING_FULL exit by introducing
>> throttle_us_per_full field in struct CPUState. Sleep
>> throttle_us_per_full microseconds to throttle vCPU
>> if dirtylimit is enabled.
>>
>> Start a thread to track current dirty page rates and
>> tune the throttle_us_per_full dynamically untill current
>> dirty page rate reach the quota.
>>
>> Introduce the util function in the header for dirtylimit
>> implementation.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> [...]
> 
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index bbfd48c..ac5fa56 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1850,6 +1850,25 @@
>>   { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>>   
>>   ##
>> +# @DirtyLimitInfo:
>> +#
>> +# Dirty page rate limit information of virtual CPU.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
> 
> If I understand your code correctly, zero means unlimited.  This is
> undocumented.
> 
Ok
> In review of v9, I asked to "make @dirty-rate optional, present means
> enable, absent means disable."  Any particular reason for not doing it
> that way?
> 
No, :(

Actually, I replied to the review advice as follows:

-------------
if we drop @enable, enabling dirty limit should be like:
vcpu-dirty-limit cpu-index=0 dirty-rate=1000

And disabling dirty limit like:
vcpu-dirty-limit cpu-index=0

For disabling case, there is no hint of disabling literally in command
"vcpu-dirty-limit".

How about making @dirty-rate optional: when enabling the dirty limit it
should be present, and be ignored otherwise?
-------------

The discussion is still ongoing :)

>> +#
>> +# @current-rate: current dirty page rate for virtual CPU.
>> +#
>> +# Since: 7.0
>> +#
>> +##
>> +{ 'struct': 'DirtyLimitInfo',
>> +  'data': { 'cpu-index': 'int',
>> +            'limit-rate': 'int64',
>> +            'current-rate': 'int64' } }
>> +
>> +##
>>   # @snapshot-save:
>>   #
>>   # Save a VM snapshot
> 
> [...]
> 

-- 
Best regard

Hyman Huang(黄勇)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-15  7:37   ` Markus Armbruster
@ 2021-12-15  7:56     ` Hyman Huang
  2021-12-15  8:09       ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Hyman Huang @ 2021-12-15  7:56 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/15 15:37, Markus Armbruster 写道:
> huangy81@chinatelecom.cn writes:
> 
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Implement dirtyrate calculation periodically basing on
>> dirty-ring and throttle virtual CPU until it reachs the quota
>> dirty page rate given by user.
>>
>> Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit"
>> to enable, disable, query dirty page limit for virtual CPU.
>>
>> Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit",
>> "info vcpu_dirty_limit" so developers can play with them easier.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> [...]
> 
>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>> index 96d0148..04879a2 100644
>> --- a/include/monitor/hmp.h
>> +++ b/include/monitor/hmp.h
>> @@ -131,6 +131,8 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
>>   void hmp_replay_seek(Monitor *mon, const QDict *qdict);
>>   void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
>>   void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
>> +void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
>> +void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
>>   void hmp_human_readable_text_helper(Monitor *mon,
>>                                       HumanReadableText *(*qmp_handler)(Error **));
>>   
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index ac5fa56..7d8da4f 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1869,6 +1869,50 @@
>>               'current-rate': 'int64' } }
>>   
>>   ##
>> +# @vcpu-dirty-limit:
>> +#
>> +# Set or cancel the upper limit of dirty page rate for a virtual CPU.
>> +#
>> +# Requires KVM with accelerator property "dirty-ring-size" set.
>> +# A virtual CPU's dirty page rate is a measure of its memory load.
>> +# To observe dirty page rates, use @calc-dirty-rate.
>> +#
>> +# @enable: true to enable, false to disable.
>> +#
>> +# @cpu-index: index of virtual CPU, default is all.
> 
> Suggest
> 
>      # @cpu-index: The vCPU to act upon (all by default).
> 
> (I'm stealing this from trace.json)
> 
>> +#
>> +# @dirty-rate: upper limit of dirty page rate for virtual CPU, to
>> +#              cancel dirty limit, this field will be ignored.
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   {"execute": "vcpu-dirty-limit"}
>> +#    "arguments": { "enable": true,
>> +#                   "cpu-index": 1,
>> +#                   "dirty-rate": 200 } }
>> +#
>> +##
>> +{ 'command': 'vcpu-dirty-limit',
>> +  'data': { 'enable': 'bool',
>> +            '*cpu-index': 'uint64',
>> +            '*dirty-rate': 'uint64'} }
> 
> Drop @enable, please.
> 
> If @dirty-rate is present, set the limit to its value.
> 
> If it's absent, cancel the limit.
> 
Ok. Indeed, this is the simplest style. :)

So the final qmp format should be like:

case 1: setup vcpu 0 dirty page limit 100MB/s
vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s

case 2: cancel vcpu 0 dirty page limit
vcpu-dirty-limit  cpu-index=0

I'll do that next version.

Thanks very much, Markus

>> +
>> +##
>> +# @query-vcpu-dirty-limit:
>> +#
>> +# Returns information about all virtual CPU dirty limit if enabled.
>> +#
>> +# Since: 7.0
>> +#
>> +# Example:
>> +#   {"execute": "query-vcpu-dirty-limit"}
>> +#
>> +##
>> +{ 'command': 'query-vcpu-dirty-limit',
>> +  'returns': [ 'DirtyLimitInfo' ] }
>> +
>> +##
>>   # @snapshot-save:
>>   #
>>   # Save a VM snapshot
> 

-- 
Best regard

Hyman Huang(黄勇)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-15  7:56     ` Hyman Huang
@ 2021-12-15  8:09       ` Peter Xu
  2021-12-15  8:29         ` Hyman Huang
  2021-12-15 13:41         ` Markus Armbruster
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2021-12-15  8:09 UTC (permalink / raw)
  To: Hyman Huang
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, Markus Armbruster, qemu-devel, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Wed, Dec 15, 2021 at 03:56:55PM +0800, Hyman Huang wrote:
> > > +{ 'command': 'vcpu-dirty-limit',
> > > +  'data': { 'enable': 'bool',
> > > +            '*cpu-index': 'uint64',
> > > +            '*dirty-rate': 'uint64'} }
> > 
> > Drop @enable, please.
> > 
> > If @dirty-rate is present, set the limit to its value.
> > 
> > If it's absent, cancel the limit.
> > 
> Ok. Indeed, this is the simplest style. :)
> 
> So the final qmp format should be like:
> 
> case 1: setup vcpu 0 dirty page limit 100MB/s
> vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s
> 
> case 2: cancel vcpu 0 dirty page limit
> vcpu-dirty-limit  cpu-index=0

I actually agree with what you said... for human beings no one will read it as
"disable vcpu throttling", instead people could consider it enables vcpu
throttle with a default dirty rate from a gut feeling.

I think what Markus suggested is the simplest solution for computers, but it
can confuse human beings.  So it turns out to be a general question to QMP
scheme design: should we always assume QMP client to be a piece of software, or
should we still consider the feeling of human beings operating on QMP
interfaces using qmp-shell.

IMHO we should still consider the latter, if we don't lose much, anyway.  But I
don't have a strong opinion.

Thanks,

-- 
Peter Xu




* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-15  8:09       ` Peter Xu
@ 2021-12-15  8:29         ` Hyman Huang
  2021-12-15 10:16           ` Hyman Huang
  2021-12-15 13:41         ` Markus Armbruster
  1 sibling, 1 reply; 30+ messages in thread
From: Hyman Huang @ 2021-12-15  8:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, Markus Armbruster, qemu-devel, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



在 2021/12/15 16:09, Peter Xu 写道:
> On Wed, Dec 15, 2021 at 03:56:55PM +0800, Hyman Huang wrote:
>>>> +{ 'command': 'vcpu-dirty-limit',
>>>> +  'data': { 'enable': 'bool',
>>>> +            '*cpu-index': 'uint64',
>>>> +            '*dirty-rate': 'uint64'} }
>>>
>>> Drop @enable, please.
>>>
>>> If @dirty-rate is present, set the limit to its value.
>>>
>>> If it's absent, cancel the limit.
>>>
>> Ok. Indeed, this is the simplest style. :)
>>
>> So the final qmp format should be like:
>>
>> case 1: setup vcpu 0 dirty page limit 100MB/s
>> vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s
>>
>> case 2: cancel vcpu 0 dirty page limit
>> vcpu-dirty-limit  cpu-index=0
> 
> I actually agree with what you said... for human beings no one will read it as
> "disable vcpu throttling", instead people could consider it enables vcpu
> throttle with a default dirty rate from a gut feeling.
> 
> I think what Markus suggested is the simplest solution for computers, but it
> can confuse human beings.  So it turns out to be a general question to QMP
> scheme design: should we always assume QMP client to be a piece of software, or
> should we still consider the feeling of human beings operating on QMP
> interfaces using qmp-shell.
> 
> IMHO we should still consider the latter, if we don't lose much, anyway.  But I
> don't have a strong opinion.
>  > Thanks,
> 
So, what do you think about it, Markus?
-- 
Best regard

Hyman Huang(黄勇)



* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-15  8:29         ` Hyman Huang
@ 2021-12-15 10:16           ` Hyman Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Hyman Huang @ 2021-12-15 10:16 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/15 16:29, Hyman Huang 写道:
> 
> 
> 在 2021/12/15 16:09, Peter Xu 写道:
>> On Wed, Dec 15, 2021 at 03:56:55PM +0800, Hyman Huang wrote:
>>>>> +{ 'command': 'vcpu-dirty-limit',
>>>>> +  'data': { 'enable': 'bool',
>>>>> +            '*cpu-index': 'uint64',
>>>>> +            '*dirty-rate': 'uint64'} }
>>>>
>>>> Drop @enable, please.
>>>>
>>>> If @dirty-rate is present, set the limit to its value.
>>>>
>>>> If it's absent, cancel the limit.
>>>>
>>> Ok. Indeed, this is the simplest style. :)
>>>
>>> So the final qmp format should be like:
>>>
>>> case 1: setup vcpu 0 dirty page limit 100MB/s
>>> vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s
>>>
>>> case 2: cancel vcpu 0 dirty page limit
>>> vcpu-dirty-limit  cpu-index=0
>>
>> I actually agree with what you said... for human beings no one will 
>> read it as
>> "disable vcpu throttling", instead people could consider it enables vcpu
>> throttle with a default dirty rate from a gut feeling.
>>
>> I think what Markus suggested is the simplest solution for computers, 
>> but it
>> can confuse human beings.  So it turns out to be a general question to 
>> QMP
>> scheme design: should we always assume QMP client to be a piece of 
>> software, or
>> should we still consider the feeling of human beings operating on QMP
>> interfaces using qmp-shell.
>>
>> IMHO we should still consider the latter, if we don't lose much, 
>> anyway.  But I
>> don't have a strong opinion.
>>  > Thanks,
>>
> So, how do you think about it, Markus?

I prefer Peter's advice, and there is another reason:

In the current implementation, we introduced a global on/off switch:
the dirty page rate limit is enabled on the specified vcpu only if
@cpu-index is passed; otherwise, all vcpus are affected.

If we remove @enable and use @dirty-rate as the enable/disable
switch, the qmp format would be like the following:

case 1: setup vcpu 0 dirty page limit 100MB/s
vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s

case 2: setup all vcpu dirty page limit 100MB/s
vcpu-dirty-limit dirty-rate=100MB/s

case 3: cancel vcpu 0 dirty page limit
vcpu-dirty-limit  cpu-index=0

case 4: cancel all vcpu dirty page limit
vcpu-dirty-limit

For case 4, it seems weird.

-- 
Best regard

Hyman Huang(黄勇)



* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-15  8:09       ` Peter Xu
  2021-12-15  8:29         ` Hyman Huang
@ 2021-12-15 13:41         ` Markus Armbruster
  2021-12-16  6:22           ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2021-12-15 13:41 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Richard Henderson,
	Juan Quintela, Hyman Huang, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

Peter Xu <peterx@redhat.com> writes:

> On Wed, Dec 15, 2021 at 03:56:55PM +0800, Hyman Huang wrote:
>> > > +{ 'command': 'vcpu-dirty-limit',
>> > > +  'data': { 'enable': 'bool',
>> > > +            '*cpu-index': 'uint64',
>> > > +            '*dirty-rate': 'uint64'} }
>> > 
>> > Drop @enable, please.
>> > 
>> > If @dirty-rate is present, set the limit to its value.
>> > 
>> > If it's absent, cancel the limit.
>> > 
>> Ok. Indeed, this is the simplest style. :)
>> 
>> So the final qmp format should be like:
>> 
>> case 1: setup vcpu 0 dirty page limit 100MB/s
>> vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s
>> 
>> case 2: cancel vcpu 0 dirty page limit
>> vcpu-dirty-limit  cpu-index=0
>
> I actually agree with what you said... for human beings no one will read it as
> "disable vcpu throttling", instead people could consider it enables vcpu
> throttle with a default dirty rate from a gut feeling.
>
> I think what Markus suggested is the simplest solution for computers, but it
> can confuse human beings.  So it turns out to be a general question to QMP
> scheme design: should we always assume QMP client to be a piece of software, or
> should we still consider the feeling of human beings operating on QMP
> interfaces using qmp-shell.
>
> IMHO we should still consider the latter, if we don't lose much, anyway.  But I
> don't have a strong opinion.

If you want a more explicit interface, then I'd recommend to go right
back to v7:

    {"execute": "set-vcpu-dirty-limit",
     "arguments": {"cpu-index": 0, "dirtyrate": 200}}

    {"execute": "cancel-vcpu-dirty-limit",
     "arguments": {"cpu-index": 0}}

Bonus: it already has my Acked-by.




* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-15 13:41         ` Markus Armbruster
@ 2021-12-16  6:22           ` Peter Xu
  2021-12-16  9:16             ` Hyman Huang
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2021-12-16  6:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Eduardo Habkost, David Hildenbrand, Richard Henderson,
	Juan Quintela, Hyman Huang, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Wed, Dec 15, 2021 at 02:41:32PM +0100, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Dec 15, 2021 at 03:56:55PM +0800, Hyman Huang wrote:
> >> > > +{ 'command': 'vcpu-dirty-limit',
> >> > > +  'data': { 'enable': 'bool',
> >> > > +            '*cpu-index': 'uint64',
> >> > > +            '*dirty-rate': 'uint64'} }
> >> > 
> >> > Drop @enable, please.
> >> > 
> >> > If @dirty-rate is present, set the limit to its value.
> >> > 
> >> > If it's absent, cancel the limit.
> >> > 
> >> Ok. Indeed, this is the simplest style. :)
> >> 
> >> So the final qmp format should be like:
> >> 
> >> case 1: setup vcpu 0 dirty page limit 100MB/s
> >> vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s
> >> 
> >> case 2: cancel vcpu 0 dirty page limit
> >> vcpu-dirty-limit  cpu-index=0
> >
> > I actually agree with what you said... for human beings no one will read it as
> > "disable vcpu throttling", instead people could consider it enables vcpu
> > throttle with a default dirty rate from a gut feeling.
> >
> > I think what Markus suggested is the simplest solution for computers, but it
> > can confuse human beings.  So it turns out to be a general question to QMP
> > scheme design: should we always assume QMP client to be a piece of software, or
> > should we still consider the feeling of human beings operating on QMP
> > interfaces using qmp-shell.
> >
> > IMHO we should still consider the latter, if we don't lose much, anyway.  But I
> > don't have a strong opinion.
> 
> If you want a more explicit interface, then I'd recommend to go right
> back to v7:
> 
>     {"execute": "set-vcpu-dirty-limit",
>      "arguments": {"cpu-index": 0, "dirtyrate": 200}}
> 
>     {"execute": "cancel-vcpu-dirty-limit",
>      "arguments": {"cpu-index": 0}}
> 
> Bonus: it already has my Acked-by.

Fair enough. :)  That looks good to me too.

Yong, please hold-off a bit on reposting (if there's a plan) - I'll read the
other parts soon..

Thanks,

-- 
Peter Xu




* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-16  6:22           ` Peter Xu
@ 2021-12-16  9:16             ` Hyman Huang
  2021-12-16 10:23               ` Markus Armbruster
  0 siblings, 1 reply; 30+ messages in thread
From: Hyman Huang @ 2021-12-16  9:16 UTC (permalink / raw)
  To: Peter Xu, Markus Armbruster
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



在 2021/12/16 14:22, Peter Xu 写道:
> On Wed, Dec 15, 2021 at 02:41:32PM +0100, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>>> On Wed, Dec 15, 2021 at 03:56:55PM +0800, Hyman Huang wrote:
>>>>>> +{ 'command': 'vcpu-dirty-limit',
>>>>>> +  'data': { 'enable': 'bool',
>>>>>> +            '*cpu-index': 'uint64',
>>>>>> +            '*dirty-rate': 'uint64'} }
>>>>>
>>>>> Drop @enable, please.
>>>>>
>>>>> If @dirty-rate is present, set the limit to its value.
>>>>>
>>>>> If it's absent, cancel the limit.
>>>>>
>>>> Ok. Indeed, this is the simplest style. :)
>>>>
>>>> So the final qmp format should be like:
>>>>
>>>> case 1: setup vcpu 0 dirty page limit 100MB/s
>>>> vcpu-dirty-limit  cpu-index=0   dirty-rate=100MB/s
>>>>
>>>> case 2: cancel vcpu 0 dirty page limit
>>>> vcpu-dirty-limit  cpu-index=0
>>>
>>> I actually agree with what you said... for human beings no one will read it as
>>> "disable vcpu throttling", instead people could consider it enables vcpu
>>> throttle with a default dirty rate from a gut feeling.
>>>
>>> I think what Markus suggested is the simplest solution for computers, but it
>>> can confuse human beings.  So it turns out to be a general question to QMP
>>> scheme design: should we always assume QMP client to be a piece of software, or
>>> should we still consider the feeling of human beings operating on QMP
>>> interfaces using qmp-shell.
>>>
>>> IMHO we should still consider the latter, if we don't lose much, anyway.  But I
>>> don't have a strong opinion.
>>
>> If you want a more explicit interface, then I'd recommend to go right
>> back to v7:
>>
>>      {"execute": "set-vcpu-dirty-limit",
>>       "arguments": {"cpu-index": 0, "dirtyrate": 200}}
>>
>>      {"execute": "cancel-vcpu-dirty-limit",
>>       "arguments": {"cpu-index": 0}}
>>
>> Bonus: it already has my Acked-by.
> 
> Fair enough. :)  That looks good to me too.
> 
> Yong, please hold-off a bit on reposting (if there's a plan) - I'll read the
> other parts soon..
> 
Ok, I'm not going to repost the next version until we reach consensus.

So the final qmp format we agreed on is:

case 1: setup vcpu 0 dirty page limit 100MB/s
set-vcpu-dirty-limit cpu-index=0 dirty-rate=100

case 2: setup all vcpu dirty page limit 100MB/s
set-vcpu-dirty-limit dirty-rate=100

case 3: cancel vcpu 0 dirty page limit
cancel-vcpu-dirty-limit cpu-index=0

case 4: cancel all vcpu dirty page limit
cancel-vcpu-dirty-limit

case 5: query dirty limit information of all enabled vcpus
query-vcpu-dirty-limit

And the corresponding hmp commands keep the same style.

Is there any advice? :)

> Thanks,
> 

-- 
Best regard

Hyman Huang(黄勇)



* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-16  9:16             ` Hyman Huang
@ 2021-12-16 10:23               ` Markus Armbruster
  2021-12-24  5:16                 ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2021-12-16 10:23 UTC (permalink / raw)
  To: Hyman Huang
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

Hyman Huang <huangy81@chinatelecom.cn> writes:

[...]

> So the final qmp format we agreed on is:
>
> case 1: setup vcpu 0 dirty page limit 100MB/s
> set-vcpu-dirty-limit cpu-index=0 dirty-rate=100
>
> case 2: setup all vcpu dirty page limit 100MB/s
> set-vcpu-dirty-limit dirty-rate=100
>
> case 3: cancel vcpu 0 dirty page limit
> cancel-vcpu-dirty-limit cpu-index=0
>
> case 4: cancel all vcpu dirty page limit
> cancel-vcpu-dirty-limit
>
> case 5: query dirty limit information of all enabled vcpus
> query-vcpu-dirty-limit
>
> And the corresponding hmp commands keep the same style.
>
> Is there any advice? :)

Looks okay to me.




* Re: [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-14 11:07 ` [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
@ 2021-12-23 11:12   ` Peter Xu
  2021-12-26 15:58     ` Hyman
  2021-12-30  5:01     ` Hyman Huang
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2021-12-23 11:12 UTC (permalink / raw)
  To: huangy81
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

Hi, Yong,

On Tue, Dec 14, 2021 at 07:07:32PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> Introduce the third method GLOBAL_DIRTY_LIMIT of dirty
> tracking to calculate dirtyrate periodically for dirty restraint.
> 
> Implement a thread that calculates dirtyrate periodically, which will
> be used for the dirty page limit.
> 
> Add dirtylimit.h to introduce the util function for dirty
> limit implementation.

Sorry to be late on reading it, my apologies.

> 
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> ---
>  include/exec/memory.h       |   5 +-
>  include/sysemu/dirtylimit.h |  51 ++++++++++++++
>  migration/dirtyrate.c       | 160 +++++++++++++++++++++++++++++++++++++++++---
>  migration/dirtyrate.h       |   2 +
>  4 files changed, 207 insertions(+), 11 deletions(-)
>  create mode 100644 include/sysemu/dirtylimit.h
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 20f1b27..606bec8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
>  /* Dirty tracking enabled because measuring dirty rate */
>  #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
>  
> -#define GLOBAL_DIRTY_MASK  (0x3)
> +/* Dirty tracking enabled because dirty limit */
> +#define GLOBAL_DIRTY_LIMIT      (1U << 2)
> +
> +#define GLOBAL_DIRTY_MASK  (0x7)
>  
>  extern unsigned int global_dirty_tracking;
>  
> diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
> new file mode 100644
> index 0000000..34e48f8
> --- /dev/null
> +++ b/include/sysemu/dirtylimit.h
> @@ -0,0 +1,51 @@
> +/*
> + * dirty limit helper functions
> + *
> + * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
> + *
> + * Authors:
> + *  Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#ifndef QEMU_DIRTYRLIMIT_H
> +#define QEMU_DIRTYRLIMIT_H
> +
> +#define DIRTYLIMIT_CALC_TIME_MS         1000    /* 1000ms */
> +
> +/**
> + * dirtylimit_calc_current
> + *
> + * get current dirty page rate for specified virtual CPU.
> + */
> +int64_t dirtylimit_calc_current(int cpu_index);
> +
> +/**
> + * dirtylimit_calc_start
> + *
> + * start dirty page rate calculation thread.
> + */
> +void dirtylimit_calc_start(void);
> +
> +/**
> + * dirtylimit_calc_quit
> + *
> + * quit dirty page rate calculation thread.
> + */
> +void dirtylimit_calc_quit(void);
> +
> +/**
> + * dirtylimit_calc_state_init
> + *
> + * initialize dirty page rate calculation state.
> + */
> +void dirtylimit_calc_state_init(int max_cpus);
> +
> +/**
> + * dirtylimit_calc_state_finalize
> + *
> + * finalize dirty page rate calculation state.
> + */
> +void dirtylimit_calc_state_finalize(void);
> +#endif

Since dirtylimit and dirtyrate look so alike, I wonder whether it's easier to
just reuse dirtyrate.h; after all, you reused dirtyrate.c.

> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index d65e744..e8d4e4a 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -27,6 +27,7 @@
>  #include "qapi/qmp/qdict.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/runstate.h"
> +#include "sysemu/dirtylimit.h"
>  #include "exec/memory.h"
>  
>  /*
> @@ -46,6 +47,155 @@ static struct DirtyRateStat DirtyStat;
>  static DirtyRateMeasureMode dirtyrate_mode =
>                  DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
>  
> +struct {
> +    DirtyRatesData data;
> +    bool quit;
> +    QemuThread thread;
> +} *dirtylimit_calc_state;
> +
> +static void dirtylimit_global_dirty_log_start(void)
> +{
> +    qemu_mutex_lock_iothread();
> +    memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
> +    qemu_mutex_unlock_iothread();
> +}
> +
> +static void dirtylimit_global_dirty_log_stop(void)
> +{
> +    qemu_mutex_lock_iothread();
> +    memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
> +    qemu_mutex_unlock_iothread();
> +}

This is merely dirtyrate_global_dirty_log_start/stop but with a different flag.

Let's introduce global_dirty_log_change() with BQL?

  global_dirty_log_change(flag, start)
  {
    qemu_mutex_lock_iothread();
    if (start) {
        memory_global_dirty_log_start(flag);
    } else {
        memory_global_dirty_log_stop(flag);
    }
    qemu_mutex_unlock_iothread();
  }

Then we merge 4 functions into one.

We can also have a BQL-version of global_dirty_log_sync() in the same patch if
you think the above is helpful.
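That BQL-taking sync helper could be equally small; below is a self-contained
sketch where the QEMU primitives are replaced by stand-ins so it compiles on
its own (in-tree the real ones come from qemu/main-loop.h and exec/memory.h):

```c
#include <assert.h>

/* Stand-ins for the QEMU primitives, so this sketch is self-contained. */
static int bql_held;
static int synced_with_bql;

static void qemu_mutex_lock_iothread(void)     { bql_held = 1; }
static void qemu_mutex_unlock_iothread(void)   { bql_held = 0; }
static void memory_global_dirty_log_sync(void) { synced_with_bql = bql_held; }

/* Sketch of the suggested wrapper: sync the dirty bitmap with the
 * iothread lock (BQL) held, mirroring global_dirty_log_change(). */
static void global_dirty_log_sync(void)
{
    qemu_mutex_lock_iothread();
    memory_global_dirty_log_sync();
    qemu_mutex_unlock_iothread();
}
```

In-tree, this would sit next to global_dirty_log_change() so callers don't
need to open-code the lock/sync/unlock dance.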

> +
> +static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
> +                                     CPUState *cpu, bool start)
> +{
> +    if (start) {
> +        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
> +    } else {
> +        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
> +    }
> +}
> +
> +static void dirtylimit_calc_func(void)

Would you still consider merging this with calculate_dirtyrate_dirty_ring?

I still don't see why it can't.

Maybe it cannot be directly reused, but the whole logic is really, really
similar: alloc an array of DirtyPageRecord, take notes, sleep, take some other
notes, calculate per-vcpu dirty rates.

There's some trivial details that are different (whether start/stop logging,
whether use sync), but they can be abstracted.

At least it can be changed into something like:

  dirtylimit_calc_func(DirtyRateVcpu *stat)
  {
      // never race against cpu hotplug/unplug
      cpu_list_lock();

      // this should include allocate buffers and record initial values for all cores
      record = vcpu_dirty_stat_alloc();
      // do the sleep
      duration = vcpu_dirty_stat_wait(records)

      // the "difference"..
      global_dirty_log_sync();

      // collect end timestamps, calculates
      vcpu_dirty_stat_collect(stat, records);
      vcpu_dirty_stat_free(stat);

      cpu_list_unlock();

      return stat;
  }

It may miss something but just to show what I meant..

> +{
> +    CPUState *cpu;
> +    DirtyPageRecord *dirty_pages;
> +    int64_t start_time, end_time, calc_time;
> +    DirtyRateVcpu rate;
> +    int i = 0;
> +
> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
> +        dirtylimit_calc_state->data.nvcpu);
> +
> +    CPU_FOREACH(cpu) {
> +        record_dirtypages(dirty_pages, cpu, true);
> +    }

Note that I used cpu_list_lock() above and I think we need it.

Initially I thought the rcu read lock is fine too (which is missing, btw), but I
don't really think it'll work, because rcu assignment won't wait for a grace
period when cpus are added to or removed from the global cpu list.  So I don't
see a good way to do this safely across cpu plug/unplug other than taking the
cpu list lock; otherwise, be prepared for qemu to crash when it happens.

I don't know whether the cpu list is doing the right thing on RCU assignment,
but I know the mutex should work..

> +
> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    calc_time = end_time - start_time;
> +
> +    qemu_mutex_lock_iothread();
> +    memory_global_dirty_log_sync();
> +    qemu_mutex_unlock_iothread();
> +
> +    CPU_FOREACH(cpu) {
> +        record_dirtypages(dirty_pages, cpu, false);
> +    }
> +
> +    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
> +        uint64_t increased_dirty_pages =
> +            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
> +        uint64_t memory_size_MB =
> +            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
> +        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
> +
> +        rate.id = i;
> +        rate.dirty_rate  = dirtyrate;
> +        dirtylimit_calc_state->data.rates[i] = rate;
> +
> +        trace_dirtyrate_do_calculate_vcpu(i,
> +            dirtylimit_calc_state->data.rates[i].dirty_rate);
> +    }
> +
> +    free(dirty_pages);
> +}
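For reference, the per-vCPU arithmetic in the loop above reduces to the
following self-contained sketch (TARGET_PAGE_SIZE is assumed to be 4 KiB
here; in QEMU it is target-dependent):

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_SIZE 4096   /* assumed 4 KiB page size for this sketch */

/* Per-vCPU dirty page rate in MB/s, as computed in dirtylimit_calc_func():
 * pages dirtied over calc_time_ms milliseconds, converted to MiB first. */
static int64_t vcpu_dirty_rate(uint64_t start_pages, uint64_t end_pages,
                               int64_t calc_time_ms)
{
    uint64_t increased_dirty_pages = end_pages - start_pages;
    uint64_t memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
    return (int64_t)((memory_size_MB * 1000) / calc_time_ms);
}
```

In the patch, calc_time is taken from QEMU_CLOCK_REALTIME around the
one-second g_usleep(), so it is close to but not exactly 1000 ms.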
> +
> +static void *dirtylimit_calc_thread(void *opaque)
> +{
> +    rcu_register_thread();
> +
> +    dirtylimit_global_dirty_log_start();
> +
> +    while (!qatomic_read(&dirtylimit_calc_state->quit)) {
> +        dirtylimit_calc_func();
> +    }
> +
> +    dirtylimit_global_dirty_log_stop();
> +
> +    rcu_unregister_thread();
> +    return NULL;
> +}
> +
> +int64_t dirtylimit_calc_current(int cpu_index)

It's not "calculating" but "fetching", maybe simply call it
vcpu_get_dirty_rate()?

> +{
> +    DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates;
> +
> +    return qatomic_read(&rates[cpu_index].dirty_rate);
> +}
> +
> +void dirtylimit_calc_start(void)
> +{
> +    if (!qatomic_read(&dirtylimit_calc_state->quit)) {

If we can have a "running", then we should check "running==true" instead.
Please see below on...

> +        goto end;

"return" would work.

> +    }
> +
> +    qatomic_set(&dirtylimit_calc_state->quit, 0);

Same here, set running=true, then clear it when thread quits.

> +    qemu_thread_create(&dirtylimit_calc_state->thread,
> +                       "dirtylimit-calc",
> +                       dirtylimit_calc_thread,
> +                       NULL,
> +                       QEMU_THREAD_JOINABLE);
> +end:
> +    return;

No need for both of them..

> +}
> +
> +void dirtylimit_calc_quit(void)
> +{
> +    qatomic_set(&dirtylimit_calc_state->quit, 1);
> +
> +    if (qemu_mutex_iothread_locked()) {

Ideally I think this should already run with the BQL held, so this isn't
necessary?  I'll check later patches, just leaving a mark.

> +        qemu_mutex_unlock_iothread();
> +        qemu_thread_join(&dirtylimit_calc_state->thread);
> +        qemu_mutex_lock_iothread();
> +    } else {
> +        qemu_thread_join(&dirtylimit_calc_state->thread);
> +    }
> +}
> +
> +void dirtylimit_calc_state_init(int max_cpus)
> +{
> +    dirtylimit_calc_state =
> +        g_malloc0(sizeof(*dirtylimit_calc_state));
> +
> +    dirtylimit_calc_state->data.nvcpu = max_cpus;
> +    dirtylimit_calc_state->data.rates =
> +        g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
> +
> +    dirtylimit_calc_state->quit = true;

Instead of using "quit", I'd rather use "running".  It'll be false by default
and only set when the thread runs.

Setting "quit" by default just reads weird.. or let's keep both "quit" and
"running"; I think that'll be cleaner.  Then here we should set running=false
and quit=false too.

> +}
> +
> +void dirtylimit_calc_state_finalize(void)
> +{
> +    free(dirtylimit_calc_state->data.rates);
> +    dirtylimit_calc_state->data.rates = NULL;
> +
> +    free(dirtylimit_calc_state);
> +    dirtylimit_calc_state = NULL;
> +}
> +
>  static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
>  {
>      int64_t current_time;
> @@ -396,16 +546,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
>      return true;
>  }
>  
> -static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
> -                                     CPUState *cpu, bool start)
> -{
> -    if (start) {
> -        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
> -    } else {
> -        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
> -    }
> -}
> -
>  static void dirtyrate_global_dirty_log_start(void)
>  {
>      qemu_mutex_lock_iothread();
> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> index 69d4c5b..e96acdc 100644
> --- a/migration/dirtyrate.h
> +++ b/migration/dirtyrate.h
> @@ -70,6 +70,8 @@ typedef struct VcpuStat {
>      DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
>  } VcpuStat;
>  
> +typedef struct VcpuStat DirtyRatesData;
> +
>  /*
>   * Store calculation statistics for each measure.
>   */
> -- 
> 1.8.3.1
> 

-- 
Peter Xu




* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2021-12-14 11:07 ` [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle huangy81
  2021-12-15  7:15   ` Markus Armbruster
@ 2021-12-24  5:12   ` Peter Xu
  2021-12-30 16:36     ` Hyman Huang
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Xu @ 2021-12-24  5:12 UTC (permalink / raw)
  To: huangy81
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Tue, Dec 14, 2021 at 07:07:33PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> Set up a negative feedback system for the vCPU thread when it
> handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
> throttle_us_per_full field in struct CPUState.  Sleep
> throttle_us_per_full microseconds to throttle the vCPU
> if dirtylimit is enabled.
> 
> Start a thread to track current dirty page rates and
> tune throttle_us_per_full dynamically until the current
> dirty page rate reaches the quota.
> 
> Introduce the util function in the header for dirtylimit
> implementation.
> 
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> ---
>  accel/kvm/kvm-all.c           |  11 ++
>  include/hw/core/cpu.h         |   6 +
>  include/sysemu/cpu-throttle.h |  77 +++++++++
>  include/sysemu/kvm.h          |   2 +
>  qapi/migration.json           |  19 +++
>  softmmu/cpu-throttle.c        | 371 ++++++++++++++++++++++++++++++++++++++++++
>  softmmu/trace-events          |   6 +
>  7 files changed, 492 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index eecd803..cba5fed 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -45,6 +45,7 @@
>  #include "qemu/guest-random.h"
>  #include "sysemu/hw_accel.h"
>  #include "kvm-cpus.h"
> +#include "sysemu/cpu-throttle.h"
>  
>  #include "hw/boards.h"
>  
> @@ -2303,6 +2304,11 @@ bool kvm_dirty_ring_enabled(void)
>      return kvm_state->kvm_dirty_ring_size ? true : false;
>  }
>  
> +uint32_t kvm_dirty_ring_size(void)
> +{
> +    return kvm_state->kvm_dirty_ring_size;
> +}
> +
>  static int kvm_init(MachineState *ms)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2933,6 +2939,11 @@ int kvm_cpu_exec(CPUState *cpu)
>              qemu_mutex_lock_iothread();
>              kvm_dirty_ring_reap(kvm_state);
>              qemu_mutex_unlock_iothread();
> +            if (dirtylimit_in_service() &&
> +                dirtylimit_is_enabled(cpu->cpu_index) &&
> +                cpu->throttle_us_per_full) {
> +                usleep(cpu->throttle_us_per_full);
> +            }

Looks good, but perhaps put it into a dirty limit exported helper?

>              ret = 0;
>              break;
>          case KVM_EXIT_SYSTEM_EVENT:
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index e948e81..be80fe2 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -411,6 +411,12 @@ struct CPUState {
>       */
>      bool throttle_thread_scheduled;
>  
> +    /*
> +     * Sleep throttle_us_per_full microseconds once dirty ring is full
> +     * when dirty page limit is enabled.
> +     */
> +    int64_t throttle_us_per_full;
> +
>      bool ignore_memory_transaction_failures;
>  
>      struct hax_vcpu_state *hax_vcpu;
> diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
> index d65bdef..d4973a5 100644
> --- a/include/sysemu/cpu-throttle.h
> +++ b/include/sysemu/cpu-throttle.h
> @@ -65,4 +65,81 @@ bool cpu_throttle_active(void);
>   */
>  int cpu_throttle_get_percentage(void);
>  
> +/**
> + * dirtylimit_is_enabled
> + *
> + * Returns: %true if dirty page rate limit on specified virtual CPU is enabled,
> + *          %false otherwise.
> + */
> +bool dirtylimit_is_enabled(int cpu_index);
> +
> +/**
> + * dirtylimit_in_service
> + *
> + * Returns: %true if dirty page rate limit thread is running, %false otherwise.
> + */
> +bool dirtylimit_in_service(void);
> +
> +/**
> + * dirtylimit_stop
> + *
> + * stop dirty page rate limit thread.
> + */
> +void dirtylimit_stop(void);
> +
> +/**
> + * dirtylimit_is_vcpu_index_valid
> + *
> + * Returns: %true if the cpu index is valid, %false otherwise.
> + */
> +bool dirtylimit_is_vcpu_index_valid(int cpu_index);
> +
> +/**
> + * dirtylimit_state_init
> + *
> + * initialize global state for dirty page rate limit.
> + */
> +void dirtylimit_state_init(int max_cpus);
> +
> +/**
> + * dirtylimit_state_finalize
> + *
> + * finalize global state for dirty page rate limit.
> + */
> +void dirtylimit_state_finalize(void);
> +
> +/**
> + * dirtylimit_vcpu
> + *
> + * setup dirty page rate limit on specified virtual CPU with quota.
> + */
> +void dirtylimit_vcpu(int cpu_index, uint64_t quota);
> +
> +/**
> + * dirtylimit_all
> + *
> + * setup dirty page rate limit on all virtual CPUs with quota.
> + */
> +void dirtylimit_all(uint64_t quota);
> +
> +/**
> + * dirtylimit_query_all
> + *
> + * Returns: dirty page limit information of all enabled virtual CPUs.
> + */
> +struct DirtyLimitInfoList *dirtylimit_query_all(void);
> +
> +/**
> + * dirtylimit_cancel_vcpu
> + *
> + * cancel dirtylimit for the specified virtual CPU.
> + */
> +void dirtylimit_cancel_vcpu(int cpu_index);
> +
> +/**
> + * dirtylimit_cancel_all
> + *
> + * cancel dirtylimit for all enabled virtual CPUs.
> + */
> +void dirtylimit_cancel_all(void);
>  #endif /* SYSEMU_CPU_THROTTLE_H */
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 7b22aeb..e5a9a28 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -548,4 +548,6 @@ bool kvm_cpu_check_are_resettable(void);
>  bool kvm_arch_cpu_check_are_resettable(void);
>  
>  bool kvm_dirty_ring_enabled(void);
> +
> +uint32_t kvm_dirty_ring_size(void);
>  #endif
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48c..ac5fa56 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1850,6 +1850,25 @@
>  { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>  
>  ##
> +# @DirtyLimitInfo:
> +#
> +# Dirty page rate limit information of virtual CPU.
> +#
> +# @cpu-index: index of virtual CPU.
> +#
> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
> +#
> +# @current-rate: current dirty page rate for virtual CPU.
> +#
> +# Since: 7.0
> +#
> +##
> +{ 'struct': 'DirtyLimitInfo',
> +  'data': { 'cpu-index': 'int',
> +            'limit-rate': 'int64',
> +            'current-rate': 'int64' } }
> +
> +##
>  # @snapshot-save:
>  #
>  # Save a VM snapshot
> diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c
> index 8c2144a..4472ab3 100644
> --- a/softmmu/cpu-throttle.c
> +++ b/softmmu/cpu-throttle.c
> @@ -29,6 +29,10 @@
>  #include "qemu/main-loop.h"
>  #include "sysemu/cpus.h"
>  #include "sysemu/cpu-throttle.h"
> +#include "sysemu/dirtylimit.h"
> +#include "sysemu/kvm.h"
> +#include "qapi/qapi-commands-migration.h"
> +#include "trace.h"
>  
>  /* vcpu throttling controls */
>  static QEMUTimer *throttle_timer;
> @@ -38,6 +42,373 @@ static unsigned int throttle_percentage;
>  #define CPU_THROTTLE_PCT_MAX 99
>  #define CPU_THROTTLE_TIMESLICE_NS 10000000
>  
> +#define DIRTYLIMIT_TOLERANCE_RANGE  25      /* 25MB/s */
> +#define DIRTYLIMIT_THROTTLE_PCT_WATERMARK   50
> +
> +typedef struct DirtyLimitState {
> +    int cpu_index;
> +    bool enabled;
> +    uint64_t quota;     /* quota dirtyrate MB/s */
> +    int unfit_cnt;

What is this?

Thanks for documenting all the exported functions in the headers.  It's just
that IMHO it still lacks documentation on some of the things we need it for,
like this one, or DIRTYLIMIT_TOLERANCE_RANGE and
DIRTYLIMIT_THROTTLE_PCT_WATERMARK above.

> +} DirtyLimitState;

How about name per-vcpu structures to be something like VcpuDirtyLimitState?
Otherwise it has merely the same naming for the global structure right below,
it can be confusing.

> +
> +struct {
> +    DirtyLimitState *states;
> +    int max_cpus;
> +    unsigned long *bmap; /* running thread bitmap */
> +    unsigned long nr;
> +    QemuThread thread;
> +} *dirtylimit_state;
> +
> +static bool dirtylimit_quit = true;

Again, I think "quit" is not a good wording to show "whether dirtylimit is in
service".  How about "dirtylimit_global_enabled"?

You can actually use "dirtylimit_state" to show whether it's enabled already
(then drop the global value) since it's a pointer.  It shouldn't need to be
init-once-for-all, but we can alloc the structure when the dirty limit is
enabled globally, and destroy it (and reset it to NULL) when globally disabled.

Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.

> +
> +bool dirtylimit_is_enabled(int cpu_index)
> +{
> +    return qatomic_read(&dirtylimit_state->states[cpu_index].enabled);
> +}
> +
> +static inline void dirtylimit_enable(int cpu_index)
> +{
> +    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 1);
> +}
> +
> +static inline void dirtylimit_disable(int cpu_index)
> +{
> +    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 0);
> +}
> +
> +bool dirtylimit_in_service(void)
> +{
> +    return !qatomic_read(&dirtylimit_quit);
> +}
> +
> +void dirtylimit_stop(void)
> +{
> +    qatomic_set(&dirtylimit_quit, 1);
> +    if (qemu_mutex_iothread_locked()) {
> +        qemu_mutex_unlock_iothread();
> +        qemu_thread_join(&dirtylimit_state->thread);
> +        qemu_mutex_lock_iothread();
> +    } else {
> +        qemu_thread_join(&dirtylimit_state->thread);

Again, I'm confused, this function is always called with BQL, right?

> +    }
> +}
> +
> +static void dirtylimit_start(void)
> +{
> +    qatomic_set(&dirtylimit_quit, 0);
> +}
> +
> +bool dirtylimit_is_vcpu_index_valid(int cpu_index)
> +{
> +    return !(cpu_index < 0 ||
> +             cpu_index >= dirtylimit_state->max_cpus);

After/If you agree to free dirtylimit_state when disabled, we'd better not
check state->max_cpus but the MachineState one - smp.max_cpus, just in case the
state is not initialized.

> +}
> +
> +static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota)
> +{
> +    dirtylimit_state->states[cpu_index].quota = quota;
> +}
> +
> +static inline uint64_t dirtylimit_quota(int cpu_index)
> +{
> +    return dirtylimit_state->states[cpu_index].quota;
> +}
> +
> +static int64_t dirtylimit_current(int cpu_index)
> +{
> +    return dirtylimit_calc_current(cpu_index);
> +}
> +
> +static inline int dirtylimit_unfit_cnt(int cpu_index)
> +{
> +    return dirtylimit_state->states[cpu_index].unfit_cnt;
> +}
> +
> +static inline int dirtylimit_unfit_cnt_inc(int cpu_index)
> +{
> +    return ++dirtylimit_state->states[cpu_index].unfit_cnt;
> +}
> +
> +static inline void dirtylimit_set_unfit_cnt(int cpu_index, int count)
> +{
> +    dirtylimit_state->states[cpu_index].unfit_cnt = count;
> +}

I tend to drop most of these accessors and instead provide:

static inline DirtyLimitState *dirtylimit_vcpu_get(int cpu_index)
{
    return &dirtylimit_state->states[cpu_index];
}

> +
> +static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
> +{
> +    static uint64_t max_dirtyrate;
> +    uint32_t dirty_ring_size = kvm_dirty_ring_size();
> +    uint64_t dirty_ring_size_memory_MB =
> +        dirty_ring_size * TARGET_PAGE_SIZE >> 20;
> +
> +    if (max_dirtyrate < dirtyrate) {
> +        max_dirtyrate = dirtyrate;
> +    }
> +
> +    return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
> +}
> +
> +static inline bool dirtylimit_hit(uint64_t quota,
> +                                  uint64_t current)
> +{
> +    uint64_t min, max;
> +
> +    min = MIN(quota, current);
> +    max = MAX(quota, current);
> +
> +    return ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) ? true : false;
> +}
> +
> +static inline bool dirtylimit_turbo(uint64_t quota,
> +                                    uint64_t current)
> +{
> +    uint64_t min, max, pct;
> +
> +    min = MIN(quota, current);
> +    max = MAX(quota, current);
> +
> +    pct = (max - min) * 100 / max;
> +
> +    return pct > DIRTYLIMIT_THROTTLE_PCT_WATERMARK;
> +}
> +
> +static void dirtylimit_throttle_init(CPUState *cpu,
> +                                     uint64_t quota,
> +                                     uint64_t current)
> +{
> +    uint64_t pct = 0;
> +    int64_t throttle_us;
> +
> +    if (quota >= current || (current == 0)) {
> +        cpu->throttle_us_per_full = 0;
> +    } else {
> +        pct = (current - quota) * 100 / current;
> +        pct = MIN(pct, DIRTYLIMIT_THROTTLE_PCT_WATERMARK);
> +        pct = (double)pct / 100;
> +
> +        throttle_us = dirtylimit_dirty_ring_full_time(current) / (1 - pct);
> +        cpu->throttle_us_per_full = throttle_us;
> +    }
> +}
> +
> +static void dirtylimit_throttle(CPUState *cpu)
> +{
> +    int64_t ring_full_time_us = 0;
> +    uint64_t quota = 0;
> +    uint64_t current = 0;
> +    uint64_t sleep_pct = 0;
> +    uint64_t throttle_us = 0;
> +
> +    quota = dirtylimit_quota(cpu->cpu_index);
> +    current = dirtylimit_current(cpu->cpu_index);
> +
> +    if (current == 0 &&
> +        dirtylimit_unfit_cnt(cpu->cpu_index) == 0) {

Up to here, there are already 3 helpers to access the same object.  How about
fetching dirtylimit_state->states[cpu_index] once and removing these helpers?
We can directly reference the fields.  The helpers make nothing better, just
less efficient, by looking up cpu_index every time..

> +        cpu->throttle_us_per_full = 0;
> +        goto end;
> +    } else if (cpu->throttle_us_per_full == 0) {
> +        dirtylimit_throttle_init(cpu, quota, current);

Why we need to init() if us_per_full==0?  What if we just have a valid
us_per_full==0 value during throttling simply because the vcpu doesn't write at
all so we reduced it to zero?

Can we simply init throttle_us_per_full to zero when enable?  If we can't
access CPUState* during enablement, we can have a cached throttle_us in
DirtyLimitState and init it there.  Then at the end of dirtylimit_throttle() we
apply that to the one in CPUState.

> +        goto end;
> +    } else if (dirtylimit_hit(quota, current)) {

This is the only "understandable" code to me so far.. :)

> +        goto end;
> +    } else if (dirtylimit_unfit_cnt_inc(cpu->cpu_index) < 2) {

Could you help define "unfit_cnt" first with some comment?

Why choose 2?  Have you tried 4/8/...?

> +        goto end;
> +    }
> +
> +    dirtylimit_set_unfit_cnt(cpu->cpu_index, 0);
> +
> +    ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
> +    if (dirtylimit_turbo(quota, current)) {
> +        if (quota < current) {
> +            sleep_pct = (current - quota) * 100 / current;
> +            throttle_us =
> +                ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
> +            cpu->throttle_us_per_full += throttle_us;
> +        } else {
> +            sleep_pct = (quota - current) * 100 / quota;
> +            throttle_us =
> +                ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
> +            cpu->throttle_us_per_full -= throttle_us;
> +        }
> +
> +        trace_dirtylimit_throttle_pct(cpu->cpu_index,
> +                                      sleep_pct,
> +                                      throttle_us);
> +    } else {
> +        if (quota < current) {
> +            cpu->throttle_us_per_full += ring_full_time_us / 10;
> +        } else {
> +            cpu->throttle_us_per_full -= ring_full_time_us / 10;
> +        }
> +    }

So you chose to record the "global max dirty rate per-vcpu" and use it as a
baseline for every vcpu, that looks okay.

The rest is a linear plus a differential feedback; it responds fairly fast and
looks good too, mostly.  Hope it won't go into oscillation in some special
case...

> +
> +    cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
> +        ring_full_time_us * CPU_THROTTLE_PCT_MAX);
> +
> +    cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
> +
> +end:
> +    trace_dirtylimit_throttle(cpu->cpu_index,
> +                              quota, current,
> +                              cpu->throttle_us_per_full);
> +    return;
> +}
> +
> +static void *dirtylimit_thread(void *opaque)
> +{
> +    CPUState *cpu;
> +
> +    rcu_register_thread();
> +
> +    while (dirtylimit_in_service()) {
> +        sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
> +        CPU_FOREACH(cpu) {
> +            if (!dirtylimit_is_enabled(cpu->cpu_index)) {
> +                continue;
> +            }
> +            dirtylimit_throttle(cpu);
> +        }
> +    }
> +
> +    rcu_unregister_thread();
> +
> +    return NULL;
> +}
> +
> +static struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
> +{
> +    DirtyLimitInfo *info = NULL;
> +
> +    info = g_malloc0(sizeof(*info));
> +    info->cpu_index = cpu_index;
> +    info->limit_rate = dirtylimit_quota(cpu_index);
> +    info->current_rate = dirtylimit_current(cpu_index);
> +
> +    return info;
> +}
> +
> +struct DirtyLimitInfoList *dirtylimit_query_all(void)
> +{
> +    int i, index;
> +    DirtyLimitInfo *info = NULL;
> +    DirtyLimitInfoList *head = NULL, **tail = &head;
> +
> +    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
> +        index = dirtylimit_state->states[i].cpu_index;
> +        if (dirtylimit_is_enabled(index)) {
> +            info = dirtylimit_query_vcpu(index);
> +            QAPI_LIST_APPEND(tail, info);
> +        }
> +    }
> +
> +    return head;
> +}
> +
> +static int dirtylimit_nvcpus(void)
> +{
> +    int i;
> +    int nr = 0;
> +    for (i = 0; i < dirtylimit_state->nr; i++) {
> +        unsigned long temp = dirtylimit_state->bmap[i];
> +        nr += ctpopl(temp);
> +    }
> +
> +   return nr;

This is IMHO not necessary, not only the function but the whole bitmap, because
IIUC the bitmap is only used for this counting..

Instead we can have a counter that shows how many vcpus are throttled.
Checking that should already work, and it is much easier..

> +}
> +
> +void dirtylimit_cancel_vcpu(int cpu_index)
> +{
> +    if (!dirtylimit_is_enabled(cpu_index)) {
> +        return;
> +    }
> +
> +    dirtylimit_set_quota(cpu_index, 0);
> +    dirtylimit_disable(cpu_index);
> +    bitmap_test_and_clear_atomic(dirtylimit_state->bmap, cpu_index, 1);
> +
> +    if (dirtylimit_nvcpus() == 0) {
> +        dirtylimit_stop();

Here we stopped the dirtyrate thread, while the rest of the cleanup is done in
qmp_vcpu_dirty_limit(), iiuc.

Can we merge them into one place so we clean up everything?

> +    }
> +}
> +
> +void dirtylimit_cancel_all(void)
> +{
> +    int i, index;
> +
> +    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
> +        index = dirtylimit_state->states[i].cpu_index;
> +        if (dirtylimit_is_enabled(index)) {
> +            dirtylimit_cancel_vcpu(index);
> +        }
> +    }
> +}
> +
> +void dirtylimit_vcpu(int cpu_index, uint64_t quota)
> +{
> +    trace_dirtylimit_vcpu(cpu_index, quota);
> +
> +    dirtylimit_set_quota(cpu_index, quota);
> +    dirtylimit_enable(cpu_index);
> +    bitmap_set_atomic(dirtylimit_state->bmap, cpu_index, 1);
> +
> +    if (dirtylimit_in_service()) {
> +        goto end;

return

> +    }
> +
> +    dirtylimit_start();
> +    qemu_thread_create(&dirtylimit_state->thread,
> +                       "dirtylimit",
> +                       dirtylimit_thread,
> +                       NULL,
> +                       QEMU_THREAD_JOINABLE);
> +end:
> +    return;

These can be dropped.

> +}
> +
> +void dirtylimit_all(uint64_t quota)
> +{
> +    int i, index;
> +
> +    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
> +        index = dirtylimit_state->states[i].cpu_index;
> +        dirtylimit_vcpu(index, quota);
> +    }
> +}
> +
> +void dirtylimit_state_init(int max_cpus)
> +{
> +    int i;
> +
> +    dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state));
> +
> +    dirtylimit_state->states =
> +            g_malloc0(sizeof(DirtyLimitState) * max_cpus);
> +
> +    for (i = 0; i < max_cpus; i++) {
> +        dirtylimit_state->states[i].cpu_index = i;
> +    }
> +
> +    dirtylimit_state->max_cpus = max_cpus;
> +    dirtylimit_state->bmap = bitmap_new(max_cpus);
> +    bitmap_clear(dirtylimit_state->bmap, 0, max_cpus);
> +    dirtylimit_state->nr = BITS_TO_LONGS(max_cpus);
> +
> +    trace_dirtylimit_state_init(max_cpus);
> +}
> +
> +void dirtylimit_state_finalize(void)
> +{
> +    free(dirtylimit_state->states);
> +    dirtylimit_state->states = NULL;
> +
> +    free(dirtylimit_state->bmap);
> +    dirtylimit_state->bmap = NULL;
> +
> +    free(dirtylimit_state);
> +    dirtylimit_state = NULL;
> +}
> +
>  static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
>  {
>      double pct;
> diff --git a/softmmu/trace-events b/softmmu/trace-events
> index 9c88887..f19acf8 100644
> --- a/softmmu/trace-events
> +++ b/softmmu/trace-events
> @@ -31,3 +31,9 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co
>  system_wakeup_request(int reason) "reason=%d"
>  qemu_system_shutdown_request(int reason) "reason=%d"
>  qemu_system_powerdown_request(void) ""
> +
> +#cpu-throttle.c
> +dirtylimit_state_init(int max_cpus) "dirtylimit state init: max cpus %d"
> +dirtylimit_throttle(int cpu_index, uint64_t quota, uint64_t current, int64_t time_us) "CPU[%d] throttle: quota %" PRIu64 ", current %" PRIu64 ", throttle %"PRIi64 " us"
> +dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
> +dirtylimit_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
> -- 
> 1.8.3.1
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-14 11:07 ` [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU huangy81
  2021-12-15  7:37   ` Markus Armbruster
@ 2021-12-24  5:14   ` Peter Xu
  2021-12-26 16:00     ` Hyman
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Xu @ 2021-12-24  5:14 UTC (permalink / raw)
  To: huangy81
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Tue, Dec 14, 2021 at 07:07:34PM +0800, huangy81@chinatelecom.cn wrote:
> +void qmp_vcpu_dirty_limit(bool enable,
> +                          bool has_cpu_index,
> +                          uint64_t cpu_index,
> +                          bool has_dirty_rate,
> +                          uint64_t dirty_rate,
> +                          Error **errp)
> +{
> +    static bool initialized;

IMHO this is not needed; if we go with a global state pointer then it's the
same to check against that.

The rest looks mostly good (besides the last proposal on API design which you
got confirmation from Markus).

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-16 10:23               ` Markus Armbruster
@ 2021-12-24  5:16                 ` Peter Xu
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2021-12-24  5:16 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Eduardo Habkost, David Hildenbrand, Richard Henderson,
	Juan Quintela, Hyman Huang, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

On Thu, Dec 16, 2021 at 11:23:37AM +0100, Markus Armbruster wrote:
> Hyman Huang <huangy81@chinatelecom.cn> writes:
> 
> [...]
> 
> > So the final format of qmp we conclude are:
> >
> > case 1: setup vcpu 0 dirty page limit 100MB/s
> > set-vcpu-dirty-limit cpu-index=0 dirty-rate=100
> >
> > case 2: setup all vcpu dirty page limit 100MB/s
> > set-vcpu-dirty-limit dirty-rate=100
> >
> > case 3: cancel vcpu 0 dirty page limit
> > cancel-vcpu-dirty-limit cpu-index=0
> >
> > case 4: cancel all vcpu dirty page limit
> > cancel-vcpu-dirty-limit
> >
> > case 5: query limit information of all vcpus enabled
> > query-vcpu-dirty-limit
> >
> > And the corresponding hmp format keep the same style:
> >
> > Is there any advice? :)
> 
> Looks okay to me.

Same here.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 0/3] support dirty restraint on vCPU
  2021-12-14 11:07 [PATCH v10 0/3] support dirty restraint on vCPU huangy81
                   ` (2 preceding siblings ...)
  2021-12-14 11:07 ` [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU huangy81
@ 2021-12-24  5:17 ` Peter Xu
  3 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2021-12-24  5:17 UTC (permalink / raw)
  To: huangy81
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Tue, Dec 14, 2021 at 07:07:31PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> v10:
> - rebase on master
> - make the following modifications on patch [1/3]:
>   1. Make "dirtylimit-calc" thread joinable and join it after quitting.
> 
>   2. Add finalize function to free dirtylimit_calc_state
> 
>   3. Do some code clean work

I played with v10, much, much better and usable, thanks!

I still got some other comments though, please have a look on whether they're
sane.  Also sorry again for the delay on reviewing.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-23 11:12   ` Peter Xu
@ 2021-12-26 15:58     ` Hyman
  2021-12-30  5:01     ` Hyman Huang
  1 sibling, 0 replies; 30+ messages in thread
From: Hyman @ 2021-12-26 15:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



在 2021/12/23 19:12, Peter Xu 写道:
> Hi, Yong,
> 
> On Tue, Dec 14, 2021 at 07:07:32PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Introduce GLOBAL_DIRTY_LIMIT, a third method of dirty
>> tracking, to calculate the dirty rate periodically for dirty restraint.
>>
>> Implement a thread that calculates the dirty rate periodically, which
>> will be used for the dirty page limit.
>>
>> Add dirtylimit.h to introduce the util functions for the dirty
>> limit implementation.
> 
> Sorry to be late on reading it, my apologies.
Never mind :)
> 
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> ---
>>   include/exec/memory.h       |   5 +-
>>   include/sysemu/dirtylimit.h |  51 ++++++++++++++
>>   migration/dirtyrate.c       | 160 +++++++++++++++++++++++++++++++++++++++++---
>>   migration/dirtyrate.h       |   2 +
>>   4 files changed, 207 insertions(+), 11 deletions(-)
>>   create mode 100644 include/sysemu/dirtylimit.h
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 20f1b27..606bec8 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
>>   /* Dirty tracking enabled because measuring dirty rate */
>>   #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
>>   
>> -#define GLOBAL_DIRTY_MASK  (0x3)
>> +/* Dirty tracking enabled because dirty limit */
>> +#define GLOBAL_DIRTY_LIMIT      (1U << 2)
>> +
>> +#define GLOBAL_DIRTY_MASK  (0x7)
>>   
>>   extern unsigned int global_dirty_tracking;
>>   
>> diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
>> new file mode 100644
>> index 0000000..34e48f8
>> --- /dev/null
>> +++ b/include/sysemu/dirtylimit.h
>> @@ -0,0 +1,51 @@
>> +/*
>> + * dirty limit helper functions
>> + *
>> + * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
>> + *
>> + * Authors:
>> + *  Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +#ifndef QEMU_DIRTYRLIMIT_H
>> +#define QEMU_DIRTYRLIMIT_H
>> +
>> +#define DIRTYLIMIT_CALC_TIME_MS         1000    /* 1000ms */
>> +
>> +/**
>> + * dirtylimit_calc_current
>> + *
>> + * get current dirty page rate for specified virtual CPU.
>> + */
>> +int64_t dirtylimit_calc_current(int cpu_index);
>> +
>> +/**
>> + * dirtylimit_calc_start
>> + *
>> + * start dirty page rate calculation thread.
>> + */
>> +void dirtylimit_calc_start(void);
>> +
>> +/**
>> + * dirtylimit_calc_quit
>> + *
>> + * quit dirty page rate calculation thread.
>> + */
>> +void dirtylimit_calc_quit(void);
>> +
>> +/**
>> + * dirtylimit_calc_state_init
>> + *
>> + * initialize dirty page rate calculation state.
>> + */
>> +void dirtylimit_calc_state_init(int max_cpus);
>> +
>> +/**
>> + * dirtylimit_calc_state_finalize
>> + *
>> + * finalize dirty page rate calculation state.
>> + */
>> +void dirtylimit_calc_state_finalize(void);
>> +#endif
> 
Since dirtylimit and dirtyrate look so alike, maybe it's easier to just
reuse dirtyrate.h; after all you reused dirtyrate.c.
> 
>> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> index d65e744..e8d4e4a 100644
>> --- a/migration/dirtyrate.c
>> +++ b/migration/dirtyrate.c
>> @@ -27,6 +27,7 @@
>>   #include "qapi/qmp/qdict.h"
>>   #include "sysemu/kvm.h"
>>   #include "sysemu/runstate.h"
>> +#include "sysemu/dirtylimit.h"
>>   #include "exec/memory.h"
>>   
>>   /*
>> @@ -46,6 +47,155 @@ static struct DirtyRateStat DirtyStat;
>>   static DirtyRateMeasureMode dirtyrate_mode =
>>                   DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
>>   
>> +struct {
>> +    DirtyRatesData data;
>> +    bool quit;
>> +    QemuThread thread;
>> +} *dirtylimit_calc_state;
>> +
>> +static void dirtylimit_global_dirty_log_start(void)
>> +{
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
>> +    qemu_mutex_unlock_iothread();
>> +}
>> +
>> +static void dirtylimit_global_dirty_log_stop(void)
>> +{
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
>> +    qemu_mutex_unlock_iothread();
>> +}
> 
> This is merely dirtyrate_global_dirty_log_start/stop but with a different flag.
> 
> Let's introduce global_dirty_log_change() with BQL?
> 
>    global_dirty_log_change(flag, onoff)
>    {
>      qemu_mutex_lock_iothread();
>      if (start) {
>          memory_global_dirty_log_start(flag);
>      } else {
>          memory_global_dirty_log_stop(flag);
>      }
>      qemu_mutex_unlock_iothread();
>    }
> 
> Then we merge 4 functions into one.
> 
> We can also have a BQL-version of global_dirty_log_sync() in the same patch if
> you think above helpful.
This makes things simpler.
> 
>> +
>> +static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
>> +                                     CPUState *cpu, bool start)
>> +{
>> +    if (start) {
>> +        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
>> +    } else {
>> +        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
>> +    }
>> +}
>> +
>> +static void dirtylimit_calc_func(void)
> 
> Would you still consider merging this with calculate_dirtyrate_dirty_ring?
> 
> I still don't see why it can't.
> 
> Maybe it cannot be directly reused, but the whole logic is really, really
> similar: alloc an array of DirtyPageRecord, take notes, sleep, take some other
> notes, calculate per-vcpu dirty rates.
> 
> There's some trivial details that are different (whether start/stop logging,
> whether use sync), but they can be abstracted.
> 
> At least it can be changed into something like:
> 
>    dirtylimit_calc_func(DirtyRateVcpu *stat)
>    {
>        // never race against cpu hotplug/unplug
>        cpu_list_lock();
> 
>        // this should include allocate buffers and record initial values for all cores
>        record = vcpu_dirty_stat_alloc();
>        // do the sleep
>        duration = vcpu_dirty_stat_wait(records)
> 
>        // the "difference"..
>        global_dirty_log_sync();
> 
>        // collect end timestamps, calculates
>        vcpu_dirty_stat_collect(stat, records);
>        vcpu_dirty_stat_free(stat);
> 
>        cpu_list_unlock();
> 
>        return stat;
>    }
> 
> It may miss something but just to show what I meant..
As I explained in the cover letter, I just think merging
GLOBAL_DIRTY_LIMIT and GLOBAL_DIRTY_DIRTYRATE into one cannot be achieved.
      "I think maybe it's not suitable to touch the 'calc-dirty-rate',
      because 'calc-dirty-rate' will stop sync log after calculating
      the dirtyrate and the 'dirtylimit-cal' will not untill the last
      dirtylimit be canceled, if we merge the GLOBAL_DIRTY_LIMIT into
      GLOBAL_DIRTY_DIRTYRATE, the two are interacted with each other."
And now I see that is not what you meant. :) So I'll merge the code and
do the code cleanup in the next version.
> 
>> +{
>> +    CPUState *cpu;
>> +    DirtyPageRecord *dirty_pages;
>> +    int64_t start_time, end_time, calc_time;
>> +    DirtyRateVcpu rate;
>> +    int i = 0;
>> +
>> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
>> +        dirtylimit_calc_state->data.nvcpu);
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, true);
>> +    }
> 
> Note that I used cpu_list_lock() above and I think we need it.
> 
Initially I thought the rcu read lock was fine too (which is missing, btw!) but
I don't really think it'll work, because rcu assignment won't wait for a grace
period when adding/removing cpus to/from the global cpu list.  So I don't see a
good way to do this safely with cpu plug/unplug but to take the cpu list lock;
otherwise be prepared for qemu to crash when it happens..
> 
> I don't know whether the cpu list is doing the right thing on RCU assignment,
> but I know the mutex should work..
> 
Good advice; I hadn't thought about that. I'll do some hotplug tests when
modifying this.
>> +
>> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
>> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    calc_time = end_time - start_time;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_sync();
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, false);
>> +    }
>> +
>> +    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
>> +        uint64_t increased_dirty_pages =
>> +            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
>> +        uint64_t memory_size_MB =
>> +            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
>> +        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
>> +
>> +        rate.id = i;
>> +        rate.dirty_rate  = dirtyrate;
>> +        dirtylimit_calc_state->data.rates[i] = rate;
>> +
>> +        trace_dirtyrate_do_calculate_vcpu(i,
>> +            dirtylimit_calc_state->data.rates[i].dirty_rate);
>> +    }
>> +
>> +    free(dirty_pages);
>> +}
>> +
>> +static void *dirtylimit_calc_thread(void *opaque)
>> +{
>> +    rcu_register_thread();
>> +
>> +    dirtylimit_global_dirty_log_start();
>> +
>> +    while (!qatomic_read(&dirtylimit_calc_state->quit)) {
>> +        dirtylimit_calc_func();
>> +    }
>> +
>> +    dirtylimit_global_dirty_log_stop();
>> +
>> +    rcu_unregister_thread();
>> +    return NULL;
>> +}
>> +
>> +int64_t dirtylimit_calc_current(int cpu_index)
> 
> It's not "calculating" but "fetching", maybe simply call it
> vcpu_get_dirty_rate()?
Ok
> 
>> +{
>> +    DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates;
>> +
>> +    return qatomic_read(&rates[cpu_index].dirty_rate);
>> +}
>> +
>> +void dirtylimit_calc_start(void)
>> +{
>> +    if (!qatomic_read(&dirtylimit_calc_state->quit)) {
> 
> If we can have a "running", then we should check "running==true" instead.
> Please see below on...
> 
>> +        goto end;
> 
> "return" would work.
> 
>> +    }
>> +
>> +    qatomic_set(&dirtylimit_calc_state->quit, 0);
> 
> Same here, set running=true, then clear it when thread quits.
> 
>> +    qemu_thread_create(&dirtylimit_calc_state->thread,
>> +                       "dirtylimit-calc",
>> +                       dirtylimit_calc_thread,
>> +                       NULL,
>> +                       QEMU_THREAD_JOINABLE);
>> +end:
>> +    return;
> 
> No need for both of them..
Ok
> 
>> +}
>> +
>> +void dirtylimit_calc_quit(void)
>> +{
>> +    qatomic_set(&dirtylimit_calc_state->quit, 1);
>> +
>> +    if (qemu_mutex_iothread_locked()) {
> 
> Ideally I think this should already with BQL so not necessary?  I'll check
> later patches, just leave a mark.
> 
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_thread_join(&dirtylimit_calc_state->thread);
>> +        qemu_mutex_lock_iothread();
>> +    } else {
>> +        qemu_thread_join(&dirtylimit_calc_state->thread);
>> +    }
>> +}
>> +
>> +void dirtylimit_calc_state_init(int max_cpus)
>> +{
>> +    dirtylimit_calc_state =
>> +        g_malloc0(sizeof(*dirtylimit_calc_state));
>> +
>> +    dirtylimit_calc_state->data.nvcpu = max_cpus;
>> +    dirtylimit_calc_state->data.rates =
>> +        g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
>> +
>> +    dirtylimit_calc_state->quit = true;
> 
> Instead of using "quit", I'd rather use "running".  It'll be false by default
> and only set when thread runs.
> 
> Setting "quit" by default just reads weird.. or let's keep both "quit" or
> "running", I think it'll be cleaner, then here we should make running=false and
> quit=false too.
Sounds good.
> 
>> +}
>> +
>> +void dirtylimit_calc_state_finalize(void)
>> +{
>> +    free(dirtylimit_calc_state->data.rates);
>> +    dirtylimit_calc_state->data.rates = NULL;
>> +
>> +    free(dirtylimit_calc_state);
>> +    dirtylimit_calc_state = NULL;
>> +}
>> +
>>   static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
>>   {
>>       int64_t current_time;
>> @@ -396,16 +546,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
>>       return true;
>>   }
>>   
>> -static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
>> -                                     CPUState *cpu, bool start)
>> -{
>> -    if (start) {
>> -        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
>> -    } else {
>> -        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
>> -    }
>> -}
>> -
>>   static void dirtyrate_global_dirty_log_start(void)
>>   {
>>       qemu_mutex_lock_iothread();
>> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
>> index 69d4c5b..e96acdc 100644
>> --- a/migration/dirtyrate.h
>> +++ b/migration/dirtyrate.h
>> @@ -70,6 +70,8 @@ typedef struct VcpuStat {
>>       DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
>>   } VcpuStat;
>>   
>> +typedef struct VcpuStat DirtyRatesData;
>> +
>>   /*
>>    * Store calculation statistics for each measure.
>>    */
>> -- 
>> 1.8.3.1
>>
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU
  2021-12-24  5:14   ` Peter Xu
@ 2021-12-26 16:00     ` Hyman
  0 siblings, 0 replies; 30+ messages in thread
From: Hyman @ 2021-12-26 16:00 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



On 2021/12/24 13:14, Peter Xu wrote:
> On Tue, Dec 14, 2021 at 07:07:34PM +0800, huangy81@chinatelecom.cn wrote:
>> +void qmp_vcpu_dirty_limit(bool enable,
>> +                          bool has_cpu_index,
>> +                          uint64_t cpu_index,
>> +                          bool has_dirty_rate,
>> +                          uint64_t dirty_rate,
>> +                          Error **errp)
>> +{
>> +    static bool initialized;
> 
> IMHO this is not needed; if we're with a global state pointer then it's the
> same to check against that.
Sounds good, this makes the code simpler.
> 
> The rest looks mostly good (besides the last proposal on API design which you
> got confirmation from Markus).
> 



* Re: [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-23 11:12   ` Peter Xu
  2021-12-26 15:58     ` Hyman
@ 2021-12-30  5:01     ` Hyman Huang
  2021-12-30  6:34       ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: Hyman Huang @ 2021-12-30  5:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



On 2021/12/23 19:12, Peter Xu wrote:
> Hi, Yong,
> 
> On Tue, Dec 14, 2021 at 07:07:32PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Introduce the third method GLOBAL_DIRTY_LIMIT of dirty
>> tracking for calculate dirtyrate periodly for dirty restraint.
>>
>> Implement thread for calculate dirtyrate periodly, which will
>> be used for dirty page limit.
>>
>> Add dirtylimit.h to introduce the util function for dirty
>> limit implementation.
> 
> Sorry to be late on reading it, my apologies.
> 
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> ---
>>   include/exec/memory.h       |   5 +-
>>   include/sysemu/dirtylimit.h |  51 ++++++++++++++
>>   migration/dirtyrate.c       | 160 +++++++++++++++++++++++++++++++++++++++++---
>>   migration/dirtyrate.h       |   2 +
>>   4 files changed, 207 insertions(+), 11 deletions(-)
>>   create mode 100644 include/sysemu/dirtylimit.h
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 20f1b27..606bec8 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
>>   /* Dirty tracking enabled because measuring dirty rate */
>>   #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
>>   
>> -#define GLOBAL_DIRTY_MASK  (0x3)
>> +/* Dirty tracking enabled because dirty limit */
>> +#define GLOBAL_DIRTY_LIMIT      (1U << 2)
>> +
>> +#define GLOBAL_DIRTY_MASK  (0x7)
>>   
>>   extern unsigned int global_dirty_tracking;
>>   
>> diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
>> new file mode 100644
>> index 0000000..34e48f8
>> --- /dev/null
>> +++ b/include/sysemu/dirtylimit.h
>> @@ -0,0 +1,51 @@
>> +/*
>> + * dirty limit helper functions
>> + *
>> + * Copyright (c) 2021 CHINA TELECOM CO.,LTD.
>> + *
>> + * Authors:
>> + *  Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +#ifndef QEMU_DIRTYRLIMIT_H
>> +#define QEMU_DIRTYRLIMIT_H
>> +
>> +#define DIRTYLIMIT_CALC_TIME_MS         1000    /* 1000ms */
>> +
>> +/**
>> + * dirtylimit_calc_current
>> + *
>> + * get current dirty page rate for specified virtual CPU.
>> + */
>> +int64_t dirtylimit_calc_current(int cpu_index);
>> +
>> +/**
>> + * dirtylimit_calc_start
>> + *
>> + * start dirty page rate calculation thread.
>> + */
>> +void dirtylimit_calc_start(void);
>> +
>> +/**
>> + * dirtylimit_calc_quit
>> + *
>> + * quit dirty page rate calculation thread.
>> + */
>> +void dirtylimit_calc_quit(void);
>> +
>> +/**
>> + * dirtylimit_calc_state_init
>> + *
>> + * initialize dirty page rate calculation state.
>> + */
>> +void dirtylimit_calc_state_init(int max_cpus);
>> +
>> +/**
>> + * dirtylimit_calc_state_finalize
>> + *
>> + * finalize dirty page rate calculation state.
>> + */
>> +void dirtylimit_calc_state_finalize(void);
>> +#endif
> 
> Since dirtylimit and dirtyrate looks so alike, not sure it's easier to just
> reuse dirtyrate.h; after all you reused dirtyrate.c.
I'm working on this, and I find it's fine to just reuse dirtyrate.h, but
the original dirtyrate.h didn't export any function to other QEMU modules,
so we should add a new file include/sysemu/dirtyrate.h to export the dirty
page rate calculation util functions. What do you think about this?

While doing the code review I find it more reasonable to implement the
dirtylimit in a standalone file such as softmmu/dirtylimit.c, since the
implementation of dirtylimit in v10 has nothing to do with the throttle
algorithm in softmmu/cpu-throttle.c. Merging the two implementations into
one file would be weird. Is this ok?
> 
>> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> index d65e744..e8d4e4a 100644
>> --- a/migration/dirtyrate.c
>> +++ b/migration/dirtyrate.c
>> @@ -27,6 +27,7 @@
>>   #include "qapi/qmp/qdict.h"
>>   #include "sysemu/kvm.h"
>>   #include "sysemu/runstate.h"
>> +#include "sysemu/dirtylimit.h"
>>   #include "exec/memory.h"
>>   
>>   /*
>> @@ -46,6 +47,155 @@ static struct DirtyRateStat DirtyStat;
>>   static DirtyRateMeasureMode dirtyrate_mode =
>>                   DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
>>   
>> +struct {
>> +    DirtyRatesData data;
>> +    bool quit;
>> +    QemuThread thread;
>> +} *dirtylimit_calc_state;
>> +
>> +static void dirtylimit_global_dirty_log_start(void)
>> +{
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT);
>> +    qemu_mutex_unlock_iothread();
>> +}
>> +
>> +static void dirtylimit_global_dirty_log_stop(void)
>> +{
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT);
>> +    qemu_mutex_unlock_iothread();
>> +}
> 
> This is merely dirtyrate_global_dirty_log_start/stop but with a different flag.
> 
> Let's introduce global_dirty_log_change() with BQL?
> 
>    global_dirty_log_change(flag, onoff)
>    {
>      qemu_mutex_lock_iothread();
>      if (start) {
>          memory_global_dirty_log_start(flag);
>      } else {
>          memory_global_dirty_log_stop(flag);
>      }
>      qemu_mutex_unlock_iothread();
>    }
> 
> Then we merge 4 functions into one.
> 
> We can also have a BQL-version of global_dirty_log_sync() in the same patch if
> you think above helpful.
> 
>> +
>> +static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
>> +                                     CPUState *cpu, bool start)
>> +{
>> +    if (start) {
>> +        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
>> +    } else {
>> +        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
>> +    }
>> +}
>> +
>> +static void dirtylimit_calc_func(void)
> 
> Would you still consider merging this with calculate_dirtyrate_dirty_ring?
> 
> I still don't see why it can't.
> 
> Maybe it cannot be directly reused, but the whole logic is really, really
> similar: alloc an array of DirtyPageRecord, take notes, sleep, take some other
> notes, calculate per-vcpu dirty rates.
> 
> There's some trivial details that are different (whether start/stop logging,
> whether use sync), but they can be abstracted.
> 
> At least it can be changed into something like:
> 
>    dirtylimit_calc_func(DirtyRateVcpu *stat)
>    {
>        // never race against cpu hotplug/unplug
>        cpu_list_lock();
> 
>        // this should include allocate buffers and record initial values for all cores
>        record = vcpu_dirty_stat_alloc();
>        // do the sleep
>        duration = vcpu_dirty_stat_wait(records)
> 
>        // the "difference"..
>        global_dirty_log_sync();
> 
>        // collect end timestamps, calculates
>        vcpu_dirty_stat_collect(stat, records);
>        vcpu_dirty_stat_free(stat);
> 
>        cpu_list_unlock();
> 
>        return stat;
>    }
> 
> It may miss something but just to show what I meant..
I think this work is a refactoring, and I'll do it in a standalone commit
before the dirtylimit commits. Is this ok?
> 
>> +{
>> +    CPUState *cpu;
>> +    DirtyPageRecord *dirty_pages;
>> +    int64_t start_time, end_time, calc_time;
>> +    DirtyRateVcpu rate;
>> +    int i = 0;
>> +
>> +    dirty_pages = g_malloc0(sizeof(*dirty_pages) *
>> +        dirtylimit_calc_state->data.nvcpu);
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, true);
>> +    }
> 
> Note that I used cpu_list_lock() above and I think we need it.
> 
> Initially I thought rcu read lock is fine too (which is missing! btw) but I
> don't really think it'll work, because rcu assignment won't wait for a grace
> period when add/remove cpus into global cpu list.  So I don't see a good way to
> do this safely with cpu plug/unplug but to take the cpu list lock, otherwise be
> prepared to crash qemu when it happens..
> 
> I don't know whether the cpu list is doing the right thing on RCU assignment,
> but I know the mutex should work..
> 
>> +
>> +    start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000);
>> +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +    calc_time = end_time - start_time;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    memory_global_dirty_log_sync();
>> +    qemu_mutex_unlock_iothread();
>> +
>> +    CPU_FOREACH(cpu) {
>> +        record_dirtypages(dirty_pages, cpu, false);
>> +    }
>> +
>> +    for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) {
>> +        uint64_t increased_dirty_pages =
>> +            dirty_pages[i].end_pages - dirty_pages[i].start_pages;
>> +        uint64_t memory_size_MB =
>> +            (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
>> +        int64_t dirtyrate = (memory_size_MB * 1000) / calc_time;
>> +
>> +        rate.id = i;
>> +        rate.dirty_rate  = dirtyrate;
>> +        dirtylimit_calc_state->data.rates[i] = rate;
>> +
>> +        trace_dirtyrate_do_calculate_vcpu(i,
>> +            dirtylimit_calc_state->data.rates[i].dirty_rate);
>> +    }
>> +
>> +    free(dirty_pages);
>> +}
>> +
>> +static void *dirtylimit_calc_thread(void *opaque)
>> +{
>> +    rcu_register_thread();
>> +
>> +    dirtylimit_global_dirty_log_start();
>> +
>> +    while (!qatomic_read(&dirtylimit_calc_state->quit)) {
>> +        dirtylimit_calc_func();
>> +    }
>> +
>> +    dirtylimit_global_dirty_log_stop();
>> +
>> +    rcu_unregister_thread();
>> +    return NULL;
>> +}
>> +
>> +int64_t dirtylimit_calc_current(int cpu_index)
> 
> It's not "calculating" but "fetching", maybe simply call it
> vcpu_get_dirty_rate()?
> 
>> +{
>> +    DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates;
>> +
>> +    return qatomic_read(&rates[cpu_index].dirty_rate);
>> +}
>> +
>> +void dirtylimit_calc_start(void)
>> +{
>> +    if (!qatomic_read(&dirtylimit_calc_state->quit)) {
> 
> If we can have a "running", then we should check "running==true" instead.
> Please see below on...
> 
>> +        goto end;
> 
> "return" would work.
> 
>> +    }
>> +
>> +    qatomic_set(&dirtylimit_calc_state->quit, 0);
> 
> Same here, set running=true, then clear it when thread quits.
> 
>> +    qemu_thread_create(&dirtylimit_calc_state->thread,
>> +                       "dirtylimit-calc",
>> +                       dirtylimit_calc_thread,
>> +                       NULL,
>> +                       QEMU_THREAD_JOINABLE);
>> +end:
>> +    return;
> 
> No need for both of them..
> 
>> +}
>> +
>> +void dirtylimit_calc_quit(void)
>> +{
>> +    qatomic_set(&dirtylimit_calc_state->quit, 1);
>> +
>> +    if (qemu_mutex_iothread_locked()) {
> 
> Ideally I think this should already with BQL so not necessary?  I'll check
> later patches, just leave a mark.
> 
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_thread_join(&dirtylimit_calc_state->thread);
>> +        qemu_mutex_lock_iothread();
>> +    } else {
>> +        qemu_thread_join(&dirtylimit_calc_state->thread);
>> +    }
>> +}
>> +
>> +void dirtylimit_calc_state_init(int max_cpus)
>> +{
>> +    dirtylimit_calc_state =
>> +        g_malloc0(sizeof(*dirtylimit_calc_state));
>> +
>> +    dirtylimit_calc_state->data.nvcpu = max_cpus;
>> +    dirtylimit_calc_state->data.rates =
>> +        g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
>> +
>> +    dirtylimit_calc_state->quit = true;
> 
> Instead of using "quit", I'd rather use "running".  It'll be false by default
> and only set when thread runs.
> 
> Setting "quit" by default just reads weird.. or let's keep both "quit" or
> "running", I think it'll be cleaner, then here we should make running=false and
> quit=false too.
> 
>> +}
>> +
>> +void dirtylimit_calc_state_finalize(void)
>> +{
>> +    free(dirtylimit_calc_state->data.rates);
>> +    dirtylimit_calc_state->data.rates = NULL;
>> +
>> +    free(dirtylimit_calc_state);
>> +    dirtylimit_calc_state = NULL;
>> +}
>> +
>>   static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
>>   {
>>       int64_t current_time;
>> @@ -396,16 +546,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
>>       return true;
>>   }
>>   
>> -static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
>> -                                     CPUState *cpu, bool start)
>> -{
>> -    if (start) {
>> -        dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
>> -    } else {
>> -        dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
>> -    }
>> -}
>> -
>>   static void dirtyrate_global_dirty_log_start(void)
>>   {
>>       qemu_mutex_lock_iothread();
>> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
>> index 69d4c5b..e96acdc 100644
>> --- a/migration/dirtyrate.h
>> +++ b/migration/dirtyrate.h
>> @@ -70,6 +70,8 @@ typedef struct VcpuStat {
>>       DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
>>   } VcpuStat;
>>   
>> +typedef struct VcpuStat DirtyRatesData;
>> +
>>   /*
>>    * Store calculation statistics for each measure.
>>    */
>> -- 
>> 1.8.3.1
>>
> 

-- 
Best regard

Hyman Huang(黄勇)



* Re: [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  2021-12-30  5:01     ` Hyman Huang
@ 2021-12-30  6:34       ` Peter Xu
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2021-12-30  6:34 UTC (permalink / raw)
  To: Hyman Huang
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Thu, Dec 30, 2021 at 01:01:09PM +0800, Hyman Huang wrote:
> > > +/**
> > > + * dirtylimit_calc_state_finalize
> > > + *
> > > + * finalize dirty page rate calculation state.
> > > + */
> > > +void dirtylimit_calc_state_finalize(void);
> > > +#endif
> > 
> > Since dirtylimit and dirtyrate looks so alike, not sure it's easier to just
> > reuse dirtyrate.h; after all you reused dirtyrate.c.
> I'm working on this, and I find it's fine to just reuse dirtyrate.h, but the
> original dirtyrate.h didn't export any function to other QEMU modules, so we
> should add a new file include/sysemu/dirtyrate.h to export the dirty page
> rate calculation util functions. What do you think about this?

I see what you meant, yeah if it's exported outside migration/ then looks fine.

> 
> While doing the code review I find it more reasonable to implement the
> dirtylimit in a standalone file such as softmmu/dirtylimit.c, since the
> implementation of dirtylimit in v10 has nothing to do with the throttle
> algorithm in softmmu/cpu-throttle.c. Merging the two implementations into
> one file would be weird. Is this ok?

Sure.

> > At least it can be changed into something like:
> > 
> >    dirtylimit_calc_func(DirtyRateVcpu *stat)
> >    {
> >        // never race against cpu hotplug/unplug
> >        cpu_list_lock();
> > 
> >        // this should include allocate buffers and record initial values for all cores
> >        record = vcpu_dirty_stat_alloc();
> >        // do the sleep
> >        duration = vcpu_dirty_stat_wait(records)
> > 
> >        // the "difference"..
> >        global_dirty_log_sync();
> > 
> >        // collect end timestamps, calculates
> >        vcpu_dirty_stat_collect(stat, records);
> >        vcpu_dirty_stat_free(stat);
> > 
> >        cpu_list_unlock();
> > 
> >        return stat;
> >    }
> > 
> > It may miss something but just to show what I meant..
> I think this work is refactor and i do this in a standalone commit before
> the dirtylimit commits. Is this ok?

I think that's more than welcomed if you can split the patch into smaller ones;
they'll be even easier to be reviewed.

Thanks,

-- 
Peter Xu




* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2021-12-24  5:12   ` Peter Xu
@ 2021-12-30 16:36     ` Hyman Huang
  2022-01-04  2:32       ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Hyman Huang @ 2021-12-30 16:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



On 2021/12/24 13:12, Peter Xu wrote:
> On Tue, Dec 14, 2021 at 07:07:33PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Setup a negative feedback system when vCPU thread
>> handling KVM_EXIT_DIRTY_RING_FULL exit by introducing
>> throttle_us_per_full field in struct CPUState. Sleep
>> throttle_us_per_full microseconds to throttle vCPU
>> if dirtylimit is enabled.
>>
>> Start a thread to track current dirty page rates and
>> tune the throttle_us_per_full dynamically untill current
>> dirty page rate reach the quota.
>>
>> Introduce the util function in the header for dirtylimit
>> implementation.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> ---
>>   accel/kvm/kvm-all.c           |  11 ++
>>   include/hw/core/cpu.h         |   6 +
>>   include/sysemu/cpu-throttle.h |  77 +++++++++
>>   include/sysemu/kvm.h          |   2 +
>>   qapi/migration.json           |  19 +++
>>   softmmu/cpu-throttle.c        | 371 ++++++++++++++++++++++++++++++++++++++++++
>>   softmmu/trace-events          |   6 +
>>   7 files changed, 492 insertions(+)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index eecd803..cba5fed 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -45,6 +45,7 @@
>>   #include "qemu/guest-random.h"
>>   #include "sysemu/hw_accel.h"
>>   #include "kvm-cpus.h"
>> +#include "sysemu/cpu-throttle.h"
>>   
>>   #include "hw/boards.h"
>>   
>> @@ -2303,6 +2304,11 @@ bool kvm_dirty_ring_enabled(void)
>>       return kvm_state->kvm_dirty_ring_size ? true : false;
>>   }
>>   
>> +uint32_t kvm_dirty_ring_size(void)
>> +{
>> +    return kvm_state->kvm_dirty_ring_size;
>> +}
>> +
>>   static int kvm_init(MachineState *ms)
>>   {
>>       MachineClass *mc = MACHINE_GET_CLASS(ms);
>> @@ -2933,6 +2939,11 @@ int kvm_cpu_exec(CPUState *cpu)
>>               qemu_mutex_lock_iothread();
>>               kvm_dirty_ring_reap(kvm_state);
>>               qemu_mutex_unlock_iothread();
>> +            if (dirtylimit_in_service() &&
>> +                dirtylimit_is_enabled(cpu->cpu_index) &&
>> +                cpu->throttle_us_per_full) {
>> +                usleep(cpu->throttle_us_per_full);
>> +            }
> 
> Looks good, but perhaps put it into a dirty limit exported helper?
> 
>>               ret = 0;
>>               break;
>>           case KVM_EXIT_SYSTEM_EVENT:
>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>> index e948e81..be80fe2 100644
>> --- a/include/hw/core/cpu.h
>> +++ b/include/hw/core/cpu.h
>> @@ -411,6 +411,12 @@ struct CPUState {
>>        */
>>       bool throttle_thread_scheduled;
>>   
>> +    /*
>> +     * Sleep throttle_us_per_full microseconds once dirty ring is full
>> +     * when dirty page limit is enabled.
>> +     */
>> +    int64_t throttle_us_per_full;
>> +
>>       bool ignore_memory_transaction_failures;
>>   
>>       struct hax_vcpu_state *hax_vcpu;
>> diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h
>> index d65bdef..d4973a5 100644
>> --- a/include/sysemu/cpu-throttle.h
>> +++ b/include/sysemu/cpu-throttle.h
>> @@ -65,4 +65,81 @@ bool cpu_throttle_active(void);
>>    */
>>   int cpu_throttle_get_percentage(void);
>>   
>> +/**
>> + * dirtylimit_is_enabled
>> + *
>> + * Returns: %true if dirty page rate limit on specified virtual CPU is enabled,
>> + *          %false otherwise.
>> + */
>> +bool dirtylimit_is_enabled(int cpu_index);
>> +
>> +/**
>> + * dirtylimit_in_service
>> + *
>> + * Returns: %true if dirty page rate limit thread is running, %false otherwise.
>> + */
>> +bool dirtylimit_in_service(void);
>> +
>> +/**
>> + * dirtylimit_stop
>> + *
>> + * stop dirty page rate limit thread.
>> + */
>> +void dirtylimit_stop(void);
>> +
>> +/**
>> + * dirtylimit_is_vcpu_index_valid
>> + *
>> + * Returns: %true if cpu index valid, %false otherwise.
>> + */
>> +bool dirtylimit_is_vcpu_index_valid(int cpu_index);
>> +
>> +/**
>> + * dirtylimit_state_init
>> + *
>> + * initialize golobal state for dirty page rate limit.
>> + */
>> +void dirtylimit_state_init(int max_cpus);
>> +
>> +/**
>> + * dirtylimit_state_finalize
>> + *
>> + * finalize golobal state for dirty page rate limit.
>> + */
>> +void dirtylimit_state_finalize(void);
>> +
>> +/**
>> + * dirtylimit_vcpu
>> + *
>> + * setup dirty page rate limit on specified virtual CPU with quota.
>> + */
>> +void dirtylimit_vcpu(int cpu_index, uint64_t quota);
>> +
>> +/**
>> + * dirtylimit_all
>> + *
>> + * setup dirty page rate limit on all virtual CPU with quota.
>> + */
>> +void dirtylimit_all(uint64_t quota);
>> +
>> +/**
>> + * dirtylimit_query_all
>> + *
>> + * Returns: dirty page limit information of all virtual CPU enabled.
>> + */
>> +struct DirtyLimitInfoList *dirtylimit_query_all(void);
>> +
>> +/**
>> + * dirtylimit_cancel_vcpu
>> + *
>> + * cancel dirtylimit for the specified virtual CPU.
>> + */
>> +void dirtylimit_cancel_vcpu(int cpu_index);
>> +
>> +/**
>> + * dirtylimit_cancel_all
>> + *
>> + * cancel dirtylimit for all virtual CPU enabled.
>> + */
>> +void dirtylimit_cancel_all(void);
>>   #endif /* SYSEMU_CPU_THROTTLE_H */
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index 7b22aeb..e5a9a28 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -548,4 +548,6 @@ bool kvm_cpu_check_are_resettable(void);
>>   bool kvm_arch_cpu_check_are_resettable(void);
>>   
>>   bool kvm_dirty_ring_enabled(void);
>> +
>> +uint32_t kvm_dirty_ring_size(void);
>>   #endif
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index bbfd48c..ac5fa56 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1850,6 +1850,25 @@
>>   { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>>   
>>   ##
>> +# @DirtyLimitInfo:
>> +#
>> +# Dirty page rate limit information of virtual CPU.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
>> +#
>> +# @current-rate: current dirty page rate for virtual CPU.
>> +#
>> +# Since: 7.0
>> +#
>> +##
>> +{ 'struct': 'DirtyLimitInfo',
>> +  'data': { 'cpu-index': 'int',
>> +            'limit-rate': 'int64',
>> +            'current-rate': 'int64' } }
>> +
>> +##
>>   # @snapshot-save:
>>   #
>>   # Save a VM snapshot
>> diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c
>> index 8c2144a..4472ab3 100644
>> --- a/softmmu/cpu-throttle.c
>> +++ b/softmmu/cpu-throttle.c
>> @@ -29,6 +29,10 @@
>>   #include "qemu/main-loop.h"
>>   #include "sysemu/cpus.h"
>>   #include "sysemu/cpu-throttle.h"
>> +#include "sysemu/dirtylimit.h"
>> +#include "sysemu/kvm.h"
>> +#include "qapi/qapi-commands-migration.h"
>> +#include "trace.h"
>>   
>>   /* vcpu throttling controls */
>>   static QEMUTimer *throttle_timer;
>> @@ -38,6 +42,373 @@ static unsigned int throttle_percentage;
>>   #define CPU_THROTTLE_PCT_MAX 99
>>   #define CPU_THROTTLE_TIMESLICE_NS 10000000
>>   
>> +#define DIRTYLIMIT_TOLERANCE_RANGE  25      /* 25MB/s */
>> +#define DIRTYLIMIT_THROTTLE_PCT_WATERMARK   50
>> +
>> +typedef struct DirtyLimitState {
>> +    int cpu_index;
>> +    bool enabled;
>> +    uint64_t quota;     /* quota dirtyrate MB/s */
>> +    int unfit_cnt;
> 
> What is this?
> 
> Thanks for documenting all the exported functions in the headers.  It's just
> that IMHO it still misses some nice documents on the ones we need, like this
> one, or DIRTYLIMIT_TOLERANCE_RANGE && DIRTYLIMIT_THROTTLE_PCT_WATERMARK above.
> 
>> +} DirtyLimitState;
> 
> How about name per-vcpu structures to be something like VcpuDirtyLimitState?
> Otherwise it has merely the same naming for the global structure right below,
> it can be confusing.
> 
>> +
>> +struct {
>> +    DirtyLimitState *states;
>> +    int max_cpus;
>> +    unsigned long *bmap; /* running thread bitmap */
>> +    unsigned long nr;
>> +    QemuThread thread;
>> +} *dirtylimit_state;
>> +
>> +static bool dirtylimit_quit = true;
> 
> Again, I think "quit" is not a good wording to show "whether dirtylimit is in
> service".  How about "dirtylimit_global_enabled"?
> 
> You can actually use "dirtylimit_state" to show whether it's enabled already
> (then drop the global value) since it's a pointer.  It shouldn't need to be
> init-once-for-all, but we can alloc the strucuture wAhen dirty limit enabled
> globally, and destroy it (and reset it to NULL) when globally disabled.
> 
> Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.
Yes, checking the pointer is fairly straightforward, but since the
dirtylimit thread also accesses dirtylimit_state when doing the limiting,
if we free dirtylimit_state after the last limited vCPU is canceled, the
dirtylimit thread would crash when dereferencing the NULL pointer. And
this method turns out to require a mutex to protect dirtylimit_state;
compared with the qatomic operation, which is better?
> 
>> +
>> +bool dirtylimit_is_enabled(int cpu_index)
>> +{
>> +    return qatomic_read(&dirtylimit_state->states[cpu_index].enabled);
>> +}
>> +
>> +static inline void dirtylimit_enable(int cpu_index)
>> +{
>> +    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 1);
>> +}
>> +
>> +static inline void dirtylimit_disable(int cpu_index)
>> +{
>> +    qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 0);
>> +}
>> +
>> +bool dirtylimit_in_service(void)
>> +{
>> +    return !qatomic_read(&dirtylimit_quit);
>> +}
>> +
>> +void dirtylimit_stop(void)
>> +{
>> +    qatomic_set(&dirtylimit_quit, 1);
>> +    if (qemu_mutex_iothread_locked()) {
>> +        qemu_mutex_unlock_iothread();
>> +        qemu_thread_join(&dirtylimit_state->thread);
>> +        qemu_mutex_lock_iothread();
>> +    } else {
>> +        qemu_thread_join(&dirtylimit_state->thread);
> 
> Again, I'm confused, this function is always called with BQL, right?
Yes, I added the locked check just in case this function is called without 
the BQL, but after checking, only the QMP command may call this function, 
and it is always called with the BQL held in the QEMU main thread; I'll 
remove this.
> 
>> +    }
>> +}
>> +
>> +static void dirtylimit_start(void)
>> +{
>> +    qatomic_set(&dirtylimit_quit, 0);
>> +}
>> +
>> +bool dirtylimit_is_vcpu_index_valid(int cpu_index)
>> +{
>> +    return !(cpu_index < 0 ||
>> +             cpu_index >= dirtylimit_state->max_cpus);
> 
> After/If you agree to free dirtylimit_state when disabled, we'd better not
> check state->max_cpus but the MachineState one - smp.max_cpus, just in case the
> state is not initialized.
Ok.
> 
>> +}
>> +
>> +static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota)
>> +{
>> +    dirtylimit_state->states[cpu_index].quota = quota;
>> +}
>> +
>> +static inline uint64_t dirtylimit_quota(int cpu_index)
>> +{
>> +    return dirtylimit_state->states[cpu_index].quota;
>> +}
>> +
>> +static int64_t dirtylimit_current(int cpu_index)
>> +{
>> +    return dirtylimit_calc_current(cpu_index);
>> +}
>> +
>> +static inline int dirtylimit_unfit_cnt(int cpu_index)
>> +{
>> +    return dirtylimit_state->states[cpu_index].unfit_cnt;
>> +}
>> +
>> +static inline int dirtylimit_unfit_cnt_inc(int cpu_index)
>> +{
>> +    return ++dirtylimit_state->states[cpu_index].unfit_cnt;
>> +}
>> +
>> +static inline void dirtylimit_set_unfit_cnt(int cpu_index, int count)
>> +{
>> +    dirtylimit_state->states[cpu_index].unfit_cnt = count;
>> +}
> 
> I tend to drop most of these accesors and instead provide:
> 
> static inline DirtyLimitState *dirtylimit_vcpu_get(int cpu_index)
> {
>      return &dirtylimit_state->states[cpu_index];
> }
> 
>> +
>> +static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
>> +{
>> +    static uint64_t max_dirtyrate;
>> +    uint32_t dirty_ring_size = kvm_dirty_ring_size();
>> +    uint64_t dirty_ring_size_memory_MB =
>> +        dirty_ring_size * TARGET_PAGE_SIZE >> 20;
>> +
>> +    if (max_dirtyrate < dirtyrate) {
>> +        max_dirtyrate = dirtyrate;
>> +    }
>> +
>> +    return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
>> +}
>> +
>> +static inline bool dirtylimit_hit(uint64_t quota,
>> +                                  uint64_t current)
>> +{
>> +    uint64_t min, max;
>> +
>> +    min = MIN(quota, current);
>> +    max = MAX(quota, current);
>> +
>> +    return ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) ? true : false;
>> +}
>> +
>> +static inline bool dirtylimit_turbo(uint64_t quota,
>> +                                    uint64_t current)
>> +{
>> +    uint64_t min, max, pct;
>> +
>> +    min = MIN(quota, current);
>> +    max = MAX(quota, current);
>> +
>> +    pct = (max - min) * 100 / max;
>> +
>> +    return pct > DIRTYLIMIT_THROTTLE_PCT_WATERMARK;
>> +}
>> +
>> +static void dirtylimit_throttle_init(CPUState *cpu,
>> +                                     uint64_t quota,
>> +                                     uint64_t current)
>> +{
>> +    uint64_t pct = 0;
>> +    int64_t throttle_us;
>> +
>> +    if (quota >= current || (current == 0)) {
>> +        cpu->throttle_us_per_full = 0;
>> +    } else {
>> +        pct = (current - quota) * 100 / current;
>> +        pct = MIN(pct, DIRTYLIMIT_THROTTLE_PCT_WATERMARK);
>> +        pct = (double)pct / 100;
>> +
>> +        throttle_us = dirtylimit_dirty_ring_full_time(current) / (1 - pct);
>> +        cpu->throttle_us_per_full = throttle_us;
>> +    }
>> +}
>> +
>> +static void dirtylimit_throttle(CPUState *cpu)
>> +{
>> +    int64_t ring_full_time_us = 0;
>> +    uint64_t quota = 0;
>> +    uint64_t current = 0;
>> +    uint64_t sleep_pct = 0;
>> +    uint64_t throttle_us = 0;
>> +
>> +    quota = dirtylimit_quota(cpu->cpu_index);
>> +    current = dirtylimit_current(cpu->cpu_index);
>> +
>> +    if (current == 0 &&
>> +        dirtylimit_unfit_cnt(cpu->cpu_index) == 0) {
> 
> Up to here, there are already 3 helpers accessing the same object.  How about
> fetching dirtylimit_state->states[cpu_index] once and removing these helpers?  We
> can directly reference the fields.  The helpers make nothing better; they are just
> less efficient, looking up cpu_index every time.
OK, thanks for the advice.
> 
>> +        cpu->throttle_us_per_full = 0;
>> +        goto end;
>> +    } else if (cpu->throttle_us_per_full == 0) {
>> +        dirtylimit_throttle_init(cpu, quota, current);
> 
> Why we need to init() if us_per_full==0?  What if we just have a valid
> us_per_full==0 value during throttling simply because the vcpu doesn't write at
> all so we reduced it to zero?
> 
> Can we simply init throttle_us_per_full to zero when enable?  If we can't
> access CPUState* during enablement, we can have a cached throttle_us in
> DirtyLimitState and init it there.  Then at the end of dirtylimit_throttle() we
> apply that to the one in CPUState.
> 
>> +        goto end;
>> +    } else if (dirtylimit_hit(quota, current)) {
> 
> This is the only "understandable" code to me so far.. :)
> 
>> +        goto end;
>> +    } else if (dirtylimit_unfit_cnt_inc(cpu->cpu_index) < 2) {
> 
> Could you help define "unfit_cnt" first with some comment?
> 
> Why choose 2?  Have you tried 4/8/...?
unfit_cnt counts how many consecutive times we detect that the quota dirty 
rate does not match the current dirty rate. If we throttled as soon as we 
detect a mismatch, we might misjudge, because the throttle we imposed last 
time may only take effect after a little while.
> 
>> +        goto end;
>> +    }
>> +
>> +    dirtylimit_set_unfit_cnt(cpu->cpu_index, 0);
>> +
>> +    ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
>> +    if (dirtylimit_turbo(quota, current)) {
>> +        if (quota < current) {
>> +            sleep_pct = (current - quota) * 100 / current;
>> +            throttle_us =
>> +                ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> +            cpu->throttle_us_per_full += throttle_us;
>> +        } else {
>> +            sleep_pct = (quota - current) * 100 / quota;
>> +            throttle_us =
>> +                ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> +            cpu->throttle_us_per_full -= throttle_us;
>> +        }
>> +
>> +        trace_dirtylimit_throttle_pct(cpu->cpu_index,
>> +                                      sleep_pct,
>> +                                      throttle_us);
>> +    } else {
>> +        if (quota < current) {
>> +            cpu->throttle_us_per_full += ring_full_time_us / 10;
>> +        } else {
>> +            cpu->throttle_us_per_full -= ring_full_time_us / 10;
>> +        }
>> +    }
> 
> So you chose to record the "global max dirty rate per-vcpu" and use it as a
> baseline for every vcpu, that looks okay.
> 
> The rest is linear plus differential feedback; it responds fairly fast, looks
> good too, mostly.  Hope it won't go into oscillation in some special case...
> 
>> +
>> +    cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
>> +        ring_full_time_us * CPU_THROTTLE_PCT_MAX);
>> +
>> +    cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
>> +
>> +end:
>> +    trace_dirtylimit_throttle(cpu->cpu_index,
>> +                              quota, current,
>> +                              cpu->throttle_us_per_full);
>> +    return;
>> +}
>> +
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> +    CPUState *cpu;
>> +
>> +    rcu_register_thread();
>> +
>> +    while (dirtylimit_in_service()) {
>> +        sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
>> +        CPU_FOREACH(cpu) {
>> +            if (!dirtylimit_is_enabled(cpu->cpu_index)) {
>> +                continue;
>> +            }
>> +            dirtylimit_throttle(cpu);
>> +        }
>> +    }
>> +
>> +    rcu_unregister_thread();
>> +
>> +    return NULL;
>> +}
>> +
>> +static struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
>> +{
>> +    DirtyLimitInfo *info = NULL;
>> +
>> +    info = g_malloc0(sizeof(*info));
>> +    info->cpu_index = cpu_index;
>> +    info->limit_rate = dirtylimit_quota(cpu_index);
>> +    info->current_rate = dirtylimit_current(cpu_index);
>> +
>> +    return info;
>> +}
>> +
>> +struct DirtyLimitInfoList *dirtylimit_query_all(void)
>> +{
>> +    int i, index;
>> +    DirtyLimitInfo *info = NULL;
>> +    DirtyLimitInfoList *head = NULL, **tail = &head;
>> +
>> +    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
>> +        index = dirtylimit_state->states[i].cpu_index;
>> +        if (dirtylimit_is_enabled(index)) {
>> +            info = dirtylimit_query_vcpu(index);
>> +            QAPI_LIST_APPEND(tail, info);
>> +        }
>> +    }
>> +
>> +    return head;
>> +}
>> +
>> +static int dirtylimit_nvcpus(void)
>> +{
>> +    int i;
>> +    int nr = 0;
>> +    for (i = 0; i < dirtylimit_state->nr; i++) {
>> +        unsigned long temp = dirtylimit_state->bmap[i];
>> +        nr += ctpopl(temp);
>> +    }
>> +
>> +   return nr;
> 
> This is IMHO not necessary, not only the function but the whole bitmap, because
> IIUC the bitmap is only used for counting this..
> 
> Instead we can have a counter that shows how many vcpus are throttled.  Checking
> that should already work and is much easier.
That's good advice. :)
> 
>> +}
>> +
>> +void dirtylimit_cancel_vcpu(int cpu_index)
>> +{
>> +    if (!dirtylimit_is_enabled(cpu_index)) {
>> +        return;
>> +    }
>> +
>> +    dirtylimit_set_quota(cpu_index, 0);
>> +    dirtylimit_disable(cpu_index);
>> +    bitmap_test_and_clear_atomic(dirtylimit_state->bmap, cpu_index, 1);
>> +
>> +    if (dirtylimit_nvcpus() == 0) {
>> +        dirtylimit_stop();
> 
> Here we stopped the dirtyrate thread, while the rest cleanup is done in
> qmp_vcpu_dirty_limit(), iiuc.
> 
> Can we merge them into one place so we cleanup everything?
Sure.
> 
>> +    }
>> +}
>> +
>> +void dirtylimit_cancel_all(void)
>> +{
>> +    int i, index;
>> +
>> +    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
>> +        index = dirtylimit_state->states[i].cpu_index;
>> +        if (dirtylimit_is_enabled(index)) {
>> +            dirtylimit_cancel_vcpu(index);
>> +        }
>> +    }
>> +}
>> +
>> +void dirtylimit_vcpu(int cpu_index, uint64_t quota)
>> +{
>> +    trace_dirtylimit_vcpu(cpu_index, quota);
>> +
>> +    dirtylimit_set_quota(cpu_index, quota);
>> +    dirtylimit_enable(cpu_index);
>> +    bitmap_set_atomic(dirtylimit_state->bmap, cpu_index, 1);
>> +
>> +    if (dirtylimit_in_service()) {
>> +        goto end;
> 
> return
> 
>> +    }
>> +
>> +    dirtylimit_start();
>> +    qemu_thread_create(&dirtylimit_state->thread,
>> +                       "dirtylimit",
>> +                       dirtylimit_thread,
>> +                       NULL,
>> +                       QEMU_THREAD_JOINABLE);
>> +end:
>> +    return;
> 
> These can be dropped.
> 
>> +}
>> +
>> +void dirtylimit_all(uint64_t quota)
>> +{
>> +    int i, index;
>> +
>> +    for (i = 0; i < dirtylimit_state->max_cpus; i++) {
>> +        index = dirtylimit_state->states[i].cpu_index;
>> +        dirtylimit_vcpu(index, quota);
>> +    }
>> +}
>> +
>> +void dirtylimit_state_init(int max_cpus)
>> +{
>> +    int i;
>> +
>> +    dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state));
>> +
>> +    dirtylimit_state->states =
>> +            g_malloc0(sizeof(DirtyLimitState) * max_cpus);
>> +
>> +    for (i = 0; i < max_cpus; i++) {
>> +        dirtylimit_state->states[i].cpu_index = i;
>> +    }
>> +
>> +    dirtylimit_state->max_cpus = max_cpus;
>> +    dirtylimit_state->bmap = bitmap_new(max_cpus);
>> +    bitmap_clear(dirtylimit_state->bmap, 0, max_cpus);
>> +    dirtylimit_state->nr = BITS_TO_LONGS(max_cpus);
>> +
>> +    trace_dirtylimit_state_init(max_cpus);
>> +}
>> +
>> +void dirtylimit_state_finalize(void)
>> +{
>> +    free(dirtylimit_state->states);
>> +    dirtylimit_state->states = NULL;
>> +
>> +    free(dirtylimit_state->bmap);
>> +    dirtylimit_state->bmap = NULL;
>> +
>> +    free(dirtylimit_state);
>> +    dirtylimit_state = NULL;
>> +}
>> +
>>   static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
>>   {
>>       double pct;
>> diff --git a/softmmu/trace-events b/softmmu/trace-events
>> index 9c88887..f19acf8 100644
>> --- a/softmmu/trace-events
>> +++ b/softmmu/trace-events
>> @@ -31,3 +31,9 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co
>>   system_wakeup_request(int reason) "reason=%d"
>>   qemu_system_shutdown_request(int reason) "reason=%d"
>>   qemu_system_powerdown_request(void) ""
>> +
>> +#cpu-throttle.c
>> +dirtylimit_state_init(int max_cpus) "dirtylimit state init: max cpus %d"
>> +dirtylimit_throttle(int cpu_index, uint64_t quota, uint64_t current, int64_t time_us) "CPU[%d] throttle: quota %" PRIu64 ", current %" PRIu64 ", throttle %"PRIi64 " us"
>> +dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
>> +dirtylimit_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
>> -- 
>> 1.8.3.1
>>
> 

-- 
Best regard

Hyman Huang(黄勇)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2021-12-30 16:36     ` Hyman Huang
@ 2022-01-04  2:32       ` Peter Xu
  2022-01-04  3:27         ` Hyman Huang
  2022-01-13 16:22         ` Markus Armbruster
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2022-01-04  2:32 UTC (permalink / raw)
  To: Hyman Huang
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Fri, Dec 31, 2021 at 12:36:40AM +0800, Hyman Huang wrote:
> > > +struct {
> > > +    DirtyLimitState *states;
> > > +    int max_cpus;
> > > +    unsigned long *bmap; /* running thread bitmap */
> > > +    unsigned long nr;
> > > +    QemuThread thread;
> > > +} *dirtylimit_state;
> > > +
> > > +static bool dirtylimit_quit = true;
> > 
> > Again, I think "quit" is not a good wording to show "whether dirtylimit is in
> > service".  How about "dirtylimit_global_enabled"?
> > 
> > You can actually use "dirtylimit_state" to show whether it's enabled already
> > (then drop the global value) since it's a pointer.  It shouldn't need to be
> > init-once-for-all, but we can alloc the structure when dirty limit is enabled
> > globally, and destroy it (and reset it to NULL) when globally disabled.
> > 
> > Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.
> Yes, checking the pointer is fairly straightforward, but since the dirtylimit thread
> also accesses dirtylimit_state when doing the limit, if we free
> dirtylimit_state after the last limited vcpu is canceled, the dirtylimit thread
> would crash when dereferencing a null pointer. And this method turns out to
> introduce a mutex lock to protect dirtylimit_state; compared with qatomic
> operations, which is better?

I don't see much difference here on using either atomic or mutex, because it's
not a hot path.

If to use mutex and not overload BQL we can use a dirtylimit_mutex then before
referencing the pointer anywhere we need to fetch it, and release when sleep.

The only thing confusing to me about the global variable way is having
quit=true as initial value, and clearing it when start.  I think it'll work,
but just reads weird.

How about having "enabled" and "quit" as a normal threaded app?  Then:

  - When init: enabled=0 quit=0
  - When start: enabled=1 quit=0
  - When stop
    - main thread set enabled=1 quit=1
    - dirtylimit sees quit=1, goes to join()
    - main thread reset enable=quit=0

dirtylimit_in_service() should reference "enabled", and "quit" should be only
used for sync on exit.

Thanks,

-- 
Peter Xu




* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2022-01-04  2:32       ` Peter Xu
@ 2022-01-04  3:27         ` Hyman Huang
  2022-01-13 16:22         ` Markus Armbruster
  1 sibling, 0 replies; 30+ messages in thread
From: Hyman Huang @ 2022-01-04  3:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert



On 2022/1/4 10:32, Peter Xu wrote:
> On Fri, Dec 31, 2021 at 12:36:40AM +0800, Hyman Huang wrote:
>>>> +struct {
>>>> +    DirtyLimitState *states;
>>>> +    int max_cpus;
>>>> +    unsigned long *bmap; /* running thread bitmap */
>>>> +    unsigned long nr;
>>>> +    QemuThread thread;
>>>> +} *dirtylimit_state;
>>>> +
>>>> +static bool dirtylimit_quit = true;
>>>
>>> Again, I think "quit" is not a good wording to show "whether dirtylimit is in
>>> service".  How about "dirtylimit_global_enabled"?
>>>
>>> You can actually use "dirtylimit_state" to show whether it's enabled already
>>> (then drop the global value) since it's a pointer.  It shouldn't need to be
>>> init-once-for-all, but we can alloc the structure when dirty limit is enabled
>>> globally, and destroy it (and reset it to NULL) when globally disabled.
>>>
>>> Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.
>> Yes, checking the pointer is fairly straightforward, but since the dirtylimit thread
>> also accesses dirtylimit_state when doing the limit, if we free
>> dirtylimit_state after the last limited vcpu is canceled, the dirtylimit thread
>> would crash when dereferencing a null pointer. And this method turns out to
>> introduce a mutex lock to protect dirtylimit_state; compared with qatomic
>> operations, which is better?
> 
> I don't see much difference here on using either atomic or mutex, because it's
> not a hot path.
> 
> If to use mutex and not overload BQL we can use a dirtylimit_mutex then before
> referencing the pointer anywhere we need to fetch it, and release when sleep.
Ok, I've already tried this in the next version :)
> 
> The only thing confusing to me about the global variable way is having
> quit=true as initial value, and clearing it when start.  I think it'll work,
> but just reads weird.
> 
> How about having "enabled" and "quit" as a normal threaded app?  Then:
> 
>    - When init: enabled=0 quit=0
>    - When start: enabled=1 quit=0
>    - When stop
>      - main thread set enabled=1 quit=1
>      - dirtylimit sees quit=1, goes to join()
>      - main thread reset enable=quit=0
> 
> dirtylimit_in_service() should reference "enabled", and "quit" should be only
> used for sync on exit.
> 
Ok, no problem
> Thanks,
> 

-- 
Best regard

Hyman Huang(黄勇)



* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2022-01-04  2:32       ` Peter Xu
  2022-01-04  3:27         ` Hyman Huang
@ 2022-01-13 16:22         ` Markus Armbruster
  2022-01-14  1:30           ` Hyman Huang
  1 sibling, 1 reply; 30+ messages in thread
From: Markus Armbruster @ 2022-01-13 16:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Richard Henderson,
	Juan Quintela, Hyman Huang, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé

Peter Xu <peterx@redhat.com> writes:

> On Fri, Dec 31, 2021 at 12:36:40AM +0800, Hyman Huang wrote:
>> > > +struct {
>> > > +    DirtyLimitState *states;
>> > > +    int max_cpus;
>> > > +    unsigned long *bmap; /* running thread bitmap */
>> > > +    unsigned long nr;
>> > > +    QemuThread thread;
>> > > +} *dirtylimit_state;
>> > > +
>> > > +static bool dirtylimit_quit = true;
>> > 
>> > Again, I think "quit" is not a good wording to show "whether dirtylimit is in
>> > service".  How about "dirtylimit_global_enabled"?
>> > 
>> > You can actually use "dirtylimit_state" to show whether it's enabled already
>> > (then drop the global value) since it's a pointer.  It shouldn't need to be
>> > init-once-for-all, but we can alloc the structure when dirty limit is enabled
>> > globally, and destroy it (and reset it to NULL) when globally disabled.
>> > 
>> > Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.
>> Yes, checking the pointer is fairly straightforward, but since the dirtylimit thread
>> also accesses dirtylimit_state when doing the limit, if we free
>> dirtylimit_state after the last limited vcpu is canceled, the dirtylimit thread
>> would crash when dereferencing a null pointer. And this method turns out to
>> introduce a mutex lock to protect dirtylimit_state; compared with qatomic
>> operations, which is better?
>
> I don't see much difference here on using either atomic or mutex, because it's
> not a hot path.

Quick interjection without having bothered to understand the details:
correct use of atomics and memory barriers is *much* harder than correct
use of locks.  Stick to locks unless you *know* they impair performance.

[...]




* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2022-01-13 16:22         ` Markus Armbruster
@ 2022-01-14  1:30           ` Hyman Huang
  2022-01-14  3:35             ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Hyman Huang @ 2022-01-14  1:30 UTC (permalink / raw)
  To: Markus Armbruster, Peter Xu
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Paolo Bonzini, Philippe Mathieu-Daudé



On 2022/1/14 0:22, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
>> On Fri, Dec 31, 2021 at 12:36:40AM +0800, Hyman Huang wrote:
>>>>> +struct {
>>>>> +    DirtyLimitState *states;
>>>>> +    int max_cpus;
>>>>> +    unsigned long *bmap; /* running thread bitmap */
>>>>> +    unsigned long nr;
>>>>> +    QemuThread thread;
>>>>> +} *dirtylimit_state;
>>>>> +
>>>>> +static bool dirtylimit_quit = true;
>>>>
>>>> Again, I think "quit" is not a good wording to show "whether dirtylimit is in
>>>> service".  How about "dirtylimit_global_enabled"?
>>>>
>>>> You can actually use "dirtylimit_state" to show whether it's enabled already
>>>> (then drop the global value) since it's a pointer.  It shouldn't need to be
>>>> init-once-for-all, but we can alloc the structure when dirty limit is enabled
>>>> globally, and destroy it (and reset it to NULL) when globally disabled.
>>>>
>>>> Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.
>>> Yes, checking the pointer is fairly straightforward, but since the dirtylimit thread
>>> also accesses dirtylimit_state when doing the limit, if we free
>>> dirtylimit_state after the last limited vcpu is canceled, the dirtylimit thread
>>> would crash when dereferencing a null pointer. And this method turns out to
>>> introduce a mutex lock to protect dirtylimit_state; compared with qatomic
>>> operations, which is better?
>>
>> I don't see much difference here on using either atomic or mutex, because it's
>> not a hot path.
> 
> Quick interjection without having bothered to understand the details:
> correct use of atomics and memory barriers is *much* harder than correct
> use of locks.  Stick to locks unless you *know* they impair performance.
> 
Ok, I get it. I removed most of the atomic operations in v11 and use a 
lock instead. But still, thanks for the advice :)
> [...]
> 

-- 
Best regard

Hyman Huang(黄勇)



* Re: [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle
  2022-01-14  1:30           ` Hyman Huang
@ 2022-01-14  3:35             ` Peter Xu
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2022-01-14  3:35 UTC (permalink / raw)
  To: Hyman Huang
  Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
	Richard Henderson, Markus Armbruster, qemu-devel, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	Dr. David Alan Gilbert

On Fri, Jan 14, 2022 at 09:30:39AM +0800, Hyman Huang wrote:
> 
> 
> On 2022/1/14 0:22, Markus Armbruster wrote:
> > Peter Xu <peterx@redhat.com> writes:
> > 
> > > On Fri, Dec 31, 2021 at 12:36:40AM +0800, Hyman Huang wrote:
> > > > > > +struct {
> > > > > > +    DirtyLimitState *states;
> > > > > > +    int max_cpus;
> > > > > > +    unsigned long *bmap; /* running thread bitmap */
> > > > > > +    unsigned long nr;
> > > > > > +    QemuThread thread;
> > > > > > +} *dirtylimit_state;
> > > > > > +
> > > > > > +static bool dirtylimit_quit = true;
> > > > > 
> > > > > Again, I think "quit" is not a good wording to show "whether dirtylimit is in
> > > > > service".  How about "dirtylimit_global_enabled"?
> > > > > 
> > > > > You can actually use "dirtylimit_state" to show whether it's enabled already
> > > > > (then drop the global value) since it's a pointer.  It shouldn't need to be
> > > > > init-once-for-all, but we can alloc the structure when dirty limit is enabled
> > > > > globally, and destroy it (and reset it to NULL) when globally disabled.
> > > > > 
> > > > > Then "whether it's enabled" is simply to check "!!dirtylimit_state" under BQL.
> > > > Yes, checking the pointer is fairly straightforward, but since the dirtylimit thread
> > > > also accesses dirtylimit_state when doing the limit, if we free
> > > > dirtylimit_state after the last limited vcpu is canceled, the dirtylimit thread
> > > > would crash when dereferencing a null pointer. And this method turns out to
> > > > introduce a mutex lock to protect dirtylimit_state; compared with qatomic
> > > > operations, which is better?
> > > 
> > > I don't see much difference here on using either atomic or mutex, because it's
> > > not a hot path.
> > 
> > Quick interjection without having bothered to understand the details:
> > correct use of atomics and memory barriers is *much* harder than correct
> > use of locks.  Stick to locks unless you *know* they impair performance.

Yong,

Just a heads up - you seem to have replied with something, but there's really
nothing I saw... it happened multiple times, so I hope you didn't miss it by
sending something empty.

I agree with Markus, and that's also what I wanted to express (it's not a
perf-critical path, so we don't necessarily need to use atomics; obviously I
failed again at using English to express myself.. :).  But I won't insist on it
if the atomics work simply and well.  I think I'll read the atomic version
you posted first and I'll comment again there.

Thanks,

-- 
Peter Xu




end of thread, other threads:[~2022-01-14  3:36 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-14 11:07 [PATCH v10 0/3] support dirty restraint on vCPU huangy81
2021-12-14 11:07 ` [PATCH v10 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically huangy81
2021-12-23 11:12   ` Peter Xu
2021-12-26 15:58     ` Hyman
2021-12-30  5:01     ` Hyman Huang
2021-12-30  6:34       ` Peter Xu
2021-12-14 11:07 ` [PATCH v10 2/3] cpu-throttle: implement virtual CPU throttle huangy81
2021-12-15  7:15   ` Markus Armbruster
2021-12-15  7:40     ` Hyman Huang
2021-12-24  5:12   ` Peter Xu
2021-12-30 16:36     ` Hyman Huang
2022-01-04  2:32       ` Peter Xu
2022-01-04  3:27         ` Hyman Huang
2022-01-13 16:22         ` Markus Armbruster
2022-01-14  1:30           ` Hyman Huang
2022-01-14  3:35             ` Peter Xu
2021-12-14 11:07 ` [PATCH v10 3/3] cpus-common: implement dirty page limit on virtual CPU huangy81
2021-12-15  7:37   ` Markus Armbruster
2021-12-15  7:56     ` Hyman Huang
2021-12-15  8:09       ` Peter Xu
2021-12-15  8:29         ` Hyman Huang
2021-12-15 10:16           ` Hyman Huang
2021-12-15 13:41         ` Markus Armbruster
2021-12-16  6:22           ` Peter Xu
2021-12-16  9:16             ` Hyman Huang
2021-12-16 10:23               ` Markus Armbruster
2021-12-24  5:16                 ` Peter Xu
2021-12-24  5:14   ` Peter Xu
2021-12-26 16:00     ` Hyman
2021-12-24  5:17 ` [PATCH v10 0/3] support dirty restraint on vCPU Peter Xu
