* [PATCH v11 0/4] support dirty restraint on vCPU
@ 2022-01-04 17:14 huangy81
[not found] ` <cover.1641316375.git.huangy81@chinatelecom.cn>
2022-01-17 8:54 ` [PATCH v11 0/4] support dirty restraint on vCPU Peter Xu
0 siblings, 2 replies; 33+ messages in thread
From: huangy81 @ 2022-01-04 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
v11
- rebase on master
- add a commit "refactor dirty page rate calculation" so that the dirty page rate limit
can reuse the calculation logic.
- handle the cpu hotplug/unplug case in the dirty page rate calculation logic.
- modify the qmp commands according to Markus's advice.
- introduce a standalone file dirtylimit.c to implement dirty page rate limit
- check if the dirty limit is in service via the dirtylimit_state pointer instead of a global variable
- introduce dirtylimit_mutex to protect dirtylimit_state
- do some code clean and docs
See the commits for more detail. Thanks Markus and Peter very much for the code
review and for the experienced and insightful advice; most modifications are
based on that advice.
v10:
- rebase on master
- make the following modifications on patch [1/3]:
1. Make "dirtylimit-calc" thread joinable and join it after quitting.
2. Add finalize function to free dirtylimit_calc_state
3. Do some code clean work
- make the following modifications on patch [2/3]:
1. Remove the original implementation of throttle according to
Peter's advice.
2. Introduce a negative feedback system and implement the throttle
on all vcpus in one thread named "dirtylimit".
3. Simplify the algo when calculating the throttle_us_per_full:
increase/decrease linearly when there is a wide difference
between the quota and the current dirty page rate, and
increase/decrease by a fixed time slice when the difference
is narrow. This makes the throttle respond faster and reach
the quota smoothly.
4. Introduce an unfit_cnt in the algo to make sure the throttle
really takes effect.
5. Set the max sleep time to 99 times the "ring_full_time_us".
6. Make the "dirtylimit" thread joinable and join it after quitting.
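For clarity, the adjustment in item 3 can be sketched as a pure function. The constants and the exact scaling below are illustrative assumptions only, not the values used in softmmu/dirtylimit.c (which uses DIRTYLIMIT_TOLERANCE_RANGE and its own step sizes):

```c
#include <assert.h>
#include <stdint.h>

#define TOLERANCE_MB_S 25   /* hypothetical acceptable quota deviation */
#define SLICE_US       100  /* hypothetical fixed step for narrow gaps */

/* One iteration of the negative feedback: scale the sleep time when far
 * from the quota, nudge it by a fixed slice when close to it. */
static int64_t throttle_adjust(int64_t sleep_us, int64_t quota_mb_s,
                               int64_t cur_mb_s)
{
    int64_t diff = cur_mb_s - quota_mb_s;

    if (diff > -TOLERANCE_MB_S && diff < TOLERANCE_MB_S) {
        return sleep_us;                       /* close enough: hold */
    }
    if (sleep_us == 0) {
        return diff > 0 ? SLICE_US : 0;        /* start throttling */
    }
    if (diff > quota_mb_s / 2 || diff < -quota_mb_s / 2) {
        /* wide difference: scale sleep time linearly with the ratio */
        return sleep_us * cur_mb_s / quota_mb_s;
    }
    /* narrow difference: step by a fixed time slice */
    return diff > 0 ? sleep_us + SLICE_US
                    : (sleep_us > SLICE_US ? sleep_us - SLICE_US : 0);
}
```

A vcpu dirtying at twice its quota would have its sleep time doubled; one hovering just above the quota would creep up one slice at a time.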
- make the following modifications on patch [3/3]:
1. Remove the unplug cpu handling logic.
2. "query-vcpu-dirty-limit" only returns dirty limit information for
vcpus that have the dirty limit enabled
3. Remove the "dirtylimit_setup" function
4. Trigger the dirtylimit and initialize the global state only
when someone enables the dirty limit, and finalize it after the
last dirty limit is canceled.
5. Redefine the qmp commands vcpu-dirty-limit/query-vcpu-dirty-limit:
enabling/disabling the dirty limit uses the single command
"vcpu-dirty-limit"; it applies to the specified vcpu only if
"cpu-index" is given, otherwise all vcpus are affected.
6. Redefine the hmp command vcpu_dirty_limit/info vcpu_dirty_limit
- other points about the code review:
1. "merge the code of dirty page rate calculation"
I think it may not be suitable to touch 'calc-dirty-rate',
because 'calc-dirty-rate' stops syncing the dirty log after
calculating the dirty rate, while 'dirtylimit-calc' does not
until the last dirty limit is canceled. If we merged
GLOBAL_DIRTY_LIMIT into GLOBAL_DIRTY_DIRTYRATE, the two would
interfere with each other.
2. The new implementation of the throttle algo, enlightened by
Peter, responds faster and consumes fewer CPU resources than the
old one; we have made impressive progress.
One point worth discussing: the new throttle logic is "passive",
a vcpu sleeps only after its dirty ring is full, unlike
"auto-converge", which kicks the vcpu at fixed time slices. If
the vcpu is memory-write intensive and the ring size is large,
it keeps producing dirty memory while the ring fills up and the
throttle does not work as well; in other words, the throttle
depends on the dirty ring size.
I actually tested the new algo in two cases:
case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
result: minimum quota dirtyrate is 25MB/s or even less
minimum vcpu util is 6%
case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
result: minimum quota dirtyrate is 256MB/s
minimum vcpu util is 24%
I post this just for discussion; I think it is not a big deal,
because if we set the dirty-ring-size to the maximum value (65536),
we assume the server's bandwidth is capable of handling it.
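The ring-size dependence can be made concrete with rough arithmetic, assuming a 4 KiB page size (the real code uses TARGET_PAGE_SIZE): at the same dirty rate, a 16x larger ring takes 16x longer to fill, so the throttle gets far fewer chances to intervene:

```c
#include <assert.h>
#include <stdint.h>

/* How long a vcpu takes to fill its dirty ring at a given dirty rate,
 * assuming one 4 KiB page per ring entry. */
static uint64_t ring_full_time_ms(uint64_t ring_entries,
                                  uint64_t dirtyrate_mb_s)
{
    uint64_t ring_mb = ring_entries * 4096 / (1024 * 1024);
    return ring_mb * 1000 / dirtyrate_mb_s;
}
```

At 1170MB/s, a 4096-entry ring (16MB) fills in roughly 13ms, while a 65536-entry ring (256MB) takes around 218ms, which matches the higher minimum quota observed in case 2.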
3. I hard-coded the acceptable deviation value to 25MB/s, see the
macro DIRTYLIMIT_TOLERANCE_RANGE. I'm struggling to decide
whether to make it configurable.
4. Another point is unplugged-cpu handling: the current algo
affects unplugged vcpus. If we set a dirty limit on one, we fork
the two threads "dirtylimit" and "dirtylimit-calc", which do
nothing; once the vcpu is hot-plugged, the dirty limit takes
effect. I think the logic is ok, but there may be different
advice.
- to let developers play with it more easily, here is an hmp usage example:
(qemu) vcpu_dirty_limit -g on -1 500
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
vcpu[0], limit rate 500 (MB/s), current rate 415 (MB/s)
vcpu[1], limit rate 500 (MB/s), current rate 496 (MB/s)
vcpu[2], limit rate 500 (MB/s), current rate 0 (MB/s)
vcpu[3], limit rate 500 (MB/s), current rate 0 (MB/s)
(qemu) vcpu_dirty_limit -g off
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
Dirty page limit not enabled!
(qemu) vcpu_dirty_limit on 0 300
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) vcpu_dirty_limit on 1 500
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
vcpu[0], limit rate 300 (MB/s), current rate 342 (MB/s)
vcpu[1], limit rate 500 (MB/s), current rate 485 (MB/s)
(qemu) vcpu_dirty_limit off 0
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
vcpu[1], limit rate 500 (MB/s), current rate 528 (MB/s)
(qemu) vcpu_dirty_limit off 1
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
Dirty page limit not enabled!
Thanks very much to Peter for the instructive algo suggestion, and
to Markus for the comments and other code review.
Please review, thanks!
v9:
- rebase on master
- fix the meson directory change, keep it untouched.
v8:
- rebase on master
- polish the error message and remove the "unlikely" compilation syntax
according to the advice given by Markus.
- keep the dirty tracking enabled during "dirtylimit-calc" lifecycle
so that the overhead can be reduced according to the advice given by
Peter.
- merge the "set/cancel" qmp commands into one named "vcpu-dirty-limit"
and introduce qmp command "query-vcpu-dirty-limit" to query dirty
limit information about virtual CPU, according to the advice given by
Peter.
- check if vcpu index is valid and handle the unplug case before
enabling, disabling dirty limit for virtual CPU.
- introduce hmp commands so developers can play with them easier, use
"vcpu_dirty_limit" to enable dirty limit and "info vcpu_dirty_limit"
to query.
The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.
Please review, thanks!
v7:
- rebase on master
- polish the comments and error message according to the
advice given by Markus
- introduce dirtylimit_enabled function to pre-check if dirty
page limit is enabled before canceling.
v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0
v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
RESTRAINT_RATIO case so that the dirty page rate can reach the
quota more quickly.
- fix percentage update in throttle iteration.
v4:
- rebase on master
- modify the following points according to the advice given by Markus
1. move the definition into migration.json
2. polish the comments of set-dirty-limit
3. do the syntax check and change dirty rate to dirty page rate
Thanks for the careful reviews made by Markus.
Please review, thanks!
v3:
- rebase on master
- modify the following points according to the advice given by Markus
1. remove the DirtyRateQuotaVcpu and use its field as option directly
2. add comments to show details of what dirtylimit setup do
3. explain how to use dirtylimit in combination with existing qmp
commands "calc-dirty-rate" and "query-dirty-rate" in documentation.
Thanks for the careful reviews made by Markus.
Please review, thanks!
Hyman
v2:
- rebase on master
- modify the following points according to the advice given by Juan
1. rename dirtyrestraint to dirtylimit
2. implement the full lifecycle functions of dirtylimit_calc,
including dirtylimit_calc and dirtylimit_calc_quit
3. introduce a 'quit' field in dirtylimit_calc_state to implement
dirtylimit_calc_quit
4. remove the ready_cond and ready_mtx since they may not be suitable
5. put the 'record_dirtypage' function code at the beginning of the
file
6. remove the unnecessary return;
- other modifications have been made after code review
1. introduce 'bmap' and 'nr' field in dirtylimit_state to record the
number of running thread forked by dirtylimit
2. stop the dirtyrate calculation thread if all the dirtylimit thread
are stopped
3. do some renaming works
dirtyrate calculation thread -> dirtylimit-calc
dirtylimit thread -> dirtylimit-{cpu_index}
function name do_dirtyrestraint -> dirtylimit_check
qmp command dirty-restraint -> set-dirty-limit
qmp command dirty-restraint-cancel -> cancel-dirty-limit
header file dirtyrestraint.h -> dirtylimit.h
Please review, thanks!
Thanks for the accurate and timely advice given by Juan. We would
really appreciate any corrections and suggestions proposed about
this patchset.
Best Regards !
Hyman
v1:
This patchset introduces a mechanism to impose a dirty restraint
on vCPUs, aiming to keep each vCPU running at a certain dirty rate
given by the user. Dirty restraint on vCPUs may be an alternative
method to implement convergence logic for live migration, which
in theory could improve guest memory performance during migration
compared with the traditional method.
For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects.
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle increases the percentage step by step, and seems
to struggle to find the optimal throttle percentage when the
dirty rate is high.
- it is hard to predict the remaining time of migration if the
throttling percentage reaches 99%
To a certain extent, the dirty restraint mechanism can fix these
effects by throttling at vCPU granularity during migration.
The implementation is rather straightforward: we calculate the
vCPU dirty rate via the Dirty Ring mechanism periodically, as
commit 0e21bf246 "implement dirty-ring dirtyrate calculation"
does. For a vCPU that is specified to have a dirty restraint
imposed, we throttle it periodically as auto-converge does; after
each throttling period, we compare the quota dirty rate with the
current dirty rate, and if the current dirty rate is not under
the quota, we increase the throttling percentage until it is.
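The per-period adjustment just described can be sketched roughly as follows; the step size and the 99% cap here are illustrative assumptions, not the actual auto-converge parameters:

```c
#include <assert.h>
#include <stdint.h>

/* One throttle period: if the current dirty rate still exceeds the
 * quota, raise the throttle percentage by a (hypothetical) fixed step,
 * capped at 99%. */
static int next_throttle_pct(int pct, uint64_t quota_rate,
                             uint64_t current_rate)
{
    if (current_rate > quota_rate && pct < 99) {
        pct += 5;
    }
    return pct > 99 ? 99 : pct;
}
```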
This patchset is the basis of implementing a new auto-converge
method for live migration. We introduce two qmp commands to
impose/cancel the dirty restraint on a specified vCPU, so it can
also serve as an independent API for upper applications such as
libvirt, which can use it to implement convergence logic during
live migration, supplemented with the qmp 'calc-dirty-rate'
command or whatever.
We post this patchset for RFC; any corrections and suggestions
about the implementation, API, throttling algorithm or whatever
are very much appreciated!
Please review, thanks!
Best Regards !
Hyman Huang (4):
migration/dirtyrate: refactor dirty page rate calculation
softmmu/dirtylimit: implement vCPU dirtyrate calculation periodically
softmmu/dirtylimit: implement virtual CPU throttle
softmmu/dirtylimit: implement dirty page rate limit
accel/kvm/kvm-all.c | 8 +
hmp-commands-info.hx | 13 +
hmp-commands.hx | 32 +++
include/exec/memory.h | 5 +-
include/hw/core/cpu.h | 6 +
include/monitor/hmp.h | 3 +
include/sysemu/dirtylimit.h | 34 +++
include/sysemu/dirtyrate.h | 29 ++
include/sysemu/kvm.h | 2 +
migration/dirtyrate.c | 220 +++++++++------
migration/dirtyrate.h | 7 +-
qapi/migration.json | 79 ++++++
softmmu/dirtylimit.c | 644 ++++++++++++++++++++++++++++++++++++++++++++
softmmu/meson.build | 1 +
softmmu/trace-events | 8 +
15 files changed, 1005 insertions(+), 86 deletions(-)
create mode 100644 include/sysemu/dirtylimit.h
create mode 100644 include/sysemu/dirtyrate.h
create mode 100644 softmmu/dirtylimit.c
--
1.8.3.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation
[not found] ` <cover.1641316375.git.huangy81@chinatelecom.cn>
@ 2022-01-04 17:14 ` huangy81
2022-01-17 2:19 ` Peter Xu
2022-01-04 17:14 ` [PATCH v11 2/4] softmmu/dirtylimit: implement vCPU dirtyrate calculation periodically huangy81
` (2 subsequent siblings)
3 siblings, 1 reply; 33+ messages in thread
From: huangy81 @ 2022-01-04 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Abstract the dirty log change logic out into the function
global_dirty_log_change.
Abstract the dirty-ring-based dirty page rate calculation logic
out into the function vcpu_calculate_dirtyrate.
Abstract the mathematical dirty page rate calculation out into
do_calculate_dirtyrate, decoupling it from DirtyStat.
Rename set_sample_page_period to dirty_stat_wait, which is easier
to understand and will be reused in dirtylimit.
Add cpu_list_lock to protect the cpu list before walking through
it, guarding against races with cpu hotplug/unplug.
Export the utility functions outside migration.
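The decoupled math in do_calculate_dirtyrate can be exercised standalone; the sketch below assumes a 4 KiB page size in place of TARGET_PAGE_SIZE:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096  /* stand-in for TARGET_PAGE_SIZE */

/* Mirrors do_calculate_dirtyrate(): pages dirtied over calc_time_ms
 * milliseconds, expressed in MB/s. */
static int64_t dirtyrate_mb_s(uint64_t start_pages, uint64_t end_pages,
                              int64_t calc_time_ms)
{
    uint64_t memory_size_MB =
        ((end_pages - start_pages) * PAGE_SIZE_BYTES) >> 20;
    return memory_size_MB * 1000 / calc_time_ms;
}
```

For example, 262144 pages (1 GiB) dirtied in 1000 ms yields 1024 MB/s; note the scaling by calc_time_ms replaces the old division by the whole-second DirtyStat.calc_time.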
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
include/sysemu/dirtyrate.h | 29 ++++++
migration/dirtyrate.c | 220 ++++++++++++++++++++++++++++-----------------
migration/dirtyrate.h | 7 +-
3 files changed, 171 insertions(+), 85 deletions(-)
create mode 100644 include/sysemu/dirtyrate.h
diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 0000000..cb6f02b
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,29 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ * Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+ int nvcpu; /* number of vcpu */
+ DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ int64_t init_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index d65e744..1407455 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
static DirtyRateMeasureMode dirtyrate_mode =
DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
{
int64_t current_time;
@@ -60,6 +60,128 @@ static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
return msec;
}
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+ if (start) {
+ dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+ } else {
+ dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+ }
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+ int64_t calc_time_ms)
+{
+ uint64_t memory_size_MB;
+ uint64_t increased_dirty_pages =
+ dirty_pages.end_pages - dirty_pages.start_pages;
+
+ memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+ return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+ qemu_mutex_lock_iothread();
+ if (start) {
+ memory_global_dirty_log_start(flag);
+ } else {
+ memory_global_dirty_log_stop(flag);
+ }
+ qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+ qemu_mutex_lock_iothread();
+ memory_global_dirty_log_sync();
+ if (one_shot) {
+ memory_global_dirty_log_stop(flag);
+ }
+ qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+ CPUState *cpu;
+ DirtyPageRecord *records;
+ int nvcpu = 0;
+
+ CPU_FOREACH(cpu) {
+ nvcpu++;
+ }
+
+ stat->nvcpu = nvcpu;
+ stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+ records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+ return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+ DirtyPageRecord *records,
+ bool start)
+{
+ CPUState *cpu;
+
+ CPU_FOREACH(cpu) {
+ if (!start && cpu->cpu_index >= stat->nvcpu) {
+ /*
+ * Never go there unless cpu is hot-plugged,
+ * just ignore in this case.
+ */
+ continue;
+ }
+ record_dirtypages(records, cpu, start);
+ }
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ int64_t init_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+ DirtyPageRecord *records;
+ int64_t duration;
+ int64_t dirtyrate;
+ int i = 0;
+
+ cpu_list_lock();
+ records = vcpu_dirty_stat_alloc(stat);
+ vcpu_dirty_stat_collect(stat, records, true);
+ cpu_list_unlock();
+
+ duration = dirty_stat_wait(calc_time_ms, init_time_ms);
+
+ global_dirty_log_sync(flag, one_shot);
+
+ cpu_list_lock();
+ vcpu_dirty_stat_collect(stat, records, false);
+ cpu_list_unlock();
+
+ for (i = 0; i < stat->nvcpu; i++) {
+ dirtyrate = do_calculate_dirtyrate(records[i], duration);
+
+ stat->rates[i].id = i;
+ stat->rates[i].dirty_rate = dirtyrate;
+
+ trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
+ }
+
+ g_free(records);
+
+ return duration;
+}
+
static bool is_sample_period_valid(int64_t sec)
{
if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
@@ -396,44 +518,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
return true;
}
-static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
- CPUState *cpu, bool start)
-{
- if (start) {
- dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
- } else {
- dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
- }
-}
-
-static void dirtyrate_global_dirty_log_start(void)
-{
- qemu_mutex_lock_iothread();
- memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
- qemu_mutex_unlock_iothread();
-}
-
-static void dirtyrate_global_dirty_log_stop(void)
-{
- qemu_mutex_lock_iothread();
- memory_global_dirty_log_sync();
- memory_global_dirty_log_stop(GLOBAL_DIRTY_DIRTY_RATE);
- qemu_mutex_unlock_iothread();
-}
-
-static int64_t do_calculate_dirtyrate_vcpu(DirtyPageRecord dirty_pages)
-{
- uint64_t memory_size_MB;
- int64_t time_s;
- uint64_t increased_dirty_pages =
- dirty_pages.end_pages - dirty_pages.start_pages;
-
- memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
- time_s = DirtyStat.calc_time;
-
- return memory_size_MB / time_s;
-}
-
static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
bool start)
{
@@ -444,11 +528,6 @@ static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
}
}
-static void do_calculate_dirtyrate_bitmap(DirtyPageRecord dirty_pages)
-{
- DirtyStat.dirty_rate = do_calculate_dirtyrate_vcpu(dirty_pages);
-}
-
static inline void dirtyrate_manual_reset_protect(void)
{
RAMBlock *block = NULL;
@@ -492,71 +571,52 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
DirtyStat.start_time = start_time / 1000;
msec = config.sample_period_seconds * 1000;
- msec = set_sample_page_period(msec, start_time);
+ msec = dirty_stat_wait(msec, start_time);
DirtyStat.calc_time = msec / 1000;
/*
- * dirtyrate_global_dirty_log_stop do two things.
+ * do two things.
* 1. fetch dirty bitmap from kvm
* 2. stop dirty tracking
*/
- dirtyrate_global_dirty_log_stop();
+ global_dirty_log_sync(GLOBAL_DIRTY_DIRTY_RATE, true);
record_dirtypages_bitmap(&dirty_pages, false);
- do_calculate_dirtyrate_bitmap(dirty_pages);
+ DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
}
static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
{
- CPUState *cpu;
- int64_t msec = 0;
int64_t start_time;
+ int64_t duration;
uint64_t dirtyrate = 0;
uint64_t dirtyrate_sum = 0;
- DirtyPageRecord *dirty_pages;
- int nvcpu = 0;
int i = 0;
- CPU_FOREACH(cpu) {
- nvcpu++;
- }
-
- dirty_pages = malloc(sizeof(*dirty_pages) * nvcpu);
-
- DirtyStat.dirty_ring.nvcpu = nvcpu;
- DirtyStat.dirty_ring.rates = malloc(sizeof(DirtyRateVcpu) * nvcpu);
-
- dirtyrate_global_dirty_log_start();
-
- CPU_FOREACH(cpu) {
- record_dirtypages(dirty_pages, cpu, true);
- }
+ /* start log sync */
+ global_dirty_log_change(GLOBAL_DIRTY_DIRTY_RATE, true);
start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
DirtyStat.start_time = start_time / 1000;
- msec = config.sample_period_seconds * 1000;
- msec = set_sample_page_period(msec, start_time);
- DirtyStat.calc_time = msec / 1000;
+ /* calculate vcpu dirtyrate */
+ duration = vcpu_calculate_dirtyrate(config.sample_period_seconds * 1000,
+ start_time,
+ &DirtyStat.dirty_ring,
+ GLOBAL_DIRTY_DIRTY_RATE,
+ true);
- dirtyrate_global_dirty_log_stop();
-
- CPU_FOREACH(cpu) {
- record_dirtypages(dirty_pages, cpu, false);
- }
+ DirtyStat.calc_time = duration / 1000;
+ /* calculate vm dirtyrate */
for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
- dirtyrate = do_calculate_dirtyrate_vcpu(dirty_pages[i]);
- trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
-
- DirtyStat.dirty_ring.rates[i].id = i;
+ dirtyrate = DirtyStat.dirty_ring.rates[i].dirty_rate;
DirtyStat.dirty_ring.rates[i].dirty_rate = dirtyrate;
dirtyrate_sum += dirtyrate;
}
DirtyStat.dirty_rate = dirtyrate_sum;
- free(dirty_pages);
}
static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
@@ -574,7 +634,7 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
rcu_read_unlock();
msec = config.sample_period_seconds * 1000;
- msec = set_sample_page_period(msec, initial_time);
+ msec = dirty_stat_wait(msec, initial_time);
DirtyStat.start_time = initial_time / 1000;
DirtyStat.calc_time = msec / 1000;
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 69d4c5b..594a5c0 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -13,6 +13,8 @@
#ifndef QEMU_MIGRATION_DIRTYRATE_H
#define QEMU_MIGRATION_DIRTYRATE_H
+#include "sysemu/dirtyrate.h"
+
/*
* Sample 512 pages per GB as default.
*/
@@ -65,11 +67,6 @@ typedef struct SampleVMStat {
uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
} SampleVMStat;
-typedef struct VcpuStat {
- int nvcpu; /* number of vcpu */
- DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
-} VcpuStat;
-
/*
* Store calculation statistics for each measure.
*/
--
1.8.3.1
* [PATCH v11 2/4] softmmu/dirtylimit: implement vCPU dirtyrate calculation periodically
[not found] ` <cover.1641316375.git.huangy81@chinatelecom.cn>
2022-01-04 17:14 ` [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation huangy81
@ 2022-01-04 17:14 ` huangy81
2022-01-17 2:31 ` Peter Xu
2022-01-04 17:14 ` [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle huangy81
2022-01-04 17:14 ` [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit huangy81
3 siblings, 1 reply; 33+ messages in thread
From: huangy81 @ 2022-01-04 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Introduce GLOBAL_DIRTY_LIMIT, a third dirty tracking method, to
calculate the dirty rate periodically for the dirty page rate
limit.
Add dirtylimit.c to implement the periodic dirty rate calculation,
which will be used for the dirty page rate limit.
Add dirtylimit.h to export utility functions for the dirty page
rate limit implementation.
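The start/stop contract of the stat thread can be sketched without the QEMU thread machinery; the guard below mirrors the qatomic_read check in vcpu_dirty_rate_stat_start(), while thread creation and joining are elided:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified analogue of vcpu_dirty_rate_stat_{start,stop}: starting
 * is idempotent (a second start while running is a no-op). */
static atomic_bool running;
static int threads_created;

static void stat_start(void)
{
    if (atomic_load(&running)) {
        return;                 /* already in service: do nothing */
    }
    atomic_store(&running, true);
    threads_created++;          /* qemu_thread_create() would go here */
}

static void stat_stop(void)
{
    atomic_store(&running, false);  /* worker loop exits, then join */
}
```

The real stop path also drops the iothread lock around qemu_thread_join() so the collector can finish its in-flight log sync.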
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
include/exec/memory.h | 5 +-
include/sysemu/dirtylimit.h | 22 ++++++++
softmmu/dirtylimit.c | 120 ++++++++++++++++++++++++++++++++++++++++++++
softmmu/meson.build | 1 +
4 files changed, 147 insertions(+), 1 deletion(-)
create mode 100644 include/sysemu/dirtylimit.h
create mode 100644 softmmu/dirtylimit.c
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27..606bec8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
/* Dirty tracking enabled because measuring dirty rate */
#define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
-#define GLOBAL_DIRTY_MASK (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT (1U << 2)
+
+#define GLOBAL_DIRTY_MASK (0x7)
extern unsigned int global_dirty_tracking;
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 0000000..da459f0
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ * Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000 /* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 0000000..a10ac6f
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,120 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ * Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+ VcpuStat stat;
+ bool running;
+ QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+ int64_t start_time;
+ VcpuStat stat;
+ int i = 0;
+
+ start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+ /* calculate vcpu dirtyrate */
+ vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ start_time,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+ for (i = 0; i < stat.nvcpu; i++) {
+ vcpu_dirty_rate_stat->stat.rates[i].id = i;
+ vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+ stat.rates[i].dirty_rate;
+ }
+
+ free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+ rcu_register_thread();
+
+ /* start log sync */
+ global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+ while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+ vcpu_dirty_rate_stat_collect();
+ }
+
+ /* stop log sync */
+ global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+ rcu_unregister_thread();
+ return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+ DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+ return qatomic_read(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+ if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+ return;
+ }
+
+ qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+ qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+ "dirtyrate-stat",
+ vcpu_dirty_rate_stat_thread,
+ NULL,
+ QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+ qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+ qemu_mutex_unlock_iothread();
+ qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+ qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+ int max_cpus = ms->smp.max_cpus;
+
+ vcpu_dirty_rate_stat =
+ g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+ vcpu_dirty_rate_stat->stat.nvcpu = max_cpus;
+ vcpu_dirty_rate_stat->stat.rates =
+ g_malloc0(sizeof(DirtyRateVcpu) * max_cpus);
+
+ vcpu_dirty_rate_stat->running = false;
+}
+
+void vcpu_dirty_rate_stat_finalize(void)
+{
+ free(vcpu_dirty_rate_stat->stat.rates);
+ vcpu_dirty_rate_stat->stat.rates = NULL;
+
+ free(vcpu_dirty_rate_stat);
+ vcpu_dirty_rate_stat = NULL;
+}
diff --git a/softmmu/meson.build b/softmmu/meson.build
index d8e0301..95029a5 100644
--- a/softmmu/meson.build
+++ b/softmmu/meson.build
@@ -15,6 +15,7 @@ specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: [files(
'vl.c',
'cpu-timers.c',
'runstate-action.c',
+ 'dirtylimit.c',
)])
specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TCG'], if_true: [files(
--
1.8.3.1
* [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
[not found] ` <cover.1641316375.git.huangy81@chinatelecom.cn>
2022-01-04 17:14 ` [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation huangy81
2022-01-04 17:14 ` [PATCH v11 2/4] softmmu/dirtylimit: implement vCPU dirtyrate calculation periodically huangy81
@ 2022-01-04 17:14 ` huangy81
2022-01-17 7:32 ` Peter Xu
` (2 more replies)
2022-01-04 17:14 ` [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit huangy81
3 siblings, 3 replies; 33+ messages in thread
From: huangy81 @ 2022-01-04 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Set up a negative feedback system for the vCPU thread when it
handles a KVM_EXIT_DIRTY_RING_FULL exit by introducing the
throttle_us_per_full field in struct CPUState. The vCPU sleeps
throttle_us_per_full microseconds to throttle itself if the dirty
limit is enabled.
Start a thread to track current dirty page rates and tune
throttle_us_per_full dynamically until the current dirty page
rate reaches the quota.
Introduce the utility functions in the header for the dirtylimit
implementation.
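A rough sketch of the sleep applied on each dirty-ring-full exit, including the cap of 99x the ring-full time mentioned in the v10 changelog; the exact clamping in softmmu/dirtylimit.c may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp the per-ring-full sleep so a vcpu is throttled for at most
 * 99x the time it spends filling its dirty ring. */
static int64_t throttle_sleep_us(int64_t throttle_us_per_full,
                                 int64_t ring_full_time_us)
{
    int64_t max_us = ring_full_time_us * 99;
    return throttle_us_per_full > max_us ? max_us : throttle_us_per_full;
}
```

With throttle_us_per_full == 0 (limit disabled) no sleep is applied, matching the zero-initialized field in kvm_init_vcpu().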
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
accel/kvm/kvm-all.c | 8 +
include/hw/core/cpu.h | 6 +
include/sysemu/dirtylimit.h | 12 ++
include/sysemu/kvm.h | 2 +
qapi/migration.json | 19 +++
softmmu/dirtylimit.c | 357 ++++++++++++++++++++++++++++++++++++++++++++
softmmu/trace-events | 8 +
7 files changed, 412 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0e66ebb..908d954 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
#include "qemu/guest-random.h"
#include "sysemu/hw_accel.h"
#include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
#include "hw/boards.h"
@@ -476,6 +477,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
cpu->kvm_state = s;
cpu->vcpu_dirty = true;
cpu->dirty_pages = 0;
+ cpu->throttle_us_per_full = 0;
mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0) {
@@ -2309,6 +2311,11 @@ bool kvm_dirty_ring_enabled(void)
return kvm_state->kvm_dirty_ring_size ? true : false;
}
+uint32_t kvm_dirty_ring_size(void)
+{
+ return kvm_state->kvm_dirty_ring_size;
+}
+
static int kvm_init(MachineState *ms)
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2958,6 +2965,7 @@ int kvm_cpu_exec(CPUState *cpu)
qemu_mutex_lock_iothread();
kvm_dirty_ring_reap(kvm_state);
qemu_mutex_unlock_iothread();
+ dirtylimit_vcpu_execute(cpu);
ret = 0;
break;
case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index e948e81..9631c1e 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -411,6 +411,12 @@ struct CPUState {
*/
bool throttle_thread_scheduled;
+ /*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+ int64_t throttle_us_per_full;
+
bool ignore_memory_transaction_failures;
struct hax_vcpu_state *hax_vcpu;
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f0..6eadd16 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,16 @@ void vcpu_dirty_rate_stat_start(void);
void vcpu_dirty_rate_stat_stop(void);
void vcpu_dirty_rate_stat_initialize(void);
void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+void dirtylimit_thread_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+ bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
#endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6eb39a0..bc3f0b5 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -563,4 +563,6 @@ bool kvm_cpu_check_are_resettable(void);
bool kvm_arch_cpu_check_are_resettable(void);
bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
#endif
diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48c..ac5fa56 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1850,6 +1850,25 @@
{ 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of virtual CPU.
+#
+# @cpu-index: index of virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate for virtual CPU.
+#
+# @current-rate: current dirty page rate for virtual CPU.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+ 'data': { 'cpu-index': 'int',
+ 'limit-rate': 'int64',
+ 'current-rate': 'int64' } }
+
+##
# @snapshot-save:
#
# Save a VM snapshot
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index a10ac6f..c9f5745 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
#include "sysemu/dirtylimit.h"
#include "exec/memory.h"
#include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stops adjusting if the dirty page rate
+ * error is less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
+/*
+ * Increase or decrease the vcpu sleep time linearly
+ * if the dirty page rate error percentage is over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, adjust by a fixed vcpu sleep time slice.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
struct {
VcpuStat stat;
@@ -25,6 +45,36 @@ struct {
QemuThread thread;
} *vcpu_dirty_rate_stat;
+typedef struct VcpuDirtyLimitState {
+ int cpu_index;
+ bool enabled;
+ /*
+ * Quota dirty page rate, unit is MB/s
+ * zero if not enabled.
+ */
+ uint64_t quota;
+ /*
+ * Number of consecutive times the current dirty
+ * page rate has not matched the quota.
+ */
+ int unmatched_cnt;
+} VcpuDirtyLimitState;
+
+struct {
+ VcpuDirtyLimitState *states;
+ /* Maximum number of cpus configured by the user */
+ int max_cpus;
+ /* Number of vcpus under dirty page rate limit */
+ int limited_nvcpu;
+} *dirtylimit_state;
+
+/* protect dirtylimit_state */
+static QemuMutex dirtylimit_mutex;
+static QemuThread dirtylimit_thr;
+
+/* dirtylimit thread quit if dirtylimit_quit is true */
+static bool dirtylimit_quit;
+
static void vcpu_dirty_rate_stat_collect(void)
{
int64_t start_time;
@@ -118,3 +168,310 @@ void vcpu_dirty_rate_stat_finalize(void)
free(vcpu_dirty_rate_stat);
vcpu_dirty_rate_stat = NULL;
}
+
+static void dirtylimit_state_lock(void)
+{
+ qemu_mutex_lock(&dirtylimit_mutex);
+}
+
+static void dirtylimit_state_unlock(void)
+{
+ qemu_mutex_unlock(&dirtylimit_mutex);
+}
+
+static void
+__attribute__((__constructor__)) dirtylimit_mutex_init(void)
+{
+ qemu_mutex_init(&dirtylimit_mutex);
+}
+
+static inline VcpuDirtyLimitState *dirtylimit_vcpu_get_state(int cpu_index)
+{
+ return &dirtylimit_state->states[cpu_index];
+}
+
+void dirtylimit_state_initialize(void)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+ int max_cpus = ms->smp.max_cpus;
+ int i;
+
+ dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state));
+
+ dirtylimit_state->states =
+ g_malloc0(sizeof(VcpuDirtyLimitState) * max_cpus);
+
+ for (i = 0; i < max_cpus; i++) {
+ dirtylimit_state->states[i].cpu_index = i;
+ }
+
+ dirtylimit_state->max_cpus = max_cpus;
+ trace_dirtylimit_state_initialize(max_cpus);
+}
+
+void dirtylimit_state_finalize(void)
+{
+ free(dirtylimit_state->states);
+ dirtylimit_state->states = NULL;
+
+ free(dirtylimit_state);
+ dirtylimit_state = NULL;
+
+ trace_dirtylimit_state_finalize();
+}
+
+bool dirtylimit_in_service(void)
+{
+ return !!dirtylimit_state;
+}
+
+bool dirtylimit_vcpu_index_valid(int cpu_index)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+
+ return !(cpu_index < 0 ||
+ cpu_index >= ms->smp.max_cpus);
+}
+
+static inline void dirtylimit_vcpu_set_quota(int cpu_index,
+ uint64_t quota,
+ bool on)
+{
+ dirtylimit_state->states[cpu_index].quota = quota;
+ if (on) {
+ if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
+ dirtylimit_state->limited_nvcpu++;
+ }
+ } else {
+ if (dirtylimit_state->states[cpu_index].enabled) {
+ dirtylimit_state->limited_nvcpu--;
+ }
+ }
+
+ dirtylimit_state->states[cpu_index].enabled = on;
+}
+
+static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
+{
+ static uint64_t max_dirtyrate;
+ uint32_t dirty_ring_size = kvm_dirty_ring_size();
+ uint64_t dirty_ring_size_memory_MB =
+ dirty_ring_size * TARGET_PAGE_SIZE >> 20;
+
+ if (max_dirtyrate < dirtyrate) {
+ max_dirtyrate = dirtyrate;
+ }
+
+ return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
+}
+
+static inline bool dirtylimit_done(uint64_t quota,
+ uint64_t current)
+{
+ uint64_t min, max;
+
+ min = MIN(quota, current);
+ max = MAX(quota, current);
+
+ return (max - min) <= DIRTYLIMIT_TOLERANCE_RANGE;
+}
+
+static inline bool
+dirtylimit_need_linear_adjustment(uint64_t quota,
+ uint64_t current)
+{
+ uint64_t min, max, pct;
+
+ min = MIN(quota, current);
+ max = MAX(quota, current);
+
+ pct = (max - min) * 100 / max;
+
+ return pct > DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT;
+}
+
+static void dirtylimit_set_throttle(CPUState *cpu,
+ uint64_t quota,
+ uint64_t current)
+{
+ int64_t ring_full_time_us = 0;
+ uint64_t sleep_pct = 0;
+ uint64_t throttle_us = 0;
+
+ ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
+
+ if (dirtylimit_need_linear_adjustment(quota, current)) {
+ if (quota < current) {
+ sleep_pct = (current - quota) * 100 / current;
+ throttle_us =
+ ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
+ cpu->throttle_us_per_full += throttle_us;
+ } else {
+ sleep_pct = (quota - current) * 100 / quota;
+ throttle_us =
+ ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
+ cpu->throttle_us_per_full -= throttle_us;
+ }
+
+ trace_dirtylimit_throttle_pct(cpu->cpu_index,
+ sleep_pct,
+ throttle_us);
+ } else {
+ if (quota < current) {
+ cpu->throttle_us_per_full += ring_full_time_us / 10;
+ } else {
+ cpu->throttle_us_per_full -= ring_full_time_us / 10;
+ }
+ }
+
+ cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
+ ring_full_time_us * DIRTYLIMIT_THROTTLE_PCT_MAX);
+
+ cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
+}
+
+static void dirtylimit_adjust_throttle(CPUState *cpu)
+{
+ uint64_t quota = 0;
+ uint64_t current = 0;
+ int cpu_index = cpu->cpu_index;
+
+ quota = dirtylimit_vcpu_get_state(cpu_index)->quota;
+ current = vcpu_dirty_rate_get(cpu_index);
+
+ if (current == 0 &&
+ dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt == 0) {
+ cpu->throttle_us_per_full = 0;
+ goto end;
+ } else if (++dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt
+ < 2) {
+ goto end;
+ } else if (dirtylimit_done(quota, current)) {
+ goto end;
+ } else {
+ dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt = 0;
+ dirtylimit_set_throttle(cpu, quota, current);
+ }
+end:
+ trace_dirtylimit_adjust_throttle(cpu_index,
+ quota, current,
+ cpu->throttle_us_per_full);
+ return;
+}
+
+static void *dirtylimit_thread(void *opaque)
+{
+ CPUState *cpu;
+
+ rcu_register_thread();
+
+ while (!qatomic_read(&dirtylimit_quit)) {
+ sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
+
+ dirtylimit_state_lock();
+
+ if (!dirtylimit_in_service()) {
+ dirtylimit_state_unlock();
+ break;
+ }
+
+ CPU_FOREACH(cpu) {
+ if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
+ continue;
+ }
+ dirtylimit_adjust_throttle(cpu);
+ }
+ dirtylimit_state_unlock();
+ }
+
+ rcu_unregister_thread();
+
+ return NULL;
+}
+
+static void dirtylimit_thread_start(void)
+{
+ qatomic_set(&dirtylimit_quit, 0);
+ qemu_thread_create(&dirtylimit_thr,
+ "dirtylimit",
+ dirtylimit_thread,
+ NULL,
+ QEMU_THREAD_JOINABLE);
+}
+
+static void dirtylimit_thread_stop(void)
+{
+ qatomic_set(&dirtylimit_quit, 1);
+ qemu_mutex_unlock_iothread();
+ qemu_thread_join(&dirtylimit_thr);
+ qemu_mutex_lock_iothread();
+}
+
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable)
+{
+ trace_dirtylimit_set_vcpu(cpu_index, quota);
+
+ if (enable) {
+ if (dirtylimit_in_service()) {
+ /* only set the vcpu dirty page rate limit */
+ dirtylimit_vcpu_set_quota(cpu_index, quota, true);
+ return;
+ }
+
+ /* initialize state when setting dirtylimit for the first time */
+ dirtylimit_state_lock();
+ dirtylimit_state_initialize();
+ dirtylimit_vcpu_set_quota(cpu_index, quota, true);
+ dirtylimit_state_unlock();
+
+ dirtylimit_thread_start();
+ } else {
+ if (!dirtylimit_in_service()) {
+ return;
+ }
+
+ dirtylimit_state_lock();
+ /* dirty page rate limit is not enabled */
+ if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
+ dirtylimit_state_unlock();
+ return;
+ }
+
+ /* switch off vcpu dirty page rate limit */
+ dirtylimit_vcpu_set_quota(cpu_index, 0, false);
+ dirtylimit_state_unlock();
+
+ if (!dirtylimit_state->limited_nvcpu) {
+ dirtylimit_thread_stop();
+
+ dirtylimit_state_lock();
+ dirtylimit_state_finalize();
+ dirtylimit_state_unlock();
+ }
+ }
+}
+
+void dirtylimit_set_all(uint64_t quota,
+ bool enable)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+ int max_cpus = ms->smp.max_cpus;
+ int i;
+
+ for (i = 0; i < max_cpus; i++) {
+ dirtylimit_set_vcpu(i, quota, enable);
+ }
+}
+
+void dirtylimit_vcpu_execute(CPUState *cpu)
+{
+ if (dirtylimit_in_service() &&
+ dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled &&
+ cpu->throttle_us_per_full) {
+ trace_dirtylimit_vcpu_execute(cpu->cpu_index,
+ cpu->throttle_us_per_full);
+ usleep(cpu->throttle_us_per_full);
+ }
+}
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 9c88887..ff441ac 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -31,3 +31,11 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co
system_wakeup_request(int reason) "reason=%d"
qemu_system_shutdown_request(int reason) "reason=%d"
qemu_system_powerdown_request(void) ""
+
+# dirtylimit.c
+dirtylimit_state_initialize(int max_cpus) "dirtylimit state initialize: max cpus %d"
+dirtylimit_state_finalize(void) ""
+dirtylimit_adjust_throttle(int cpu_index, uint64_t quota, uint64_t current, int64_t time_us) "CPU[%d] throttle: quota %" PRIu64 ", current %" PRIu64 ", throttle %"PRIi64 " us"
+dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
+dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
+dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"PRIi64 " us"
--
1.8.3.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit
[not found] ` <cover.1641316375.git.huangy81@chinatelecom.cn>
` (2 preceding siblings ...)
2022-01-04 17:14 ` [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle huangy81
@ 2022-01-04 17:14 ` huangy81
2022-01-17 7:35 ` Peter Xu
2022-01-19 12:16 ` Markus Armbruster
3 siblings, 2 replies; 33+ messages in thread
From: huangy81 @ 2022-01-04 17:14 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Implement periodic dirty page rate calculation based on the
dirty ring, and throttle the virtual CPU until it reaches
the quota dirty page rate given by the user.
Introduce the QMP commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit"
to enable, disable and query the dirty page rate limit
for virtual CPUs.
Meanwhile, introduce the corresponding HMP commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit" and
"info vcpu_dirty_limit" to make the feature easier to use.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
hmp-commands-info.hx | 13 ++++
hmp-commands.hx | 32 ++++++++++
include/monitor/hmp.h | 3 +
qapi/migration.json | 60 ++++++++++++++++++
softmmu/dirtylimit.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 275 insertions(+)
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 407a1da..5dd3001 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -863,6 +863,19 @@ SRST
Display the vcpu dirty rate information.
ERST
+ {
+ .name = "vcpu_dirty_limit",
+ .args_type = "",
+ .params = "",
+ .help = "show dirty page limit information of all vCPUs",
+ .cmd = hmp_info_vcpu_dirty_limit,
+ },
+
+SRST
+ ``info vcpu_dirty_limit``
+ Display the vcpu dirty page limit information.
+ERST
+
#if defined(TARGET_I386)
{
.name = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 70a9136..416982c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1744,3 +1744,35 @@ ERST
"\n\t\t\t -b to specify dirty bitmap as method of calculation)",
.cmd = hmp_calc_dirty_rate,
},
+
+SRST
+``set_vcpu_dirty_limit``
+ Set dirty page rate limit on virtual CPU, the information about all the
+ virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+ command.
+ERST
+
+ {
+ .name = "set_vcpu_dirty_limit",
+ .args_type = "dirty_rate:l,cpu_index:l?",
+ .params = "dirty_rate [cpu_index]",
+ .help = "set dirty page rate limit, use cpu_index to set limit on a "
+ "\n\t\t specified virtual cpu",
+ .cmd = hmp_set_vcpu_dirty_limit,
+ },
+
+SRST
+``cancel_vcpu_dirty_limit``
+ Cancel dirty page rate limit on virtual CPU, the information about all the
+ virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+ command.
+ERST
+
+ {
+ .name = "cancel_vcpu_dirty_limit",
+ .args_type = "cpu_index:l?",
+ .params = "[cpu_index]",
+ .help = "cancel dirty page rate limit, use cpu_index to cancel limit "
+ "\n\t\t on a specified virtual cpu",
+ .cmd = hmp_cancel_vcpu_dirty_limit,
+ },
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 96d0148..478820e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
void hmp_replay_seek(Monitor *mon, const QDict *qdict);
void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
void hmp_human_readable_text_helper(Monitor *mon,
HumanReadableText *(*qmp_handler)(Error **));
diff --git a/qapi/migration.json b/qapi/migration.json
index ac5fa56..9d406f4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1869,6 +1869,66 @@
'current-rate': 'int64' } }
##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPU.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirty page rates, use @calc-dirty-rate.
+#
+# @cpu-index: index of virtual CPU, default is all.
+#
+# @dirty-rate: upper limit of dirty page rate for virtual CPU.
+#
+# Since: 7.0
+#
+# Example:
+# { "execute": "set-vcpu-dirty-limit",
+#   "arguments": { "dirty-rate": 200,
+#                  "cpu-index": 1 } }
+#
+##
+{ 'command': 'set-vcpu-dirty-limit',
+ 'data': { '*cpu-index': 'uint64',
+ 'dirty-rate': 'uint64' } }
+
+##
+# @cancel-vcpu-dirty-limit:
+#
+# Cancel the upper limit of dirty page rate for virtual CPU.
+#
+# Cancel the dirty page limit for the vCPU which has been set with
+# set-vcpu-dirty-limit command. Note that this command requires
+# support from dirty ring, same as the "set-vcpu-dirty-limit".
+#
+# @cpu-index: index of virtual CPU, default is all.
+#
+# Since: 7.0
+#
+# Example:
+# { "execute": "cancel-vcpu-dirty-limit",
+#   "arguments": { "cpu-index": 1 } }
+#
+##
+{ 'command': 'cancel-vcpu-dirty-limit',
+ 'data': { '*cpu-index': 'uint64'} }
+
+##
+# @query-vcpu-dirty-limit:
+#
+# Returns information about all virtual CPU dirty limit if enabled.
+#
+# Since: 7.0
+#
+# Example:
+# {"execute": "query-vcpu-dirty-limit"}
+#
+##
+{ 'command': 'query-vcpu-dirty-limit',
+ 'returns': [ 'DirtyLimitInfo' ] }
+
+##
# @snapshot-save:
#
# Save a VM snapshot
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index c9f5745..071f3b9 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -14,8 +14,12 @@
#include "qapi/error.h"
#include "qemu/main-loop.h"
#include "qapi/qapi-commands-migration.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
#include "sysemu/dirtyrate.h"
#include "sysemu/dirtylimit.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
#include "exec/memory.h"
#include "hw/boards.h"
#include "sysemu/kvm.h"
@@ -475,3 +479,166 @@ void dirtylimit_vcpu_execute(CPUState *cpu)
usleep(cpu->throttle_us_per_full);
}
}
+
+void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
+ uint64_t cpu_index,
+ uint64_t dirty_rate,
+ Error **errp)
+{
+ if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+ error_setg(errp, "dirty page limit feature requires KVM with"
+ " accelerator property 'dirty-ring-size' set'");
+ return;
+ }
+
+ if (!dirtylimit_in_service()) {
+ vcpu_dirty_rate_stat_initialize();
+ vcpu_dirty_rate_stat_start();
+ }
+
+ if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
+ error_setg(errp, "incorrect cpu index specified");
+ return;
+ }
+
+ if (has_cpu_index) {
+ dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
+ } else {
+ dirtylimit_set_all(dirty_rate, true);
+ }
+}
+
+void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
+ uint64_t cpu_index,
+ Error **errp)
+{
+ if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+ return;
+ }
+
+ if (!dirtylimit_in_service()) {
+ return;
+ }
+
+ if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
+ error_setg(errp, "incorrect cpu index specified");
+ return;
+ }
+
+ if (has_cpu_index) {
+ dirtylimit_set_vcpu(cpu_index, 0, false);
+ } else {
+ dirtylimit_set_all(0, false);
+ }
+
+ if (!dirtylimit_in_service()) {
+ vcpu_dirty_rate_stat_stop();
+ vcpu_dirty_rate_stat_finalize();
+ }
+}
+
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+ int64_t dirty_rate = qdict_get_int(qdict, "dirty_rate");
+ int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
+ Error *err = NULL;
+
+ qmp_set_vcpu_dirty_limit(!!(cpu_index != -1), cpu_index, dirty_rate, &err);
+ if (err) {
+ hmp_handle_error(mon, err);
+ return;
+ }
+
+ monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query "
+ "dirty limit for virtual CPU]\n");
+}
+
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+ int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
+ Error *err = NULL;
+
+ qmp_cancel_vcpu_dirty_limit(!!(cpu_index != -1), cpu_index, &err);
+ if (err) {
+ hmp_handle_error(mon, err);
+ return;
+ }
+
+ monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query "
+ "dirty limit for virtual CPU]\n");
+}
+
+static struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
+{
+ DirtyLimitInfo *info = NULL;
+
+ info = g_malloc0(sizeof(*info));
+ info->cpu_index = cpu_index;
+ info->limit_rate = dirtylimit_vcpu_get_state(cpu_index)->quota;
+ info->current_rate = vcpu_dirty_rate_get(cpu_index);
+
+ return info;
+}
+
+static struct DirtyLimitInfoList *dirtylimit_query_all(void)
+{
+ int i, index;
+ DirtyLimitInfo *info = NULL;
+ DirtyLimitInfoList *head = NULL, **tail = &head;
+
+ dirtylimit_state_lock();
+
+ if (!dirtylimit_in_service()) {
+ dirtylimit_state_unlock();
+ return NULL;
+ }
+
+ for (i = 0; i < dirtylimit_state->max_cpus; i++) {
+ index = dirtylimit_state->states[i].cpu_index;
+ if (dirtylimit_vcpu_get_state(index)->enabled) {
+ info = dirtylimit_query_vcpu(index);
+ QAPI_LIST_APPEND(tail, info);
+ }
+ }
+
+ dirtylimit_state_unlock();
+
+ return head;
+}
+
+struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)
+{
+ if (!dirtylimit_in_service()) {
+ error_setg(errp, "dirty page limit not enabled");
+ return NULL;
+ }
+
+ return dirtylimit_query_all();
+}
+
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
+{
+ DirtyLimitInfoList *limit, *head, *info = NULL;
+ Error *err = NULL;
+
+ if (!dirtylimit_in_service()) {
+ monitor_printf(mon, "Dirty page limit not enabled!\n");
+ return;
+ }
+
+ info = qmp_query_vcpu_dirty_limit(&err);
+ if (err) {
+ hmp_handle_error(mon, err);
+ return;
+ }
+
+ head = info;
+ for (limit = head; limit != NULL; limit = limit->next) {
+ monitor_printf(mon, "vcpu[%"PRIi64"], limit rate %"PRIi64 " (MB/s),"
+ " current rate %"PRIi64 " (MB/s)\n",
+ limit->value->cpu_index,
+ limit->value->limit_rate,
+ limit->value->current_rate);
+ }
+
+ qapi_free_DirtyLimitInfoList(info);
+}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation
2022-01-04 17:14 ` [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation huangy81
@ 2022-01-17 2:19 ` Peter Xu
2022-01-22 3:22 ` Hyman Huang
0 siblings, 1 reply; 33+ messages in thread
From: Peter Xu @ 2022-01-17 2:19 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Wed, Jan 05, 2022 at 01:14:06AM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> abstract out dirty log change logic into function
> global_dirty_log_change.
>
> abstract out dirty page rate calculation logic via
> dirty-ring into function vcpu_calculate_dirtyrate.
>
> abstract out mathematical dirty page rate calculation
> into do_calculate_dirtyrate, decouple it from DirtyStat.
>
> rename set_sample_page_period to dirty_stat_wait, which
> is well-understood and will be reused in dirtylimit.
>
> add cpu_list_lock to protect cpu list before walking
> through it in case of race against cpu hotplug/unplug.
>
> export util functions outside migration.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> ---
> include/sysemu/dirtyrate.h | 29 ++++++
> migration/dirtyrate.c | 220 ++++++++++++++++++++++++++++-----------------
> migration/dirtyrate.h | 7 +-
> 3 files changed, 171 insertions(+), 85 deletions(-)
> create mode 100644 include/sysemu/dirtyrate.h
>
> diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
> new file mode 100644
> index 0000000..cb6f02b
> --- /dev/null
> +++ b/include/sysemu/dirtyrate.h
> @@ -0,0 +1,29 @@
> +/*
> + * dirty page rate helper functions
> + *
> + * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
> + *
> + * Authors:
> + * Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_DIRTYRATE_H
> +#define QEMU_DIRTYRATE_H
> +
> +typedef struct VcpuStat {
> + int nvcpu; /* number of vcpu */
> + DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
> +} VcpuStat;
> +
> +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
> + int64_t init_time_ms,
> + VcpuStat *stat,
> + unsigned int flag,
> + bool one_shot);
> +
> +void global_dirty_log_change(unsigned int flag,
> + bool start);
> +#endif
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index d65e744..1407455 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
> static DirtyRateMeasureMode dirtyrate_mode =
> DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
>
> -static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
> +static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
> {
> int64_t current_time;
>
> @@ -60,6 +60,128 @@ static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
> return msec;
> }
>
> +static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
> + CPUState *cpu, bool start)
> +{
> + if (start) {
> + dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
> + } else {
> + dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
> + }
> +}
> +
> +static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
> + int64_t calc_time_ms)
> +{
> + uint64_t memory_size_MB;
> + uint64_t increased_dirty_pages =
> + dirty_pages.end_pages - dirty_pages.start_pages;
> +
> + memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
> +
> + return memory_size_MB * 1000 / calc_time_ms;
> +}
> +
> +void global_dirty_log_change(unsigned int flag, bool start)
> +{
> + qemu_mutex_lock_iothread();
> + if (start) {
> + memory_global_dirty_log_start(flag);
> + } else {
> + memory_global_dirty_log_stop(flag);
> + }
> + qemu_mutex_unlock_iothread();
> +}
> +
> +/*
> + * global_dirty_log_sync
> + * 1. sync dirty log from kvm
> + * 2. stop dirty tracking if needed.
> + */
> +static void global_dirty_log_sync(unsigned int flag, bool one_shot)
> +{
> + qemu_mutex_lock_iothread();
> + memory_global_dirty_log_sync();
> + if (one_shot) {
> + memory_global_dirty_log_stop(flag);
> + }
> + qemu_mutex_unlock_iothread();
> +}
> +
> +static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
> +{
> + CPUState *cpu;
> + DirtyPageRecord *records;
> + int nvcpu = 0;
> +
> + CPU_FOREACH(cpu) {
> + nvcpu++;
> + }
> +
> + stat->nvcpu = nvcpu;
> + stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
> +
> + records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
> +
> + return records;
> +}
> +
> +static void vcpu_dirty_stat_collect(VcpuStat *stat,
> + DirtyPageRecord *records,
> + bool start)
> +{
> + CPUState *cpu;
> +
> + CPU_FOREACH(cpu) {
> + if (!start && cpu->cpu_index >= stat->nvcpu) {
> + /*
> + * Never go there unless cpu is hot-plugged,
> + * just ignore in this case.
> + */
> + continue;
> + }
As commented before, I think the only way to do it right is to not allow cpu
plug/unplug during measurement.
Say, even if the index didn't get out of range, an unplug event would generate very
strange output for the unplugged cpu. Please see more below.
> + record_dirtypages(records, cpu, start);
> + }
> +}
> +
> +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
> + int64_t init_time_ms,
> + VcpuStat *stat,
> + unsigned int flag,
> + bool one_shot)
> +{
> + DirtyPageRecord *records;
> + int64_t duration;
> + int64_t dirtyrate;
> + int i = 0;
> +
> + cpu_list_lock();
> + records = vcpu_dirty_stat_alloc(stat);
> + vcpu_dirty_stat_collect(stat, records, true);
> + cpu_list_unlock();
Continuing from above - then I'm wondering whether we should just keep holding the
lock until the second vcpu_dirty_stat_collect().
Yes, we could be holding the lock for a long time because of the sleep, but the
plug/unplug path in the main thread will just wait for it to complete, and it is
at least not, e.g., a deadlock.
The other solution is to do cpu_list_unlock() like this, but introduce another
cpu_list_generation_id and bump it after any plug/unplug of a cpu, i.e. whenever
the cpu list changes.
Then we record the cpu generation ID at the entry of this function and retry the
whole measurement if at some point we find the generation ID changed (we need to
fetch the gen ID after taking the lock, of course). That could avoid taking the
cpu list lock during dirty_stat_wait(), but it'll start to complicate the cpu
list locking rules.
The rest looks good, thanks.
> +
> + duration = dirty_stat_wait(calc_time_ms, init_time_ms);
> +
> + global_dirty_log_sync(flag, one_shot);
> +
> + cpu_list_lock();
> + vcpu_dirty_stat_collect(stat, records, false);
> + cpu_list_unlock();
> +
> + for (i = 0; i < stat->nvcpu; i++) {
> + dirtyrate = do_calculate_dirtyrate(records[i], duration);
> +
> + stat->rates[i].id = i;
> + stat->rates[i].dirty_rate = dirtyrate;
> +
> + trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
> + }
> +
> + g_free(records);
> +
> + return duration;
> +}
> +
> static bool is_sample_period_valid(int64_t sec)
> {
> if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> @@ -396,44 +518,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
> return true;
> }
>
> -static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
> - CPUState *cpu, bool start)
> -{
> - if (start) {
> - dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
> - } else {
> - dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
> - }
> -}
> -
> -static void dirtyrate_global_dirty_log_start(void)
> -{
> - qemu_mutex_lock_iothread();
> - memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
> - qemu_mutex_unlock_iothread();
> -}
> -
> -static void dirtyrate_global_dirty_log_stop(void)
> -{
> - qemu_mutex_lock_iothread();
> - memory_global_dirty_log_sync();
> - memory_global_dirty_log_stop(GLOBAL_DIRTY_DIRTY_RATE);
> - qemu_mutex_unlock_iothread();
> -}
> -
> -static int64_t do_calculate_dirtyrate_vcpu(DirtyPageRecord dirty_pages)
> -{
> - uint64_t memory_size_MB;
> - int64_t time_s;
> - uint64_t increased_dirty_pages =
> - dirty_pages.end_pages - dirty_pages.start_pages;
> -
> - memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
> - time_s = DirtyStat.calc_time;
> -
> - return memory_size_MB / time_s;
> -}
> -
> static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
> bool start)
> {
> @@ -444,11 +528,6 @@ static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
> }
> }
>
> -static void do_calculate_dirtyrate_bitmap(DirtyPageRecord dirty_pages)
> -{
> - DirtyStat.dirty_rate = do_calculate_dirtyrate_vcpu(dirty_pages);
> -}
> -
> static inline void dirtyrate_manual_reset_protect(void)
> {
> RAMBlock *block = NULL;
> @@ -492,71 +571,52 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
> DirtyStat.start_time = start_time / 1000;
>
> msec = config.sample_period_seconds * 1000;
> - msec = set_sample_page_period(msec, start_time);
> + msec = dirty_stat_wait(msec, start_time);
> DirtyStat.calc_time = msec / 1000;
>
> /*
> - * dirtyrate_global_dirty_log_stop do two things.
> + * do two things.
> * 1. fetch dirty bitmap from kvm
> * 2. stop dirty tracking
> */
> - dirtyrate_global_dirty_log_stop();
> + global_dirty_log_sync(GLOBAL_DIRTY_DIRTY_RATE, true);
>
> record_dirtypages_bitmap(&dirty_pages, false);
>
> - do_calculate_dirtyrate_bitmap(dirty_pages);
> + DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> }
>
> static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
> {
> - CPUState *cpu;
> - int64_t msec = 0;
> int64_t start_time;
> + int64_t duration;
> uint64_t dirtyrate = 0;
> uint64_t dirtyrate_sum = 0;
> - DirtyPageRecord *dirty_pages;
> - int nvcpu = 0;
> int i = 0;
>
> - CPU_FOREACH(cpu) {
> - nvcpu++;
> - }
> -
> - dirty_pages = malloc(sizeof(*dirty_pages) * nvcpu);
> -
> - DirtyStat.dirty_ring.nvcpu = nvcpu;
> - DirtyStat.dirty_ring.rates = malloc(sizeof(DirtyRateVcpu) * nvcpu);
> -
> - dirtyrate_global_dirty_log_start();
> -
> - CPU_FOREACH(cpu) {
> - record_dirtypages(dirty_pages, cpu, true);
> - }
> + /* start log sync */
> + global_dirty_log_change(GLOBAL_DIRTY_DIRTY_RATE, true);
>
> start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> DirtyStat.start_time = start_time / 1000;
>
> - msec = config.sample_period_seconds * 1000;
> - msec = set_sample_page_period(msec, start_time);
> - DirtyStat.calc_time = msec / 1000;
> + /* calculate vcpu dirtyrate */
> + duration = vcpu_calculate_dirtyrate(config.sample_period_seconds * 1000,
> + start_time,
> + &DirtyStat.dirty_ring,
> + GLOBAL_DIRTY_DIRTY_RATE,
> + true);
>
> - dirtyrate_global_dirty_log_stop();
> -
> - CPU_FOREACH(cpu) {
> - record_dirtypages(dirty_pages, cpu, false);
> - }
> + DirtyStat.calc_time = duration / 1000;
>
> + /* calculate vm dirtyrate */
> for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> - dirtyrate = do_calculate_dirtyrate_vcpu(dirty_pages[i]);
> - trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
> -
> - DirtyStat.dirty_ring.rates[i].id = i;
> + dirtyrate = DirtyStat.dirty_ring.rates[i].dirty_rate;
> DirtyStat.dirty_ring.rates[i].dirty_rate = dirtyrate;
> dirtyrate_sum += dirtyrate;
> }
>
> DirtyStat.dirty_rate = dirtyrate_sum;
> - free(dirty_pages);
> }
>
> static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
> @@ -574,7 +634,7 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
> rcu_read_unlock();
>
> msec = config.sample_period_seconds * 1000;
> - msec = set_sample_page_period(msec, initial_time);
> + msec = dirty_stat_wait(msec, initial_time);
> DirtyStat.start_time = initial_time / 1000;
> DirtyStat.calc_time = msec / 1000;
>
> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> index 69d4c5b..594a5c0 100644
> --- a/migration/dirtyrate.h
> +++ b/migration/dirtyrate.h
> @@ -13,6 +13,8 @@
> #ifndef QEMU_MIGRATION_DIRTYRATE_H
> #define QEMU_MIGRATION_DIRTYRATE_H
>
> +#include "sysemu/dirtyrate.h"
> +
> /*
> * Sample 512 pages per GB as default.
> */
> @@ -65,11 +67,6 @@ typedef struct SampleVMStat {
> uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
> } SampleVMStat;
>
> -typedef struct VcpuStat {
> - int nvcpu; /* number of vcpu */
> - DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
> -} VcpuStat;
> -
> /*
> * Store calculation statistics for each measure.
> */
> --
> 1.8.3.1
>
--
Peter Xu
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 2/4] softmmu/dirtylimit: implement vCPU dirtyrate calculation periodically
2022-01-04 17:14 ` [PATCH v11 2/4] softmmu/dirtylimit: implement vCPU dirtyrate calculation periodically huangy81
@ 2022-01-17 2:31 ` Peter Xu
0 siblings, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-17 2:31 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Wed, Jan 05, 2022 at 01:14:07AM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Introduce the third dirty tracking method, GLOBAL_DIRTY_LIMIT, to
> calculate the dirty rate periodically for the dirty page rate
> limit.
>
> Add dirtylimit.c to implement the periodic dirtyrate calculation,
> which will be used for the dirty page rate limit.
>
> Add dirtylimit.h to export util functions for dirty page rate
> limit implementation.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-04 17:14 ` [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle huangy81
@ 2022-01-17 7:32 ` Peter Xu
2022-01-17 14:00 ` Hyman Huang
` (4 more replies)
2022-01-17 9:01 ` Peter Xu
2022-01-19 12:16 ` Markus Armbruster
2 siblings, 5 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-17 7:32 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Wed, Jan 05, 2022 at 01:14:08AM +0800, huangy81@chinatelecom.cn wrote:
> ##
> +# @DirtyLimitInfo:
> +#
> +# Dirty page rate limit information of virtual CPU.
> +#
> +# @cpu-index: index of virtual CPU.
> +#
> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
> +#
> +# @current-rate: current dirty page rate for virtual CPU.
Please consider spelling out the unit for all these dirty rate fields (MB/s).
> +#
> +# Since: 7.0
> +#
> +##
> +{ 'struct': 'DirtyLimitInfo',
> + 'data': { 'cpu-index': 'int',
> + 'limit-rate': 'int64',
> + 'current-rate': 'int64' } }
> +
> +##
> # @snapshot-save:
> #
> # Save a VM snapshot
> diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
> index a10ac6f..c9f5745 100644
> --- a/softmmu/dirtylimit.c
> +++ b/softmmu/dirtylimit.c
> @@ -18,6 +18,26 @@
> #include "sysemu/dirtylimit.h"
> #include "exec/memory.h"
> #include "hw/boards.h"
> +#include "sysemu/kvm.h"
> +#include "trace.h"
> +
> +/*
> + * Dirtylimit stop working if dirty page rate error
> + * value less than DIRTYLIMIT_TOLERANCE_RANGE
> + */
> +#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
> +/*
> + * Plus or minus vcpu sleep time linearly if dirty
> + * page rate error value percentage over
> + * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
> + * Otherwise, plus or minus a fixed vcpu sleep time.
> + */
> +#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
> +/*
> + * Max vcpu sleep time percentage during a cycle
> + * composed of dirty ring full and sleep time.
> + */
> +#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
(Thanks for the enriched comments)
> +static inline void dirtylimit_vcpu_set_quota(int cpu_index,
> + uint64_t quota,
> + bool on)
> +{
> + dirtylimit_state->states[cpu_index].quota = quota;
To be clear, we could move this line into the "(on)" if condition, then in !on
case we reset it.
> + if (on) {
> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
> + dirtylimit_state->limited_nvcpu++;
> + }
> + } else {
> + if (dirtylimit_state->states[cpu_index].enabled) {
> + dirtylimit_state->limited_nvcpu--;
> + }
> + }
> +
> + dirtylimit_state->states[cpu_index].enabled = on;
> +}
> +
> +static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
> +{
> + static uint64_t max_dirtyrate;
> + uint32_t dirty_ring_size = kvm_dirty_ring_size();
> + uint64_t dirty_ring_size_memory_MB =
> + dirty_ring_size * TARGET_PAGE_SIZE >> 20;
> +
> + if (max_dirtyrate < dirtyrate) {
> + max_dirtyrate = dirtyrate;
> + }
> +
> + return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
> +}
> +
> +static inline bool dirtylimit_done(uint64_t quota,
> + uint64_t current)
> +{
> + uint64_t min, max;
> +
> + min = MIN(quota, current);
> + max = MAX(quota, current);
> +
> + return ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) ? true : false;
> +}
> +
> +static inline bool
> +dirtylimit_need_linear_adjustment(uint64_t quota,
> + uint64_t current)
> +{
> + uint64_t min, max, pct;
> +
> + min = MIN(quota, current);
> + max = MAX(quota, current);
> +
> + pct = (max - min) * 100 / max;
> +
> + return pct > DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT;
> +}
> +
> +static void dirtylimit_set_throttle(CPUState *cpu,
> + uint64_t quota,
> + uint64_t current)
> +{
> + int64_t ring_full_time_us = 0;
> + uint64_t sleep_pct = 0;
> + uint64_t throttle_us = 0;
> +
> + ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
> +
> + if (dirtylimit_need_linear_adjustment(quota, current)) {
> + if (quota < current) {
> + sleep_pct = (current - quota) * 100 / current;
> + throttle_us =
> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
> + cpu->throttle_us_per_full += throttle_us;
> + } else {
> + sleep_pct = (quota - current) * 100 / quota;
> + throttle_us =
> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
> + cpu->throttle_us_per_full -= throttle_us;
> + }
> +
> + trace_dirtylimit_throttle_pct(cpu->cpu_index,
> + sleep_pct,
> + throttle_us);
> + } else {
> + if (quota < current) {
> + cpu->throttle_us_per_full += ring_full_time_us / 10;
> + } else {
> + cpu->throttle_us_per_full -= ring_full_time_us / 10;
> + }
> + }
> +
> + cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
> + ring_full_time_us * DIRTYLIMIT_THROTTLE_PCT_MAX);
> +
> + cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
> +}
This algorithm seems to work even worse than the previous version; could you have
a look at what's wrong?
See how it fluctuates when I set a throttle of 300MB/s:
(QMP) set-vcpu-dirty-limit dirty-rate=300
Dirty rate: 17622 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 17617 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 17611 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 13023 (MB/s), duration: 1153 (ms), load: 100.00%
Dirty rate: 923 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 2853 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 1963 (MB/s), duration: 1040 (ms), load: 100.00%
Dirty rate: 180 (MB/s), duration: 1006 (ms), load: 100.00%
Dirty rate: 182 (MB/s), duration: 1007 (ms), load: 100.00%
Dirty rate: 177 (MB/s), duration: 1005 (ms), load: 100.00%
Dirty rate: 181 (MB/s), duration: 1007 (ms), load: 100.00%
Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
Dirty rate: 168 (MB/s), duration: 1005 (ms), load: 100.00%
Dirty rate: 169 (MB/s), duration: 1006 (ms), load: 100.00%
Dirty rate: 2717 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 2851 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 1773 (MB/s), duration: 1021 (ms), load: 100.00%
Dirty rate: 177 (MB/s), duration: 1006 (ms), load: 100.00%
Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
Dirty rate: 175 (MB/s), duration: 1005 (ms), load: 100.00%
Dirty rate: 1973 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 2878 (MB/s), duration: 1000 (ms), load: 100.00%
Dirty rate: 1690 (MB/s), duration: 1022 (ms), load: 100.00%
Dirty rate: 174 (MB/s), duration: 1005 (ms), load: 100.00%
Dirty rate: 184 (MB/s), duration: 1006 (ms), load: 100.00%
This is the tool I'm using:
https://github.com/xzpeter/mig_mon#memory-dirty
Again, I won't insist on an algorithm as good as the 1st version's, but I think
it's nicer to have the simplest algorithm merged first, which should be very
easy to verify.
> +
> +static void dirtylimit_adjust_throttle(CPUState *cpu)
> +{
> + uint64_t quota = 0;
> + uint64_t current = 0;
> + int cpu_index = cpu->cpu_index;
> +
> + quota = dirtylimit_vcpu_get_state(cpu_index)->quota;
> + current = vcpu_dirty_rate_get(cpu_index);
> +
> + if (current == 0 &&
> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt == 0) {
> + cpu->throttle_us_per_full = 0;
> + goto end;
> + } else if (++dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt
> + < 2) {
> + goto end;
> + } else if (dirtylimit_done(quota, current)) {
> + goto end;
> + } else {
> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt = 0;
> + dirtylimit_set_throttle(cpu, quota, current);
> + }
> +end:
> + trace_dirtylimit_adjust_throttle(cpu_index,
> + quota, current,
> + cpu->throttle_us_per_full);
> + return;
> +}
> +
> +static void *dirtylimit_thread(void *opaque)
> +{
> + CPUState *cpu;
> +
> + rcu_register_thread();
> +
> + while (!qatomic_read(&dirtylimit_quit)) {
> + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
Sorry not to have mentioned this earlier: I think we probably don't even need
this dirtylimit thread.
It'll be hard to get the "sleep" right here: you could read two identical
values from the dirty calc thread, because the 1-second sleep is not accurate, so
even after this sleep() the calc thread may not have provided the latest number
yet.
It'll be much cleaner (and, most importantly, more accurate) to make
this a hook function passed to the vcpu_dirty_rate_stat_thread()
thread, and call the hook after each vcpu_dirty_rate_stat_collect().
> +
> + dirtylimit_state_lock();
> +
> + if (!dirtylimit_in_service()) {
> + dirtylimit_state_unlock();
> + break;
> + }
> +
> + CPU_FOREACH(cpu) {
> + if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
> + continue;
> + }
> + dirtylimit_adjust_throttle(cpu);
> + }
> + dirtylimit_state_unlock();
> + }
> +
> + rcu_unregister_thread();
> +
> + return NULL;
> +}
> +
> +static void dirtylimit_thread_start(void)
> +{
> + qatomic_set(&dirtylimit_quit, 0);
> + qemu_thread_create(&dirtylimit_thr,
> + "dirtylimit",
> + dirtylimit_thread,
> + NULL,
> + QEMU_THREAD_JOINABLE);
> +}
> +
> +static void dirtylimit_thread_stop(void)
> +{
> + qatomic_set(&dirtylimit_quit, 1);
> + qemu_mutex_unlock_iothread();
> + qemu_thread_join(&dirtylimit_thr);
> + qemu_mutex_lock_iothread();
> +}
> +
> +void dirtylimit_set_vcpu(int cpu_index,
> + uint64_t quota,
> + bool enable)
> +{
> + trace_dirtylimit_set_vcpu(cpu_index, quota);
> +
> + if (enable) {
> + if (dirtylimit_in_service()) {
> + /* only set the vcpu dirty page rate limit */
> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
> + return;
> + }
> +
> + /* initialize state when set dirtylimit first time */
> + dirtylimit_state_lock();
> + dirtylimit_state_initialize();
> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
> + dirtylimit_state_unlock();
> +
> + dirtylimit_thread_start();
Can we move dirtylimit global initializations out of dirtylimit_set_vcpu() too?
We should always keep init/cleanup of dirty_rate_calc and dirtylimit together,
imho; we never enable only one of them.
I made a similar comment on the previous version.
> + } else {
> + if (!dirtylimit_in_service()) {
> + return;
> + }
> +
> + dirtylimit_state_lock();
> + /* dirty page rate limit is not enabled */
> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
> + dirtylimit_state_unlock();
> + return;
> + }
> +
> + /* switch off vcpu dirty page rate limit */
> + dirtylimit_vcpu_set_quota(cpu_index, 0, false);
> + dirtylimit_state_unlock();
> +
> + if (!dirtylimit_state->limited_nvcpu) {
> + dirtylimit_thread_stop();
> +
> + dirtylimit_state_lock();
> + dirtylimit_state_finalize();
> + dirtylimit_state_unlock();
We don't need such fine-grained control of locking, IMHO; it can be one big lock
just to serialize things.
IMHO it could be as simple as:
void dirtylimit_set_vcpu(int cpu_index,
uint64_t quota,
bool enable)
{
dirtylimit_vcpu_set_quota(cpu_index, quota, enable);
trace_dirtylimit_set_vcpu(cpu_index, quota);
}
void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
uint64_t cpu_index,
uint64_t dirty_rate,
Error **errp)
{
if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
error_setg(errp, "dirty page limit feature requires KVM with"
" accelerator property 'dirty-ring-size' set'");
return;
}
if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
error_setg(errp, "incorrect cpu index specified");
return;
}
dirtylimit_state_lock();
if (!dirtylimit_in_service()) {
/* TODO: we could have one helper to initialize all of them */
vcpu_dirty_rate_stat_initialize();
vcpu_dirty_rate_stat_start();
dirtylimit_state_initialize();
dirtylimit_vcpu_set_quota(cpu_index, quota, true);
}
if (has_cpu_index) {
dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
} else {
dirtylimit_set_all(dirty_rate, true);
}
dirtylimit_state_unlock();
}
I didn't write the cleanup path, but it's the same idea: we should only clean up
the global structs in cancel-vcpu-dirty-limit when we find there are zero vcpus
being tracked, and it should be done only once, there.
Thanks,
--
Peter Xu
* Re: [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit
2022-01-04 17:14 ` [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit huangy81
@ 2022-01-17 7:35 ` Peter Xu
2022-01-19 12:16 ` Markus Armbruster
1 sibling, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-17 7:35 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Wed, Jan 05, 2022 at 01:14:09AM +0800, huangy81@chinatelecom.cn wrote:
> ##
> +# @set-vcpu-dirty-limit:
> +#
> +# Set the upper limit of dirty page rate for virtual CPU.
> +#
> +# Requires KVM with accelerator property "dirty-ring-size" set.
> +# A virtual CPU's dirty page rate is a measure of its memory load.
> +# To observe dirty page rates, use @calc-dirty-rate.
> +#
> +# @cpu-index: index of virtual CPU, default is all.
> +#
> +# @dirty-rate: upper limit of dirty page rate for virtual CPU.
Same here: we could add the unit of the dirty rate (MB/s), and likewise in all
the other places that mention dirty rate.
> +void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
> + uint64_t cpu_index,
> + uint64_t dirty_rate,
> + Error **errp)
> +{
> + if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> + error_setg(errp, "dirty page limit feature requires KVM with"
> + " accelerator property 'dirty-ring-size' set'");
> + return;
> + }
> +
> + if (!dirtylimit_in_service()) {
> + vcpu_dirty_rate_stat_initialize();
> + vcpu_dirty_rate_stat_start();
> + }
> +
> + if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
> + error_setg(errp, "incorrect cpu index specified");
> + return;
> + }
This could be moved before the vcpu_dirty_rate_stat_initialize() calls above. I've
left some comments on this in the previous patch; please refer to the pseudo code.
The rest looks good to me.
Thanks,
> +
> + if (has_cpu_index) {
> + dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
> + } else {
> + dirtylimit_set_all(dirty_rate, true);
> + }
> +}
--
Peter Xu
* Re: [PATCH v11 0/4] support dirty restraint on vCPU
2022-01-04 17:14 [PATCH v11 0/4] support dirty restraint on vCPU huangy81
[not found] ` <cover.1641316375.git.huangy81@chinatelecom.cn>
@ 2022-01-17 8:54 ` Peter Xu
1 sibling, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-17 8:54 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Wed, Jan 05, 2022 at 01:14:05AM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> v11
> - rebase on master
> - add a commit " refactor dirty page rate calculation" so that dirty page rate limit
> can reuse the calculation logic.
> - handle the cpu hotplug/unplug case in the dirty page rate calculation logic.
> - modify the qmp commands according to Markus's advice.
> - introduce a standalone file dirtylimit.c to implement dirty page rate limit
> - check if dirty limit in service by dirtylimit_state pointer instead of global variable
> - introduce dirtylimit_mutex to protect dirtylimit_state
> - do some code clean and docs
Yong,
Would you consider picking up this patch too alongside?
https://lore.kernel.org/qemu-devel/20211130080028.6474-1-peterx@redhat.com/
I didn't explicitly test cancelling the dirty limit during migration just now, but I
think it'll be needed.
Thanks,
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-04 17:14 ` [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle huangy81
2022-01-17 7:32 ` Peter Xu
@ 2022-01-17 9:01 ` Peter Xu
2022-01-19 12:16 ` Markus Armbruster
2 siblings, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-17 9:01 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Wed, Jan 05, 2022 at 01:14:08AM +0800, huangy81@chinatelecom.cn wrote:
> +uint32_t kvm_dirty_ring_size(void)
> +{
> + return kvm_state->kvm_dirty_ring_size;
> +}
You may need to touch up the stub too, to fix the build on non-x86:
=============
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 5319573e00..1128cb2928 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -152,4 +152,9 @@ bool kvm_dirty_ring_enabled(void)
{
return false;
}
+
+uint32_t kvm_dirty_ring_size(void)
+{
+ return 0;
+}
#endif
=============
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-17 7:32 ` Peter Xu
@ 2022-01-17 14:00 ` Hyman Huang
2022-01-18 1:00 ` Peter Xu
2022-01-20 8:26 ` Hyman Huang
` (3 subsequent siblings)
4 siblings, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-17 14:00 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 1/17/22 15:32, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 01:14:08AM +0800, huangy81@chinatelecom.cn wrote:
>> ##
>> +# @DirtyLimitInfo:
>> +#
>> +# Dirty page rate limit information of virtual CPU.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
>> +#
>> +# @current-rate: current dirty page rate for virtual CPU.
>
> Please consider spelling out the unit for all these dirty rate fields (MB/s).
Ok
>
>> +#
>> +# Since: 7.0
>> +#
>> +##
>> +{ 'struct': 'DirtyLimitInfo',
>> + 'data': { 'cpu-index': 'int',
>> + 'limit-rate': 'int64',
>> + 'current-rate': 'int64' } }
>> +
>> +##
>> # @snapshot-save:
>> #
>> # Save a VM snapshot
>> diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
>> index a10ac6f..c9f5745 100644
>> --- a/softmmu/dirtylimit.c
>> +++ b/softmmu/dirtylimit.c
>> @@ -18,6 +18,26 @@
>> #include "sysemu/dirtylimit.h"
>> #include "exec/memory.h"
>> #include "hw/boards.h"
>> +#include "sysemu/kvm.h"
>> +#include "trace.h"
>> +
>> +/*
>> + * Dirtylimit stop working if dirty page rate error
>> + * value less than DIRTYLIMIT_TOLERANCE_RANGE
>> + */
>> +#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>> +/*
>> + * Plus or minus vcpu sleep time linearly if dirty
>> + * page rate error value percentage over
>> + * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
>> + * Otherwise, plus or minus a fixed vcpu sleep time.
>> + */
>> +#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
>> +/*
>> + * Max vcpu sleep time percentage during a cycle
>> + * composed of dirty ring full and sleep time.
>> + */
>> +#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
>
> (Thanks for the enriched comments)
>
>> +static inline void dirtylimit_vcpu_set_quota(int cpu_index,
>> + uint64_t quota,
>> + bool on)
>> +{
>> + dirtylimit_state->states[cpu_index].quota = quota;
>
> To be clear, we could move this line into the "(on)" if condition, then in !on
> case we reset it.
>
>> + if (on) {
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state->limited_nvcpu++;
>> + }
>> + } else {
>> + if (dirtylimit_state->states[cpu_index].enabled) {
>> + dirtylimit_state->limited_nvcpu--;
>> + }
>> + }
>> +
>> + dirtylimit_state->states[cpu_index].enabled = on;
>> +}
>> +
>> +static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
>> +{
>> + static uint64_t max_dirtyrate;
>> + uint32_t dirty_ring_size = kvm_dirty_ring_size();
>> + uint64_t dirty_ring_size_memory_MB =
>> + dirty_ring_size * TARGET_PAGE_SIZE >> 20;
>> +
>> + if (max_dirtyrate < dirtyrate) {
>> + max_dirtyrate = dirtyrate;
>> + }
>> +
>> + return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
>> +}
>> +
>> +static inline bool dirtylimit_done(uint64_t quota,
>> + uint64_t current)
>> +{
>> + uint64_t min, max;
>> +
>> + min = MIN(quota, current);
>> + max = MAX(quota, current);
>> +
>> + return ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) ? true : false;
>> +}
>> +
>> +static inline bool
>> +dirtylimit_need_linear_adjustment(uint64_t quota,
>> + uint64_t current)
>> +{
>> + uint64_t min, max, pct;
>> +
>> + min = MIN(quota, current);
>> + max = MAX(quota, current);
>> +
>> + pct = (max - min) * 100 / max;
>> +
>> + return pct > DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT;
>> +}
>> +
>> +static void dirtylimit_set_throttle(CPUState *cpu,
>> + uint64_t quota,
>> + uint64_t current)
>> +{
>> + int64_t ring_full_time_us = 0;
>> + uint64_t sleep_pct = 0;
>> + uint64_t throttle_us = 0;
>> +
>> + ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
>> +
>> + if (dirtylimit_need_linear_adjustment(quota, current)) {
>> + if (quota < current) {
>> + sleep_pct = (current - quota) * 100 / current;
>> + throttle_us =
>> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> + cpu->throttle_us_per_full += throttle_us;
>> + } else {
>> + sleep_pct = (quota - current) * 100 / quota;
>> + throttle_us =
>> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> + cpu->throttle_us_per_full -= throttle_us;
>> + }
>> +
>> + trace_dirtylimit_throttle_pct(cpu->cpu_index,
>> + sleep_pct,
>> + throttle_us);
>> + } else {
>> + if (quota < current) {
>> + cpu->throttle_us_per_full += ring_full_time_us / 10;
>> + } else {
>> + cpu->throttle_us_per_full -= ring_full_time_us / 10;
>> + }
>> + }
>> +
>> + cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
>> + ring_full_time_us * DIRTYLIMIT_THROTTLE_PCT_MAX);
>> +
>> + cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
>> +}
>
> This algorithm seems to work even worse than the previous version; could you have
> a look at what's wrong?
What value was QEMU's dirty-ring-size configured with? Is it the same as in the
previous version's test?
>
> See how it fluctuates when I set a throttle of 300MB/s:
>
> (QMP) set-vcpu-dirty-limit dirty-rate=300
>
> Dirty rate: 17622 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 17617 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 17611 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 13023 (MB/s), duration: 1153 (ms), load: 100.00%
> Dirty rate: 923 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2853 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1963 (MB/s), duration: 1040 (ms), load: 100.00%
> Dirty rate: 180 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 182 (MB/s), duration: 1007 (ms), load: 100.00%
> Dirty rate: 177 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 181 (MB/s), duration: 1007 (ms), load: 100.00%
> Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 168 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 169 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 2717 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2851 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1773 (MB/s), duration: 1021 (ms), load: 100.00%
> Dirty rate: 177 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 175 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 1973 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2878 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1690 (MB/s), duration: 1022 (ms), load: 100.00%
> Dirty rate: 174 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 184 (MB/s), duration: 1006 (ms), load: 100.00%
>
> This is the tool I'm using:
>
> https://github.com/xzpeter/mig_mon#memory-dirty
>
> Again, I won't insist on an algorithm as good as the 1st version's, but I think
> it's nicer to have the simplest algorithm merged first, which should be very
> easy to verify.
>
>> +
>> +static void dirtylimit_adjust_throttle(CPUState *cpu)
>> +{
>> + uint64_t quota = 0;
>> + uint64_t current = 0;
>> + int cpu_index = cpu->cpu_index;
>> +
>> + quota = dirtylimit_vcpu_get_state(cpu_index)->quota;
>> + current = vcpu_dirty_rate_get(cpu_index);
>> +
>> + if (current == 0 &&
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt == 0) {
>> + cpu->throttle_us_per_full = 0;
>> + goto end;
>> + } else if (++dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt
>> + < 2) {
>> + goto end;
>> + } else if (dirtylimit_done(quota, current)) {
>> + goto end;
>> + } else {
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt = 0;
>> + dirtylimit_set_throttle(cpu, quota, current);
>> + }
>> +end:
>> + trace_dirtylimit_adjust_throttle(cpu_index,
>> + quota, current,
>> + cpu->throttle_us_per_full);
>> + return;
>> +}
>> +
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> + CPUState *cpu;
>> +
>> + rcu_register_thread();
>> +
>> + while (!qatomic_read(&dirtylimit_quit)) {
>> + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
>
> Sorry not to have mentioned this earlier: I think we probably don't even need
> this dirtylimit thread.
>
> It'll be hard to get the "sleep" right here: you could read two identical
> values from the dirty calc thread, because the 1-second sleep is not accurate, so
> even after this sleep() the calc thread may not have provided the latest number
> yet.
>
> It'll be much cleaner (and, most importantly, more accurate) to make
> this a hook function passed to the vcpu_dirty_rate_stat_thread()
> thread, and call the hook after each vcpu_dirty_rate_stat_collect().
>
>> +
>> + dirtylimit_state_lock();
>> +
>> + if (!dirtylimit_in_service()) {
>> + dirtylimit_state_unlock();
>> + break;
>> + }
>> +
>> + CPU_FOREACH(cpu) {
>> + if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
>> + continue;
>> + }
>> + dirtylimit_adjust_throttle(cpu);
>> + }
>> + dirtylimit_state_unlock();
>> + }
>> +
>> + rcu_unregister_thread();
>> +
>> + return NULL;
>> +}
>> +
>> +static void dirtylimit_thread_start(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 0);
>> + qemu_thread_create(&dirtylimit_thr,
>> + "dirtylimit",
>> + dirtylimit_thread,
>> + NULL,
>> + QEMU_THREAD_JOINABLE);
>> +}
>> +
>> +static void dirtylimit_thread_stop(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 1);
>> + qemu_mutex_unlock_iothread();
>> + qemu_thread_join(&dirtylimit_thr);
>> + qemu_mutex_lock_iothread();
>> +}
>> +
>> +void dirtylimit_set_vcpu(int cpu_index,
>> + uint64_t quota,
>> + bool enable)
>> +{
>> + trace_dirtylimit_set_vcpu(cpu_index, quota);
>> +
>> + if (enable) {
>> + if (dirtylimit_in_service()) {
>> + /* only set the vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + return;
>> + }
>> +
>> + /* initialize state when set dirtylimit first time */
>> + dirtylimit_state_lock();
>> + dirtylimit_state_initialize();
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + dirtylimit_state_unlock();
>> +
>> + dirtylimit_thread_start();
>
> Can we move dirtylimit global initializations out of dirtylimit_set_vcpu() too?
> We should always keep init/cleanup of dirty_rate_calc and dirtylimit together,
> imho; we never enable only one of them.
>
> I made a similar comment on the previous version.
>
>> + } else {
>> + if (!dirtylimit_in_service()) {
>> + return;
>> + }
>> +
>> + dirtylimit_state_lock();
>> + /* dirty page rate limit is not enabled */
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state_unlock();
>> + return;
>> + }
>> +
>> + /* switch off vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, 0, false);
>> + dirtylimit_state_unlock();
>> +
>> + if (!dirtylimit_state->limited_nvcpu) {
>> + dirtylimit_thread_stop();
>> +
>> + dirtylimit_state_lock();
>> + dirtylimit_state_finalize();
>> + dirtylimit_state_unlock();
>
> We don't need such fine-grained control of locking, IMHO; it can be one big lock
> just to serialize things.
>
> IMHO it could be as simple as:
>
> void dirtylimit_set_vcpu(int cpu_index,
> uint64_t quota,
> bool enable)
> {
> dirtylimit_vcpu_set_quota(cpu_index, quota, enable);
> trace_dirtylimit_set_vcpu(cpu_index, quota);
> }
>
> void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
> uint64_t cpu_index,
> uint64_t dirty_rate,
> Error **errp)
> {
> if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> error_setg(errp, "dirty page limit feature requires KVM with"
> " accelerator property 'dirty-ring-size' set'");
> return;
> }
>
> if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
> error_setg(errp, "incorrect cpu index specified");
> return;
> }
>
> dirtylimit_state_lock();
>
> if (!dirtylimit_in_service()) {
> /* TODO: we could have one helper to initialize all of them */
> vcpu_dirty_rate_stat_initialize();
> vcpu_dirty_rate_stat_start();
> dirtylimit_state_initialize();
> dirtylimit_vcpu_set_quota(cpu_index, quota, true);
> }
>
> if (has_cpu_index) {
> dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
> } else {
> dirtylimit_set_all(dirty_rate, true);
> }
>
> dirtylimit_state_unlock();
> }
>
> I didn't write the cleanup path, but it's the same: we should only clean up all
> the global structs in cancel-vcpu-dirty-limit when we find there are zero vcpus
> being tracked, and it should be done once there.
>
> Thanks,
>
Sounds good.
--
Best Regards
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-17 14:00 ` Hyman Huang
@ 2022-01-18 1:00 ` Peter Xu
2022-01-18 2:08 ` Hyman Huang
0 siblings, 1 reply; 33+ messages in thread
From: Peter Xu @ 2022-01-18 1:00 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus Armbruster, Peter Xu,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Mon, Jan 17, 2022 at 10:00:57PM +0800, Hyman Huang wrote:
> > This algorithm seems to work even worse than the previous version; could you have
> > a look at what's wrong?
> What value was the dirty-ring-size of QEMU configured with? Is it the same as
> in the previous version's test?
It should be the same 4096.
The test environment can be slightly different, I used a larger guest this time
(20G, 40 cores), though. Previously it should be a few gig only with a few cores.
Side note: would you also consider picking up this patch along with the series?
https://lore.kernel.org/qemu-devel/20211130080028.6474-1-peterx@redhat.com/
I wanted to make sure it lands before this series, e.g., when enabled both
dirty limit and migration, disabling dirty limit might trigger the bug already.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-18 1:00 ` Peter Xu
@ 2022-01-18 2:08 ` Hyman Huang
0 siblings, 0 replies; 33+ messages in thread
From: Hyman Huang @ 2022-01-18 2:08 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 2022/1/18 9:00, Peter Xu wrote:
> On Mon, Jan 17, 2022 at 10:00:57PM +0800, Hyman Huang wrote:
>>> This algorithm seems to work even worse than the previous version; could you have
>>> a look at what's wrong?
>> What value was the dirty-ring-size of QEMU configured with? Is it the same as
>> in the previous version's test?
>
> It should be the same 4096.
>
> The test environment can be slightly different, I used a larger guest this time
> (20G, 40 cores), though. Previously it should be a few gig only with a few cores.
>
OK, I'll work this out.
> Side note: would you also consider picking up this patch along with the series?
>
Of course, yes. I think this patch can reduce the time overhead of
memory_global_dirty_log_start/memory_global_dirty_log_stop, but it needs
some migration tests. I'll do that, and once all of these are ready, I'll
cherry-pick that series ahead of the dirtylimit patchset.
> https://lore.kernel.org/qemu-devel/20211130080028.6474-1-peterx@redhat.com/
>
> I wanted to make sure it lands before this series, e.g., when enabled both
> dirty limit and migration, disabling dirty limit might trigger the bug already.
>
> Thanks,
>
--
Best regards
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-04 17:14 ` [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle huangy81
2022-01-17 7:32 ` Peter Xu
2022-01-17 9:01 ` Peter Xu
@ 2022-01-19 12:16 ` Markus Armbruster
2022-01-20 11:22 ` Hyman Huang
2 siblings, 1 reply; 33+ messages in thread
From: Markus Armbruster @ 2022-01-19 12:16 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
Paolo Bonzini, Philippe Mathieu-Daudé
huangy81@chinatelecom.cn writes:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Set up a negative feedback system for when the vCPU thread
> handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
> throttle_us_per_full field in struct CPUState. Sleep
> throttle_us_per_full microseconds to throttle the vCPU
> if dirtylimit is enabled.
>
> Start a thread to track the current dirty page rate and
> tune throttle_us_per_full dynamically until the current
> dirty page rate reaches the quota.
>
> Introduce the utility functions in the header for the dirtylimit
> implementation.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbfd48c..ac5fa56 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1850,6 +1850,25 @@
> { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>
> ##
> +# @DirtyLimitInfo:
> +#
> +# Dirty page rate limit information of virtual CPU.
> +#
> +# @cpu-index: index of virtual CPU.
> +#
> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
Lacks a unit. Is it bytes per second? pages per second?
If I understand your code correctly, zero means unlimited. This is
undocumented. Please document it. Something like "0 means unlimited"
should do.
> +#
> +# @current-rate: current dirty page rate for virtual CPU.
> +#
> +# Since: 7.0
> +#
> +##
> +{ 'struct': 'DirtyLimitInfo',
> + 'data': { 'cpu-index': 'int',
> + 'limit-rate': 'int64',
> + 'current-rate': 'int64' } }
The next patch uses 'uint64' for set-vcpu-dirty-limit argument
dirty-rate. Why signed here?
> +
> +##
> # @snapshot-save:
> #
> # Save a VM snapshot
[...]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit
2022-01-04 17:14 ` [PATCH v11 4/4] softmmu/dirtylimit: implement dirty page rate limit huangy81
2022-01-17 7:35 ` Peter Xu
@ 2022-01-19 12:16 ` Markus Armbruster
1 sibling, 0 replies; 33+ messages in thread
From: Markus Armbruster @ 2022-01-19 12:16 UTC (permalink / raw)
To: huangy81
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
Paolo Bonzini, Philippe Mathieu-Daudé
huangy81@chinatelecom.cn writes:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Implement periodic dirty page rate calculation based on the
> dirty ring, and throttle the virtual CPU until it reaches the quota
> dirty page rate given by the user.
>
> Introduce qmp commands "set-vcpu-dirty-limit",
> "cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
> to enable, disable, and query the dirty page limit for virtual CPUs.
>
> Meanwhile, introduce corresponding hmp commands
> "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
> "info vcpu_dirty_limit" so the feature is more usable.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index ac5fa56..9d406f4 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1869,6 +1869,66 @@
> 'current-rate': 'int64' } }
>
> ##
> +# @set-vcpu-dirty-limit:
> +#
> +# Set the upper limit of dirty page rate for virtual CPU.
"for virtual CPUs"
> +#
> +# Requires KVM with accelerator property "dirty-ring-size" set.
> +# A virtual CPU's dirty page rate is a measure of its memory load.
> +# To observe dirty page rates, use @calc-dirty-rate.
> +#
> +# @cpu-index: index of virtual CPU, default is all.
> +#
> +# @dirty-rate: upper limit of dirty page rate for virtual CPU.
Lacks a unit. Is it bytes per second? pages per second?
> +#
> +# Since: 7.0
> +#
> +# Example:
> +# {"execute": "set-vcpu-dirty-limit"}
> +# "arguments": { "dirty-rate": 200,
> +# "cpu-index": 1 } }
> +#
> +##
> +{ 'command': 'set-vcpu-dirty-limit',
> + 'data': { '*cpu-index': 'uint64',
Use 'int' for consistency with cpu-index arguments elsewhere.
> + 'dirty-rate': 'uint64' } }
> +
> +##
> +# @cancel-vcpu-dirty-limit:
> +#
> +# Cancel the upper limit of dirty page rate for virtual CPU.
"for virtual CPUs"
> +#
> +# Cancel the dirty page limit for the vCPU which has been set with
> +# set-vcpu-dirty-limit command. Note that this command requires
> +# support from dirty ring, same as the "set-vcpu-dirty-limit".
> +#
> +# @cpu-index: index of virtual CPU, default is all.
> +#
> +# Since: 7.0
> +#
> +# Example:
> +# {"execute": "cancel-vcpu-dirty-limit"}
> +# "arguments": { "cpu-index": 1 } }
> +#
> +##
> +{ 'command': 'cancel-vcpu-dirty-limit',
> + 'data': { '*cpu-index': 'uint64'} }
Use 'int' for consistency with cpu-index arguments elsewhere.
> +
> +##
> +# @query-vcpu-dirty-limit:
> +#
> +# Returns information about all virtual CPU dirty limit if enabled.
Suggest "about virtual CPU dirty page rate limits, if any".
> +#
> +# Since: 7.0
> +#
> +# Example:
> +# {"execute": "query-vcpu-dirty-limit"}
> +#
> +##
> +{ 'command': 'query-vcpu-dirty-limit',
> + 'returns': [ 'DirtyLimitInfo' ] }
> +
> +##
> # @snapshot-save:
> #
> # Save a VM snapshot
[...]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-17 7:32 ` Peter Xu
2022-01-17 14:00 ` Hyman Huang
@ 2022-01-20 8:26 ` Hyman Huang
2022-01-20 9:25 ` Peter Xu
2022-01-21 8:07 ` Hyman Huang
` (2 subsequent siblings)
4 siblings, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-20 8:26 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 2022/1/17 15:32, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 01:14:08AM +0800, huangy81@chinatelecom.cn wrote:
>> ##
>> +# @DirtyLimitInfo:
>> +#
>> +# Dirty page rate limit information of virtual CPU.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
>> +#
>> +# @current-rate: current dirty page rate for virtual CPU.
>
> Please consider spelling out the unit too for all these dirty rate fields (MB/s).
>
>> +#
>> +# Since: 7.0
>> +#
>> +##
>> +{ 'struct': 'DirtyLimitInfo',
>> + 'data': { 'cpu-index': 'int',
>> + 'limit-rate': 'int64',
>> + 'current-rate': 'int64' } }
>> +
>> +##
>> # @snapshot-save:
>> #
>> # Save a VM snapshot
>> diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
>> index a10ac6f..c9f5745 100644
>> --- a/softmmu/dirtylimit.c
>> +++ b/softmmu/dirtylimit.c
>> @@ -18,6 +18,26 @@
>> #include "sysemu/dirtylimit.h"
>> #include "exec/memory.h"
>> #include "hw/boards.h"
>> +#include "sysemu/kvm.h"
>> +#include "trace.h"
>> +
>> +/*
>> + * Dirtylimit stops working if the dirty page rate error
>> + * value is less than DIRTYLIMIT_TOLERANCE_RANGE
>> + */
>> +#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>> +/*
>> + * Increase or decrease the vcpu sleep time linearly if
>> + * the dirty page rate error percentage is over
>> + * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
>> + * Otherwise, add or subtract a fixed vcpu sleep time.
>> + */
>> +#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
>> +/*
>> + * Max vcpu sleep time percentage during a cycle
>> + * composed of dirty ring full and sleep time.
>> + */
>> +#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
>
> (Thanks for the enriched comments)
>
>> +static inline void dirtylimit_vcpu_set_quota(int cpu_index,
>> + uint64_t quota,
>> + bool on)
>> +{
>> + dirtylimit_state->states[cpu_index].quota = quota;
>
> To be clear, we could move this line into the "(on)" if condition, then in !on
> case we reset it.
>
>> + if (on) {
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state->limited_nvcpu++;
>> + }
>> + } else {
>> + if (dirtylimit_state->states[cpu_index].enabled) {
>> + dirtylimit_state->limited_nvcpu--;
>> + }
>> + }
>> +
>> + dirtylimit_state->states[cpu_index].enabled = on;
>> +}
>> +
>> +static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
>> +{
>> + static uint64_t max_dirtyrate;
>> + uint32_t dirty_ring_size = kvm_dirty_ring_size();
>> + uint64_t dirty_ring_size_memory_MB =
>> + dirty_ring_size * TARGET_PAGE_SIZE >> 20;
>> +
>> + if (max_dirtyrate < dirtyrate) {
>> + max_dirtyrate = dirtyrate;
>> + }
>> +
>> + return dirty_ring_size_memory_MB * 1000000 / max_dirtyrate;
>> +}
>> +
>> +static inline bool dirtylimit_done(uint64_t quota,
>> + uint64_t current)
>> +{
>> + uint64_t min, max;
>> +
>> + min = MIN(quota, current);
>> + max = MAX(quota, current);
>> +
>> + return ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) ? true : false;
>> +}
>> +
>> +static inline bool
>> +dirtylimit_need_linear_adjustment(uint64_t quota,
>> + uint64_t current)
>> +{
>> + uint64_t min, max, pct;
>> +
>> + min = MIN(quota, current);
>> + max = MAX(quota, current);
>> +
>> + pct = (max - min) * 100 / max;
>> +
>> + return pct > DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT;
>> +}
>> +
>> +static void dirtylimit_set_throttle(CPUState *cpu,
>> + uint64_t quota,
>> + uint64_t current)
>> +{
>> + int64_t ring_full_time_us = 0;
>> + uint64_t sleep_pct = 0;
>> + uint64_t throttle_us = 0;
>> +
>> + ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
>> +
>> + if (dirtylimit_need_linear_adjustment(quota, current)) {
>> + if (quota < current) {
>> + sleep_pct = (current - quota) * 100 / current;
>> + throttle_us =
>> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> + cpu->throttle_us_per_full += throttle_us;
>> + } else {
>> + sleep_pct = (quota - current) * 100 / quota;
>> + throttle_us =
>> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> + cpu->throttle_us_per_full -= throttle_us;
>> + }
>> +
>> + trace_dirtylimit_throttle_pct(cpu->cpu_index,
>> + sleep_pct,
>> + throttle_us);
>> + } else {
>> + if (quota < current) {
>> + cpu->throttle_us_per_full += ring_full_time_us / 10;
>> + } else {
>> + cpu->throttle_us_per_full -= ring_full_time_us / 10;
>> + }
>> + }
>> +
>> + cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
>> + ring_full_time_us * DIRTYLIMIT_THROTTLE_PCT_MAX);
>> +
>> + cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
>> +}
>
> This algorithm seems to work even worse than the previous version; could you have
> a look at what's wrong?
>
> See how it fluctuates when I set a throttle of 300MB/s:
>
> (QMP) set-vcpu-dirty-limit dirty-rate=300
>
> Dirty rate: 17622 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 17617 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 17611 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 13023 (MB/s), duration: 1153 (ms), load: 100.00%
> Dirty rate: 923 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2853 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1963 (MB/s), duration: 1040 (ms), load: 100.00%
> Dirty rate: 180 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 182 (MB/s), duration: 1007 (ms), load: 100.00%
> Dirty rate: 177 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 181 (MB/s), duration: 1007 (ms), load: 100.00%
> Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 168 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 169 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 2717 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2851 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1773 (MB/s), duration: 1021 (ms), load: 100.00%
> Dirty rate: 177 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 175 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 1973 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2878 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1690 (MB/s), duration: 1022 (ms), load: 100.00%
> Dirty rate: 174 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 184 (MB/s), duration: 1006 (ms), load: 100.00%
>
> This is the tool I'm using:
>
> https://github.com/xzpeter/mig_mon#memory-dirty
>
> Again, I won't ask for a good algorithm as the 1st version, but then I think
> it's nicer we have the simplest algorithm merged first, which should be very
> easy to verify.
>
Hi, Peter. I'm working on this problem and found the reason is kind of
the same as I mentioned in the cover letter of v10; the following is what
I posted:
2. The new implementation of the throttle algorithm, inspired by Peter,
responds faster and consumes less CPU resource than the old one;
we have made impressive progress.
And there is a point worth discussing: the new
throttle logic is "passive", i.e. a vcpu sleeps only after its dirty ring
is full, unlike "auto-converge", which kicks the vcpu
in a fixed time slice. If the vcpu is memory-write intensive
and the ring size is large, it will produce dirty memory while
the dirty ring fills up and the throttle will not work so well; that is,
the throttle depends on the dirty ring size.
I actually tested the new algo in two cases:
case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
result: minimum quota dirtyrate is 25MB/s or even less
minimum vcpu util is 6%
case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
result: minimum quota dirtyrate is 256MB/s
minimum vcpu util is 24%
I post this just for discussion; I think this is not a big deal
because if we set the dirty-ring-size to the maximum value (65536),
we assume the server's bandwidth is capable of handling it.
Currently, QEMU handles the vcpu KVM_EXIT_DIRTY_RING_FULL exit as
follows:
1. If the dirty ring of a vcpu is full, the vcpu thread returns to
user space and QEMU handles it.
2. QEMU takes the kvm_slots_lock and reaps the dirty rings of all vcpus
in one go by calling kvm_dirty_ring_reap, fills the dirty page bitmap of
the slots, and resets the dirty rings. Finally it releases the kvm_slots_lock.
The logic of *reaping and resetting the dirty rings of all vcpus after one
vcpu's dirty ring is full* works fine and efficiently.
But this is not what dirtylimit wants, because some of the vcpus may lose
their chance to sleep and cannot be throttled, even though their dirty rings
were full.
Your latest test environment used a larger guest (20G, 40 cores),
increasing the chances of vcpus missing their sleep, so the throttle does
not work as well as before.
I tried a simple modification to make the throttle work as well as before:
+static void kvm_dirty_ring_reset_one(KVMState *s, CPUState *cpu)
+{
+ int ret;
+ uint64_t total = 0;
+
+ kvm_slots_lock();
+ total = kvm_dirty_ring_reap_one(s, cpu);
+
+ if (total) {
+ ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
+ assert(ret == total);
+ }
+
+ kvm_slots_unlock();
+}
+
static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data
arg)
{
/* No need to do anything */
@@ -2309,6 +2327,11 @@ bool kvm_dirty_ring_enabled(void)
return kvm_state->kvm_dirty_ring_size ? true : false;
}
static int kvm_init(MachineState *ms)
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2955,9 +2978,8 @@ int kvm_cpu_exec(CPUState *cpu)
* still full. Got kicked by KVM_RESET_DIRTY_RINGS.
*/
trace_kvm_dirty_ring_full(cpu->cpu_index);
- qemu_mutex_lock_iothread();
- kvm_dirty_ring_reap(kvm_state);
- qemu_mutex_unlock_iothread();
+ kvm_dirty_ring_reset_one(kvm_state, cpu);
+ dirtylimit_vcpu_execute(cpu);
ret = 0;
break;
case KVM_EXIT_SYSTEM_EVENT:
I dropped the BQL to reduce the overhead of KVM_EXIT_DIRTY_RING_FULL exit
handling. Maybe kvm_state has to be protected by the BQL, but I wonder if
there could be a finer-granularity lock.
How about this?
>> +
>> +static void dirtylimit_adjust_throttle(CPUState *cpu)
>> +{
>> + uint64_t quota = 0;
>> + uint64_t current = 0;
>> + int cpu_index = cpu->cpu_index;
>> +
>> + quota = dirtylimit_vcpu_get_state(cpu_index)->quota;
>> + current = vcpu_dirty_rate_get(cpu_index);
>> +
>> + if (current == 0 &&
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt == 0) {
>> + cpu->throttle_us_per_full = 0;
>> + goto end;
>> + } else if (++dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt
>> + < 2) {
>> + goto end;
>> + } else if (dirtylimit_done(quota, current)) {
>> + goto end;
>> + } else {
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt = 0;
>> + dirtylimit_set_throttle(cpu, quota, current);
>> + }
>> +end:
>> + trace_dirtylimit_adjust_throttle(cpu_index,
>> + quota, current,
>> + cpu->throttle_us_per_full);
>> + return;
>> +}
>> +
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> + CPUState *cpu;
>> +
>> + rcu_register_thread();
>> +
>> + while (!qatomic_read(&dirtylimit_quit)) {
>> + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
>
> Sorry to have not mentioned this: I think we probably don't even need this
> dirtylimit thread.
>
> It'll be hard to make the "sleep" right here.. you could read two identical
> values from the dirty calc thread because the 1sec sleep is not accurate, so
> even after this sleep() the calc thread may not have provided the latest number
> yet.
>
> It'll be much cleaner (and most importantly, accurate..) to me if we could make
> this a hook function being passed over to the vcpu_dirty_rate_stat_thread()
> thread, then after each vcpu_dirty_rate_stat_collect() we call the hook.
>
>> +
>> + dirtylimit_state_lock();
>> +
>> + if (!dirtylimit_in_service()) {
>> + dirtylimit_state_unlock();
>> + break;
>> + }
>> +
>> + CPU_FOREACH(cpu) {
>> + if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
>> + continue;
>> + }
>> + dirtylimit_adjust_throttle(cpu);
>> + }
>> + dirtylimit_state_unlock();
>> + }
>> +
>> + rcu_unregister_thread();
>> +
>> + return NULL;
>> +}
>> +
>> +static void dirtylimit_thread_start(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 0);
>> + qemu_thread_create(&dirtylimit_thr,
>> + "dirtylimit",
>> + dirtylimit_thread,
>> + NULL,
>> + QEMU_THREAD_JOINABLE);
>> +}
>> +
>> +static void dirtylimit_thread_stop(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 1);
>> + qemu_mutex_unlock_iothread();
>> + qemu_thread_join(&dirtylimit_thr);
>> + qemu_mutex_lock_iothread();
>> +}
>> +
>> +void dirtylimit_set_vcpu(int cpu_index,
>> + uint64_t quota,
>> + bool enable)
>> +{
>> + trace_dirtylimit_set_vcpu(cpu_index, quota);
>> +
>> + if (enable) {
>> + if (dirtylimit_in_service()) {
>> + /* only set the vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + return;
>> + }
>> +
>> + /* initialize state when set dirtylimit first time */
>> + dirtylimit_state_lock();
>> + dirtylimit_state_initialize();
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + dirtylimit_state_unlock();
>> +
>> + dirtylimit_thread_start();
>
> Can we move dirtylimit global initializations out of dirtylimit_set_vcpu() too?
> We should always keep init/cleanup of dirty_rate_calc and dirtylimit together,
> because they belong together, imho. We never enable only one of them.
>
> I commented similarly in previous version on this.
>
>> + } else {
>> + if (!dirtylimit_in_service()) {
>> + return;
>> + }
>> +
>> + dirtylimit_state_lock();
>> + /* dirty page rate limit is not enabled */
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state_unlock();
>> + return;
>> + }
>> +
>> + /* switch off vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, 0, false);
>> + dirtylimit_state_unlock();
>> +
>> + if (!dirtylimit_state->limited_nvcpu) {
>> + dirtylimit_thread_stop();
>> +
>> + dirtylimit_state_lock();
>> + dirtylimit_state_finalize();
>> + dirtylimit_state_unlock();
>
> We don't need such fine-grained control of locking, IMHO; it can be one big lock
> just to serialize things.
>
> IMHO it could be as simple as:
>
> void dirtylimit_set_vcpu(int cpu_index,
> uint64_t quota,
> bool enable)
> {
> dirtylimit_vcpu_set_quota(cpu_index, quota, enable);
> trace_dirtylimit_set_vcpu(cpu_index, quota);
> }
>
> void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
> uint64_t cpu_index,
> uint64_t dirty_rate,
> Error **errp)
> {
> if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> error_setg(errp, "dirty page limit feature requires KVM with"
> " accelerator property 'dirty-ring-size' set'");
> return;
> }
>
> if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
> error_setg(errp, "incorrect cpu index specified");
> return;
> }
>
> dirtylimit_state_lock();
>
> if (!dirtylimit_in_service()) {
> /* TODO: we could have one helper to initialize all of them */
> vcpu_dirty_rate_stat_initialize();
> vcpu_dirty_rate_stat_start();
> dirtylimit_state_initialize();
> dirtylimit_vcpu_set_quota(cpu_index, quota, true);
> }
>
> if (has_cpu_index) {
> dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
> } else {
> dirtylimit_set_all(dirty_rate, true);
> }
>
> dirtylimit_state_unlock();
> }
>
> I didn't write the cleanup path, but it's the same: we should only clean up all
> the global structs in cancel-vcpu-dirty-limit when we find there are zero vcpus
> being tracked, and it should be done once there.
>
> Thanks,
>
--
Best regards
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 8:26 ` Hyman Huang
@ 2022-01-20 9:25 ` Peter Xu
2022-01-20 10:03 ` Hyman Huang
2022-01-20 10:39 ` Hyman Huang
0 siblings, 2 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-20 9:25 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Thu, Jan 20, 2022 at 04:26:09PM +0800, Hyman Huang wrote:
> Hi, Peter. I'm working on this problem and found the reason is kind of the
> same as I mentioned in the cover letter of v10; the following is what I posted:
>
> 2. The new implementation of the throttle algorithm, inspired by Peter,
> responds faster and consumes less CPU resource than the old one;
> we have made impressive progress.
>
> And there is a point worth discussing: the new
> throttle logic is "passive", i.e. a vcpu sleeps only after its dirty ring
> is full, unlike "auto-converge", which kicks the vcpu
> in a fixed time slice. If the vcpu is memory-write intensive
> and the ring size is large, it will produce dirty memory while
> the dirty ring fills up and the throttle will not work so well; it
> means the throttle depends on the dirty ring size.
>
> I actually tested the new algo in two cases:
>
> case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
> result: minimum quota dirtyrate is 25MB/s or even less
> minimum vcpu util is 6%
>
> case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
> result: minimum quota dirtyrate is 256MB/s
> minimum vcpu util is 24%
>
> I post this just for discussion; I think this is not a big deal
> because if we set the dirty-ring-size to the maximum value (65536),
> we assume the server's bandwidth is capable of handling it.
My memory is that I tested your v10 (which has this wait-at-ring-full logic)
already and at that time it worked well.
It's possible that I just got lucky with v10, so that can happen with some
random conditions and so far we still don't know how to hit it.
However..
>
> Currently, QEMU handles the vcpu KVM_EXIT_DIRTY_RING_FULL exit as
> follows:
>
> 1. If the dirty ring of a vcpu is full, the vcpu thread returns to user
> space and QEMU handles it.
>
> 2. QEMU takes the kvm_slots_lock and reaps the dirty rings of all vcpus in one
> go by calling kvm_dirty_ring_reap, fills the dirty page bitmap of the slots, and
> resets the dirty rings. Finally it releases the kvm_slots_lock.
>
> The logic of *reaping and resetting the dirty rings of all vcpus after one
> vcpu's dirty ring is full* works fine and efficiently.
>
> But this is not what dirtylimit wants, because some of the vcpus may lose
> their chance to sleep and cannot be throttled, even though their dirty rings
> were full.
>
> Your latest test environment used a larger guest (20G, 40 cores),
> increasing the chances of vcpus missing their sleep, so the throttle does not
> work as well as before.
>
> I tried a simple modification to make the throttle work as well as before:
>
> +static void kvm_dirty_ring_reset_one(KVMState *s, CPUState *cpu)
> +{
> + int ret;
> + uint64_t total = 0;
> +
> + kvm_slots_lock();
> + total = kvm_dirty_ring_reap_one(s, cpu);
> +
> + if (total) {
> + ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
> + assert(ret == total);
> + }
> +
> + kvm_slots_unlock();
> +}
> +
> static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data arg)
> {
> /* No need to do anything */
> @@ -2309,6 +2327,11 @@ bool kvm_dirty_ring_enabled(void)
> return kvm_state->kvm_dirty_ring_size ? true : false;
> }
>
> static int kvm_init(MachineState *ms)
> {
> MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2955,9 +2978,8 @@ int kvm_cpu_exec(CPUState *cpu)
> * still full. Got kicked by KVM_RESET_DIRTY_RINGS.
> */
> trace_kvm_dirty_ring_full(cpu->cpu_index);
> - qemu_mutex_lock_iothread();
> - kvm_dirty_ring_reap(kvm_state);
> - qemu_mutex_unlock_iothread();
> + kvm_dirty_ring_reset_one(kvm_state, cpu);
> + dirtylimit_vcpu_execute(cpu);
> ret = 0;
> break;
> case KVM_EXIT_SYSTEM_EVENT:
>
> I dropped the BQL to reduce the overhead of KVM_EXIT_DIRTY_RING_FULL exit
> handling. Maybe kvm_state has to be protected by the BQL, but I wonder if there
> could be a finer-granularity lock.
>
> How about this?
... I think what you explained makes sense to me.
Note that there's also the reaper thread running in the background that can
reap all the cores too.
It only runs once per second so it shouldn't bring a lot of differences, but
I'm also wondering whether we should also turn that temporarily off too when
dirtylimit is enabled - we can simply let it keep sleeping if dirtylimit is in
service.
Dropping BQL may not be safe, as it serializes the reaping with other possible
kvm memslot updates. I don't know whether it's a must in the future to use BQL
for reaping the rings, but so far I'd say we can still stick with it.
Note that even if you don't take BQL you'll still need the slots_lock and so
far that's also global, so I don't see how it can help on vcpu concurrency
anyway even if we dropped one of them.
If you do this, could you avoid introducing kvm_dirty_ring_reset_one() and just
let the existing helper take one more CPUState* parameter? Most of the code you added should be
similar to kvm_dirty_ring_reap_locked(), and I wanted to keep the trace point
there (trace_kvm_dirty_ring_reap, though that needs another parameter too).
And that patch can be done on top of this patch, so it can be reviewed easier
outside of dirtylimit details.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 9:25 ` Peter Xu
@ 2022-01-20 10:03 ` Hyman Huang
2022-01-20 10:58 ` Peter Xu
2022-01-20 10:39 ` Hyman Huang
1 sibling, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-20 10:03 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus Armbruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 2022/1/20 17:25, Peter Xu wrote:
> On Thu, Jan 20, 2022 at 04:26:09PM +0800, Hyman Huang wrote:
>> Hi,Peter. I'm working on this problem and found the reason is kind of the
>> same as i metioned in cover letter of v10, the following is what i posted:
>>
>> 2. The new implementaion of throttle algo enlightened by Peter
>> responds faster and consume less cpu resource than the older,
>> we make a impressed progress.
>>
>> And there is a viewpoint may be discussed, it is that the new
>> throttle logic is "passive", vcpu sleeps only after dirty ring,
>> is full, unlike the "auto-converge" which will kick vcpu instead
>> in a fixed slice time. If the vcpu is memory-write intensive
>> and the ring size is large, it will produce dirty memory during
>> the dirty ring full time and the throttle works not so good, it
>> means the throttle depends on the dirty ring size.
>>
>> I actually tested the new algo in two case:
>>
>> case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
>> result: minimum quota dirtyrate is 25MB/s or even less
>> minimum vcpu util is 6%
>>
>> case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
>> result: minimum quota dirtyrate is 256MB/s
>> minimum vcpu util is 24%
>>
>> I post this just for discussion, i think this is not a big deal
>> beacase if we set the dirty-ring-size to the maximum value(65536),
>> we assume the server's bandwidth is capable of handling it.
>
> My memory is that I tested your v10 (which has this wait-at-ring-full logic)
> already and at that time it worked well.
>
> It's possible that I just got lucky with v10, so that can happen with some
> random conditions and so far we still don't know how to hit it.
Uh, sorry for not explaining the reason clearly. I think the reason for
failing to throttle is that vCPUs lose the chance to sleep: I traced the
kvm_dirty_ring_full event and found that when throttling a vCPU works
badly, the kvm_dirty_ring_full event does not arise and, correspondingly,
the sleep never happens.
Two cases lead to this result:
case 1: dirty-ring-size is large and the ring takes a long time to fill,
so not all vCPU dirty rings get full at the same time; some vCPUs
therefore miss the chance to sleep.
case 2: the guest has many vCPUs, so again not all vCPU dirty rings get
full at the same time, and some vCPUs miss the chance to sleep as above.
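Beyond the missed-sleep problem, the measurements quoted earlier hint at a structural floor: since a vCPU only sleeps once its ring fills, the ring size bounds how low throttling can push the dirty rate. A back-of-envelope model (assuming 4 KiB pages and, purely for illustration, about one second of sleep per full ring) reproduces the 256 MB/s floor reported for the 65536-entry ring:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model, not QEMU code: a vCPU can only be made to sleep
 * when its dirty ring fills, so with ring_entries slots of 4 KiB pages
 * the dirty rate cannot drop below roughly one full ring per sleep
 * period, regardless of how long each sleep is made.
 */
static uint64_t min_rate_mbps(uint64_t ring_entries, uint64_t sleep_ms)
{
    uint64_t bytes_per_ring = ring_entries * 4096; /* 4 KiB pages */
    /* assume a write-intensive guest refills the ring near-instantly */
    return bytes_per_ring * 1000 / sleep_ms / (1024 * 1024);
}
```

With these assumptions the 65536-entry ring bottoms out at 256 MB/s and the 4096-entry ring at 16 MB/s, consistent with the two test cases quoted above.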
>
> However..
>
>>
>> Currently, Qemu handle the vcpu KVM_EXIT_DIRTY_RING_FULL exit as the
>> following:
>>
>> 1. If one of the dirty-ring on a vcpu is full, vcpu thread returns to user
>> space and qemu handle it.
>>
>> 2. Qemu get the kvm_slots_lock and reap dirty-ring of all vcpus once for all
>> by calling kvm_dirty_ring_reap, fill the dirty page bitmap of slot and reset
>> dirty ring. Release the kvm_slots_lock finally.
>>
>> The logic *reap and reset dirty ring of all vcpu after one vcpu's dirty ring
>> is full* works fine and efficiently.
>>
>> But this is not what dirtylimit want becasue some of the vcpu may loss
>> chance to sleep and could not be throttled, though vcpu's dirty ring was
>> full.
>>
>> The latest test environment of you used a larger guest(20G, 40 cores),
>> increasing the chances of missing sleep for vcpus and the throttle works not
>> good as before.
>>
>> I try a simple modification make the throttle works better as before:
>>
>> +static void kvm_dirty_ring_reset_one(KVMState *s, CPUState *cpu)
>> +{
>> + int ret;
>> + uint64_t total = 0;
>> +
>> + kvm_slots_lock();
>> + total = kvm_dirty_ring_reap_one(s, cpu);
>> +
>> + if (total) {
>> + ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
>> + assert(ret == total);
>> + }
>> +
>> + kvm_slots_unlock();
>> +}
>> +
>> static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data arg)
>> {
>> /* No need to do anything */
>> @@ -2309,6 +2327,11 @@ bool kvm_dirty_ring_enabled(void)
>> return kvm_state->kvm_dirty_ring_size ? true : false;
>> }
>>
>> static int kvm_init(MachineState *ms)
>> {
>> MachineClass *mc = MACHINE_GET_CLASS(ms);
>> @@ -2955,9 +2978,8 @@ int kvm_cpu_exec(CPUState *cpu)
>> * still full. Got kicked by KVM_RESET_DIRTY_RINGS.
>> */
>> trace_kvm_dirty_ring_full(cpu->cpu_index);
>> - qemu_mutex_lock_iothread();
>> - kvm_dirty_ring_reap(kvm_state);
>> - qemu_mutex_unlock_iothread();
>> + kvm_dirty_ring_reset_one(kvm_state, cpu);
>> + dirtylimit_vcpu_execute(cpu);
>> ret = 0;
>> break;
>> case KVM_EXIT_SYSTEM_EVENT:
>>
>> I drop the BQL to reduce the overhead of KVM_EXIT_DIRTY_RING_FULL exit
>> handle. May be kvm_state could be protected by BQL, but i wonder if there
>> can be a finer granularity lock.
>>
>> How about this?
>
> ... I think what you explained makes sense to me.
>
> Note that there's also the reaper thread running in the background that can
> reap all the cores too.
>
> It only runs once per second so it shouldn't bring a lot of differences, but
> I'm also wondering whether we should also turn that temporarily off too when
> dirtylimit is enabled - we can simply let it keep sleeping if dirtylimit is in
> service.
>
> Dropping BQL may not be safe, as it serializes the reaping with other possible
> kvm memslot updates. I don't know whether it's a must in the future to use BQL
> for reaping the rings, but so far I'd say we can still stick with it.
>
> Note that even if you don't take BQL you'll still need the slots_lock and so
> far that's also global, so I don't see how it can help on vcpu concurrency
> anyway even if we dropped one of them.
>
> If to do this, could you not introduce kvm_dirty_ring_reset_one() but just let
> it take one more CPUState* parameter? Most of the codes you added should be
> similar to kvm_dirty_ring_reap_locked(), and I wanted to keep the trace point
> there (trace_kvm_dirty_ring_reap, though that needs another parameter too).
>
> And that patch can be done on top of this patch, so it can be reviewed easier
> outside of dirtylimit details.
>
> Thanks,
>
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 9:25 ` Peter Xu
2022-01-20 10:03 ` Hyman Huang
@ 2022-01-20 10:39 ` Hyman Huang
2022-01-20 10:56 ` Peter Xu
1 sibling, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-20 10:39 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 2022/1/20 17:25, Peter Xu wrote:
>
> ... I think what you explained makes sense to me.
>
> Note that there's also the reaper thread running in the background that can
> reap all the cores too.
>
> It only runs once per second so it shouldn't bring a lot of differences, but
> I'm also wondering whether we should also turn that temporarily off too when
> dirtylimit is enabled - we can simply let it keep sleeping if dirtylimit is in
> service.
Does this work OK when dirtylimit and migration happen concurrently?
Migration may fetch aged dirty bitmap info from the slot if we turn the
reaper thread off. As you mentioned above, the reaper thread only runs
once per second. Would it be more suitable not to touch the reaper
thread logic?
>
> Dropping BQL may not be safe, as it serializes the reaping with other possible
> kvm memslot updates. I don't know whether it's a must in the future to use BQL
> for reaping the rings, but so far I'd say we can still stick with it.
>
> Note that even if you don't take BQL you'll still need the slots_lock and so
> far that's also global, so I don't see how it can help on vcpu concurrency
> anyway even if we dropped one of them.
>
> If to do this, could you not introduce kvm_dirty_ring_reset_one() but just let
> it take one more CPUState* parameter? Most of the codes you added should be
> similar to kvm_dirty_ring_reap_locked(), and I wanted to keep the trace point
> there (trace_kvm_dirty_ring_reap, though that needs another parameter too).
>
> And that patch can be done on top of this patch, so it can be reviewed easier
> outside of dirtylimit details.
>
> Thanks,
>
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 10:39 ` Hyman Huang
@ 2022-01-20 10:56 ` Peter Xu
2022-01-20 11:03 ` Hyman Huang
0 siblings, 1 reply; 33+ messages in thread
From: Peter Xu @ 2022-01-20 10:56 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Thu, Jan 20, 2022 at 06:39:01PM +0800, Hyman Huang wrote:
> > It only runs once per second so it shouldn't bring a lot of differences, but
> > I'm also wondering whether we should also turn that temporarily off too when
> > dirtylimit is enabled - we can simply let it keep sleeping if dirtylimit is in
> > service.
> Does this work ok when dirtylimit and migration happens concurrently?
> Migration may fetch the aged dirty bitmap info from slot if we turn reaper
> thread off. As you metioned above, reaper thread only runs once per second.
> Is it more suitable for not touching the reaper thread logic?
Yes I think it'll still work, as migration will do explicit sync/collect after
each iteration. See kvm_log_sync_global().
In short, you shouldn't touch the rest of the kvm_dirty_ring_reap() callers,
they'll always flush all the rings.
But it'll be nicer if you will try it out. :)
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 10:03 ` Hyman Huang
@ 2022-01-20 10:58 ` Peter Xu
0 siblings, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-20 10:58 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Thu, Jan 20, 2022 at 06:03:45PM +0800, Hyman Huang wrote:
>
>
> On 2022/1/20 17:25, Peter Xu wrote:
> > On Thu, Jan 20, 2022 at 04:26:09PM +0800, Hyman Huang wrote:
> > > Hi,Peter. I'm working on this problem and found the reason is kind of the
> > > same as i metioned in cover letter of v10, the following is what i posted:
> > >
> > > 2. The new implementaion of throttle algo enlightened by Peter
> > > responds faster and consume less cpu resource than the older,
> > > we make a impressed progress.
> > >
> > > And there is a viewpoint may be discussed, it is that the new
> > > throttle logic is "passive", vcpu sleeps only after dirty ring,
> > > is full, unlike the "auto-converge" which will kick vcpu instead
> > > in a fixed slice time. If the vcpu is memory-write intensive
> > > and the ring size is large, it will produce dirty memory during
> > > the dirty ring full time and the throttle works not so good, it
> > > means the throttle depends on the dirty ring size.
> > >
> > > I actually tested the new algo in two case:
> > >
> > > case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
> > > result: minimum quota dirtyrate is 25MB/s or even less
> > > minimum vcpu util is 6%
> > >
> > > case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
> > > result: minimum quota dirtyrate is 256MB/s
> > > minimum vcpu util is 24%
> > >
> > > I post this just for discussion, i think this is not a big deal
> > > beacase if we set the dirty-ring-size to the maximum value(65536),
> > > we assume the server's bandwidth is capable of handling it.
> >
> > My memory is that I tested your v10 (which has this wait-at-ring-full logic)
> > already and at that time it worked well.
> >
> > It's possible that I just got lucky with v10, so that can happen with some
> > random conditions and so far we still don't know how to hit it.
> Uh, sorry for not explaining the reason clearly. I think the reason of
> failing to throttle is "vcpu loss chance to sleep", i trace the
> kvm_dirty_ring_full event and found that when throttling a vcpu works bad,
> the kvm_dirty_ring_full event do no arise and sleep never happens
> correspondingly.
I see, that's fine. I think I'll just try the new version when it's ready.
>
> Two case lead to this result:
> case 1: dirty-ring-size is large and the ring full time is long, not all
> dirty ring of vcpu get full at one time, so there must be some vcpus that
> miss the chance to sleep.
>
> case 2: guest has many vcpus, not all dirty ring of vcpu get full at one
> time. So miss the chance of sleeping too as above.
I am just not sure whether my test case matches either of the above: I'm
using 4096, which is the smaller ring size, and meanwhile I only have 1
busy dirty thread, iirc.
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 10:56 ` Peter Xu
@ 2022-01-20 11:03 ` Hyman Huang
2022-01-20 11:13 ` Peter Xu
0 siblings, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-20 11:03 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 2022/1/20 18:56, Peter Xu wrote:
> On Thu, Jan 20, 2022 at 06:39:01PM +0800, Hyman Huang wrote:
>>> It only runs once per second so it shouldn't bring a lot of differences, but
>>> I'm also wondering whether we should also turn that temporarily off too when
>>> dirtylimit is enabled - we can simply let it keep sleeping if dirtylimit is in
>>> service.
>> Does this work ok when dirtylimit and migration happens concurrently?
>> Migration may fetch the aged dirty bitmap info from slot if we turn reaper
>> thread off. As you metioned above, reaper thread only runs once per second.
>> Is it more suitable for not touching the reaper thread logic?
>
> Yes I think it'll still work, as migration will do explicit sync/collect after
> each iteration. See kvm_dirty_ring_flush().
Oh, I get it. :)
I just missed the key function kvm_dirty_ring_flush in kvm_log_sync_global.
>
> In short, you shouldn't touch the rest of the kvm_dirty_ring_reap() callers,
> they'll always flush all the rings.
>
> But it'll be nicer if you will try it out. :)
>
Ok, I'll try this out.
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-20 11:03 ` Hyman Huang
@ 2022-01-20 11:13 ` Peter Xu
0 siblings, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-20 11:13 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Thu, Jan 20, 2022 at 07:03:30PM +0800, Hyman Huang wrote:
> Ok, i'll try this out.
You could even add a unit test if you want: see test_migrate_auto_converge() in
migration-test.c.
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-19 12:16 ` Markus Armbruster
@ 2022-01-20 11:22 ` Hyman Huang
0 siblings, 0 replies; 33+ messages in thread
From: Hyman Huang @ 2022-01-20 11:22 UTC (permalink / raw)
To: Markus Armbruster
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Peter Xu, Dr. David Alan Gilbert,
Paolo Bonzini, Philippe Mathieu-Daudé
On 2022/1/19 20:16, Markus Armbruster wrote:
> huangy81@chinatelecom.cn writes:
>
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Setup a negative feedback system when vCPU thread
>> handling KVM_EXIT_DIRTY_RING_FULL exit by introducing
>> throttle_us_per_full field in struct CPUState. Sleep
>> throttle_us_per_full microseconds to throttle vCPU
>> if dirtylimit is enabled.
>>
>> Start a thread to track current dirty page rates and
>> tune the throttle_us_per_full dynamically untill current
>> dirty page rate reach the quota.
>>
>> Introduce the util function in the header for dirtylimit
>> implementation.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> [...]
>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index bbfd48c..ac5fa56 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1850,6 +1850,25 @@
>> { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
>>
>> ##
>> +# @DirtyLimitInfo:
>> +#
>> +# Dirty page rate limit information of virtual CPU.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
>
> Lacks a unit. Is it bytes per second? pages per second?
>
> If I understand your code correctly, zero means unlimited. This is
> undocumented. Please document it. Something like "0 means unlimited"
> should do.
>
Ok.
>> +#
>> +# @current-rate: current dirty page rate for virtual CPU.
>> +#
>> +# Since: 7.0
>> +#
>> +##
>> +{ 'struct': 'DirtyLimitInfo',
>> + 'data': { 'cpu-index': 'int',
>> + 'limit-rate': 'int64',
>> + 'current-rate': 'int64' } }
>
> The next patch uses 'uint64' for set-vcpu-dirty-limit argument
> dirty-rate. Why signed here?
Yes, this is not consistent with the next patch. I left this in wrongly
after some modification; it should be moved into the next patch. I'll do
that in the next version :(
>
>> +
>> +##
>> # @snapshot-save:
>> #
>> # Save a VM snapshot
>
> [...]
>
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-17 7:32 ` Peter Xu
2022-01-17 14:00 ` Hyman Huang
2022-01-20 8:26 ` Hyman Huang
@ 2022-01-21 8:07 ` Hyman Huang
2022-01-21 9:19 ` Peter Xu
2022-01-22 3:54 ` Hyman Huang
2022-01-24 4:20 ` Hyman Huang
4 siblings, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-21 8:07 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
>
> Again, I won't ask for a good algorithm as the 1st version, but then I think
> it's nicer we have the simplest algorithm merged first, which should be very
> easy to verify.
>
Hi, Peter, I'm working on simplifying the algorithm.
The current throttle logic is as follows:
1. If the error value (|quota - current|) is less than 25 MB/s, we assume
throttling is already done and do nothing.
2. Start to throttle if the "error value greater than 25 MB/s" scenario
is detected twice.
3. Speed up throttling by increasing/decreasing linearly when the error
value is found to be too large.
4. Throttle normally by increasing/decreasing a fixed time slice.
I think 1 and 4 are basic logic and should not be dropped; 2 could be
removed (I took it from the auto-converge algo). I'd prefer to keep 3 so
that the throttle can respond fast.
How about this?
Could I just add some comments in dirtylimit_adjust_throttle and not
touch the logic? I tested the result of v12 and it seems to work fine.
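For reference, the four steps listed above might be sketched as a standalone toy (the 25 MB/s band and the linear fast path come from the description; step 2's double-detection is omitted as proposed, and all constants besides the 25 MB/s tolerance are illustrative, not the values in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define TOLERANCE_MB    25   /* step 1: |quota - current| below this => done  */
#define LARGE_ERROR_MB  200  /* step 3 threshold (illustrative)               */
#define STEP_US         100  /* step 4: fixed time-slice step (illustrative)  */

/* Return the new per-ring-full sleep time in microseconds. */
static int64_t throttle_adjust(int64_t throttle_us,
                               uint64_t quota_mb, uint64_t current_mb)
{
    uint64_t err = quota_mb > current_mb ? quota_mb - current_mb
                                         : current_mb - quota_mb;
    int64_t delta;

    if (err < TOLERANCE_MB) {
        return throttle_us;              /* step 1: close enough, do nothing */
    }
    if (err > LARGE_ERROR_MB) {
        /* step 3: wide gap - scale the step linearly with the error */
        delta = throttle_us * (int64_t)err / (int64_t)(current_mb ? current_mb : 1);
        if (delta < STEP_US) {
            delta = STEP_US;             /* guarantee progress from zero */
        }
    } else {
        delta = STEP_US;                 /* step 4: narrow gap, fixed slice */
    }
    throttle_us += current_mb > quota_mb ? delta : -delta;
    return throttle_us < 0 ? 0 : throttle_us;
}
```

The negative-feedback shape is visible here: sleeping longer lowers the next measured rate, which shrinks the error, which shrinks the step, so the sleep time settles once the rate is inside the tolerance band.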
>> +
>> +static void dirtylimit_adjust_throttle(CPUState *cpu)
>> +{
>> + uint64_t quota = 0;
>> + uint64_t current = 0;
>> + int cpu_index = cpu->cpu_index;
>> +
>> + quota = dirtylimit_vcpu_get_state(cpu_index)->quota;
>> + current = vcpu_dirty_rate_get(cpu_index);
>> +
>> + if (current == 0 &&
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt == 0) {
>> + cpu->throttle_us_per_full = 0;
>> + goto end;
>> + } else if (++dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt
>> + < 2) {
>> + goto end;
>> + } else if (dirtylimit_done(quota, current)) {
>> + goto end;
>> + } else {
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt = 0;
>> + dirtylimit_set_throttle(cpu, quota, current);
>> + }
>> +end:
>> + trace_dirtylimit_adjust_throttle(cpu_index,
>> + quota, current,
>> + cpu->throttle_us_per_full);
>> + return;
>> +}
>> +
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> + CPUState *cpu;
>> +
>> + rcu_register_thread();
>> +
>> + while (!qatomic_read(&dirtylimit_quit)) {
>> + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
>
> Sorry to have not mentioned this: I think we probably don't even need this
> dirtylimit thread.
>
> It'll be hard to make the "sleep" right here.. you could read two identical
> values from the dirty calc thread because the 1sec sleep is not accurate, so
> even after this sleep() the calc thread may not have provided the latest number
> yet.
>
> It'll be much cleaner (and most importantly, accurate..) to me if we could make
> this a hook function being passed over to the vcpu_dirty_rate_stat_thread()
> thread, then after each vcpu_dirty_rate_stat_collect() we call the hook.
>
>> +
>> + dirtylimit_state_lock();
>> +
>> + if (!dirtylimit_in_service()) {
>> + dirtylimit_state_unlock();
>> + break;
>> + }
>> +
>> + CPU_FOREACH(cpu) {
>> + if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
>> + continue;
>> + }
>> + dirtylimit_adjust_throttle(cpu);
>> + }
>> + dirtylimit_state_unlock();
>> + }
>> +
>> + rcu_unregister_thread();
>> +
>> + return NULL;
>> +}
>> +
>> +static void dirtylimit_thread_start(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 0);
>> + qemu_thread_create(&dirtylimit_thr,
>> + "dirtylimit",
>> + dirtylimit_thread,
>> + NULL,
>> + QEMU_THREAD_JOINABLE);
>> +}
>> +
>> +static void dirtylimit_thread_stop(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 1);
>> + qemu_mutex_unlock_iothread();
>> + qemu_thread_join(&dirtylimit_thr);
>> + qemu_mutex_lock_iothread();
>> +}
>> +
>> +void dirtylimit_set_vcpu(int cpu_index,
>> + uint64_t quota,
>> + bool enable)
>> +{
>> + trace_dirtylimit_set_vcpu(cpu_index, quota);
>> +
>> + if (enable) {
>> + if (dirtylimit_in_service()) {
>> + /* only set the vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + return;
>> + }
>> +
>> + /* initialize state when set dirtylimit first time */
>> + dirtylimit_state_lock();
>> + dirtylimit_state_initialize();
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + dirtylimit_state_unlock();
>> +
>> + dirtylimit_thread_start();
>
> Can we move dirtylimit global initializations out of dirtylimit_set_vcpu() too?
> We should always keep init/cleanup of dirty_rate_calc and dirtylimit together,
> because they should, imho. We never enable one of them.
>
> I commented similarly in previous version on this.
>
>> + } else {
>> + if (!dirtylimit_in_service()) {
>> + return;
>> + }
>> +
>> + dirtylimit_state_lock();
>> + /* dirty page rate limit is not enabled */
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state_unlock();
>> + return;
>> + }
>> +
>> + /* switch off vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, 0, false);
>> + dirtylimit_state_unlock();
>> +
>> + if (!dirtylimit_state->limited_nvcpu) {
>> + dirtylimit_thread_stop();
>> +
>> + dirtylimit_state_lock();
>> + dirtylimit_state_finalize();
>> + dirtylimit_state_unlock();
>
> We don't need such a fine control of locking, IMHO.. it can be a very big lock
> just to serialize things..
>
> IMHO it could be as simple as:
>
> void dirtylimit_set_vcpu(int cpu_index,
> uint64_t quota,
> bool enable)
> {
> dirtylimit_vcpu_set_quota(cpu_index, quota, enable);
> trace_dirtylimit_set_vcpu(cpu_index, quota);
> }
>
> void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
> uint64_t cpu_index,
> uint64_t dirty_rate,
> Error **errp)
> {
> if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> error_setg(errp, "dirty page limit feature requires KVM with"
> " accelerator property 'dirty-ring-size' set'");
> return;
> }
>
> if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
> error_setg(errp, "incorrect cpu index specified");
> return;
> }
>
> dirtylimit_state_lock();
>
> if (!dirtylimit_in_service()) {
> /* TODO: we could have one helper to initialize all of them */
> vcpu_dirty_rate_stat_initialize();
> vcpu_dirty_rate_stat_start();
> dirtylimit_state_initialize();
> dirtylimit_vcpu_set_quota(cpu_index, quota, true);
> }
>
> if (has_cpu_index) {
> dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
> } else {
> dirtylimit_set_all(dirty_rate, true);
> }
>
> dirtylimit_state_unlock();
> }
>
> I didn't write the cleanup path, but it's the same: we should only cleanup all
> the global structs in cancel-vcpu-dirty-limit when we found there's zero vcpus
> in track, and it should be done once there.
>
> Thanks,
>
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-21 8:07 ` Hyman Huang
@ 2022-01-21 9:19 ` Peter Xu
0 siblings, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-21 9:19 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Fri, Jan 21, 2022 at 04:07:24PM +0800, Hyman Huang wrote:
> Hi, Peter, i'm working on simplifying the algorithm.
> Current throttle logic is like the following:
>
> 1. If error value(|quota - current|) less than 25MB/s, we assert throttle
> already done and do nothing.
>
> 2. Start to throttle if "error value greater than 25MB/s" scenario detected
> twice.
>
> 3. Speed up throttle via plus and minus linearly if "error value" be found
> too large.
>
> 4. Throttle normally via plus and minus a fixed time slice.
>
> I think 1、4 are basic logic and shoul not be dropped, and 2 could be
> removed(i take this from auto-converg algo),
Agreed.
> i prefer to reserve 3 so that the throttle can response fast.
>
> How about this?
>
> Could it be possible that i add some comments in dirtylimit_adjust_throttle
> and not touch the logic? I test the result of v12 and it seems working fine.
I only worry that the differential part (step 3) makes it oscillate, and
that's what I hit. Maybe it can be tuned so that it won't happen with
general use cases; then I think it's perfectly fine, at least to me.
--
Peter Xu
* Re: [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation
2022-01-17 2:19 ` Peter Xu
@ 2022-01-22 3:22 ` Hyman Huang
2022-01-24 3:08 ` Peter Xu
0 siblings, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-22 3:22 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On 2022/1/17 10:19, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 01:14:06AM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> +
>> +static void vcpu_dirty_stat_collect(VcpuStat *stat,
>> + DirtyPageRecord *records,
>> + bool start)
>> +{
>> + CPUState *cpu;
>> +
>> + CPU_FOREACH(cpu) {
>> + if (!start && cpu->cpu_index >= stat->nvcpu) {
>> + /*
>> + * Never go there unless cpu is hot-plugged,
>> + * just ignore in this case.
>> + */
>> + continue;
>> + }
>
> As commented before, I think the only way to do it right is does not allow cpu
> plug/unplug during measurement..
>
> Say, even if index didn't get out of range, an unplug even should generate very
> stange output of the unplugged cpu. Please see more below.
>
>> + record_dirtypages(records, cpu, start);
>> + }
>> +}
>> +
>> +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
>> + int64_t init_time_ms,
>> + VcpuStat *stat,
>> + unsigned int flag,
>> + bool one_shot)
>> +{
>> + DirtyPageRecord *records;
>> + int64_t duration;
>> + int64_t dirtyrate;
>> + int i = 0;
>> +
>> + cpu_list_lock();
>> + records = vcpu_dirty_stat_alloc(stat);
>> + vcpu_dirty_stat_collect(stat, records, true);
>> + cpu_list_unlock();
>
> Continue with above - then I'm wondering whether we should just keep taking the
> lock until vcpu_dirty_stat_collect().
>
> Yes we could be taking the lock for a long time because of the sleep, but the
> main thread plug thread will just wait for it to complete and it is at least
> not a e.g. deadlock.
>
> The other solution is we do cpu_list_unlock() like this, but introduce another
> cpu_list_generation_id and boost it after any plug/unplug of cpu, aka, when cpu
> list changes.
>
> Then we record cpu generation ID at the entry of this function and retry the
> whole measurement if at some point we found generation ID changed (we need to
> fetch the gen ID after having the lock, of course). That could avoid us taking
> the cpu list lock during dirty_stat_wait(), but it'll start to complicate cpu
> list locking rules.
>
> The simpler way is still just to take the lock, imho.
>
Hi, Peter, I'm working on this as you suggested, keeping hold of the
cpu_list_lock during the dirty page rate calculation. I found a deadlock
when testing the hotplug scenario; the sequence is as follows:

calc thread:
  1. take qemu_cpu_list_lock
  2. collect dirty pages and wait
  3. take the BQL            <- blocks: the main thread holds it
  4. sync dirty log
  5. release the BQL

qemu main thread:
  1. take the BQL
  2. cpu hotplug
  3. take qemu_cpu_list_lock <- blocks: the calc thread holds it
I just recalled that this is one of the reasons why I handled the
plug/unplug scenario (another is that cpu plug may wait quite a long
time while dirtylimit is in service).
It seems we have two strategies: one is to keep this logic untouched in
v12 and put a "cpu_list_generation_id" implementation on the TODO list
(once this patchset is merged, I'll try that out); the other is to
introduce "cpu_list_generation_id" right now.
Which strategy do you prefer?
Uh... I think the "unmatched_cnt" is kind of like this too: once we
remove the "unmatched count" logic, the throttle algo is more likely to
oscillate, so I'd prefer to put the "unmatched_cnt" rework on the TODO
list as above.
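A minimal sketch of the cpu_list_generation_id idea discussed above (everything here is hypothetical scaffolding; in QEMU the counter would be bumped under qemu_cpu_list_lock on every vcpu plug/unplug, and the generation reads would happen under that lock too):

```c
#include <assert.h>
#include <stdint.h>

static unsigned int cpu_list_generation_id;  /* bumped on vcpu plug/unplug */
static unsigned int retries;                 /* measurement passes discarded */

static void cpu_list_changed(void)
{
    cpu_list_generation_id++;
}

/*
 * Measure without holding the cpu list lock across the ~1s wait; if the
 * vcpu list changed underneath us, throw the pass away and start over.
 */
static int64_t measure_with_retry(int64_t (*measure_once)(void))
{
    for (;;) {
        unsigned int gen = cpu_list_generation_id;
        int64_t rate = measure_once();       /* collect, wait, collect again */
        if (gen == cpu_list_generation_id) {
            return rate;                     /* list stable: result is valid */
        }
        retries++;                           /* hotplug raced us: retry */
    }
}

/* Fake measurement that simulates a hotplug during the first pass. */
static int calls;
static int64_t fake_measure(void)
{
    if (++calls == 1) {
        cpu_list_changed();
    }
    return 42;
}
```

This avoids both the deadlock (the lock is never held across the wait) and stale per-vCPU results, at the cost of an occasional repeated measurement.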
> The rest looks good, thanks.
>
>> +
>> + duration = dirty_stat_wait(calc_time_ms, init_time_ms);
>> +
>> + global_dirty_log_sync(flag, one_shot);
>> +
>> + cpu_list_lock();
>> + vcpu_dirty_stat_collect(stat, records, false);
>> + cpu_list_unlock();
>> +
>> + for (i = 0; i < stat->nvcpu; i++) {
>> + dirtyrate = do_calculate_dirtyrate(records[i], duration);
>> +
>> + stat->rates[i].id = i;
>> + stat->rates[i].dirty_rate = dirtyrate;
>> +
>> + trace_dirtyrate_do_calculate_vcpu(i, dirtyrate);
>> + }
>> +
>> + g_free(records);
>> +
>> + return duration;
>> +}
>> +
>> static bool is_sample_period_valid(int64_t sec)
>> {
>> if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
>> @@ -396,44 +518,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
>> return true;
>> }
>>
>> -static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
>> - CPUState *cpu, bool start)
>> -{
>> - if (start) {
>> - dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
>> - } else {
>> - dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
>> - }
>> -}
>> -
>> -static void dirtyrate_global_dirty_log_start(void)
>> -{
>> - qemu_mutex_lock_iothread();
>> - memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
>> - qemu_mutex_unlock_iothread();
>> -}
>> -
>> -static void dirtyrate_global_dirty_log_stop(void)
>> -{
>> - qemu_mutex_lock_iothread();
>> - memory_global_dirty_log_sync();
>> - memory_global_dirty_log_stop(GLOBAL_DIRTY_DIRTY_RATE);
>> - qemu_mutex_unlock_iothread();
>> -}
>> -
>> -static int64_t do_calculate_dirtyrate_vcpu(DirtyPageRecord dirty_pages)
>> -{
>> - uint64_t memory_size_MB;
>> - int64_t time_s;
>> - uint64_t increased_dirty_pages =
>> - dirty_pages.end_pages - dirty_pages.start_pages;
>> -
>> - memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
>> - time_s = DirtyStat.calc_time;
>> -
>> - return memory_size_MB / time_s;
>> -}
>> -
>> static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
>> bool start)
>> {
>> @@ -444,11 +528,6 @@ static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
>> }
>> }
>>
>> -static void do_calculate_dirtyrate_bitmap(DirtyPageRecord dirty_pages)
>> -{
>> - DirtyStat.dirty_rate = do_calculate_dirtyrate_vcpu(dirty_pages);
>> -}
>> -
>> static inline void dirtyrate_manual_reset_protect(void)
>> {
>> RAMBlock *block = NULL;
>> @@ -492,71 +571,52 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>> DirtyStat.start_time = start_time / 1000;
>>
>> msec = config.sample_period_seconds * 1000;
>> - msec = set_sample_page_period(msec, start_time);
>> + msec = dirty_stat_wait(msec, start_time);
>> DirtyStat.calc_time = msec / 1000;
>>
>> /*
>> - * dirtyrate_global_dirty_log_stop do two things.
>> + * do two things.
>> * 1. fetch dirty bitmap from kvm
>> * 2. stop dirty tracking
>> */
>> - dirtyrate_global_dirty_log_stop();
>> + global_dirty_log_sync(GLOBAL_DIRTY_DIRTY_RATE, true);
>>
>> record_dirtypages_bitmap(&dirty_pages, false);
>>
>> - do_calculate_dirtyrate_bitmap(dirty_pages);
>> + DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
>> }
>>
--
Best regard
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-17 7:32 ` Peter Xu
` (2 preceding siblings ...)
2022-01-21 8:07 ` Hyman Huang
@ 2022-01-22 3:54 ` Hyman Huang
2022-01-24 3:10 ` Peter Xu
2022-01-24 4:20 ` Hyman Huang
4 siblings, 1 reply; 33+ messages in thread
From: Hyman Huang @ 2022-01-22 3:54 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> + CPUState *cpu;
>> +
>> + rcu_register_thread();
>> +
>> + while (!qatomic_read(&dirtylimit_quit)) {
>> + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
>
> Sorry to have not mentioned this: I think we probably don't even need this
> dirtylimit thread.
>
> It'll be hard to make the "sleep" right here.. you could read two identical
> values from the dirty calc thread because the 1sec sleep is not accurate, so
> even after this sleep() the calc thread may not have provided the latest number
> yet.
>
> It'll be much cleaner (and most importantly, accurate..) to me if we could make
> this a hook function being passed over to the vcpu_dirty_rate_stat_thread()
> thread, then after each vcpu_dirty_rate_stat_collect() we call the hook.
>
Ok, I removed the dirtylimit_thread and implemented the throttle in a bottom
half instead; indeed, it becomes more accurate. Anyway, what do you think
of it?
>> +
>> + dirtylimit_state_lock();
>> +
>> + if (!dirtylimit_in_service()) {
>> + dirtylimit_state_unlock();
>> + break;
>> + }
>> +
>> + CPU_FOREACH(cpu) {
>> + if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
>> + continue;
>> + }
>> + dirtylimit_adjust_throttle(cpu);
>> + }
>> + dirtylimit_state_unlock();
>> + }
>> +
>> + rcu_unregister_thread();
>> +
>> + return NULL;
>> +}
>> +
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation
2022-01-22 3:22 ` Hyman Huang
@ 2022-01-24 3:08 ` Peter Xu
2022-01-24 9:36 ` Hyman Huang
0 siblings, 1 reply; 33+ messages in thread
From: Peter Xu @ 2022-01-24 3:08 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Sat, Jan 22, 2022 at 11:22:37AM +0800, Hyman Huang wrote:
>
>
> 在 2022/1/17 10:19, Peter Xu 写道:
> > On Wed, Jan 05, 2022 at 01:14:06AM +0800, huangy81@chinatelecom.cn wrote:
> > > From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> > >
> > > +
> > > +static void vcpu_dirty_stat_collect(VcpuStat *stat,
> > > + DirtyPageRecord *records,
> > > + bool start)
> > > +{
> > > + CPUState *cpu;
> > > +
> > > + CPU_FOREACH(cpu) {
> > > + if (!start && cpu->cpu_index >= stat->nvcpu) {
> > > + /*
> > > + * Never go there unless cpu is hot-plugged,
> > > + * just ignore in this case.
> > > + */
> > > + continue;
> > > + }
> >
> > As commented before, I think the only way to do it right is to not allow cpu
> > plug/unplug during measurement..
> >
> > Say, even if the index didn't get out of range, an unplug event would generate very
> > strange output for the unplugged cpu. Please see more below.
> >
> > > + record_dirtypages(records, cpu, start);
> > > + }
> > > +}
> > > +
> > > +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
> > > + int64_t init_time_ms,
> > > + VcpuStat *stat,
> > > + unsigned int flag,
> > > + bool one_shot)
> > > +{
> > > + DirtyPageRecord *records;
> > > + int64_t duration;
> > > + int64_t dirtyrate;
> > > + int i = 0;
> > > +
> > > + cpu_list_lock();
> > > + records = vcpu_dirty_stat_alloc(stat);
> > > + vcpu_dirty_stat_collect(stat, records, true);
> > > + cpu_list_unlock();
> >
> > Continue with above - then I'm wondering whether we should just keep taking the
> > lock until vcpu_dirty_stat_collect().
> >
> > Yes we could be taking the lock for a long time because of the sleep, but the
> > main thread plug thread will just wait for it to complete and it is at least
> > not a e.g. deadlock.
> >
> > The other solution is we do cpu_list_unlock() like this, but introduce another
> > cpu_list_generation_id and boost it after any plug/unplug of cpu, aka, when cpu
> > list changes.
> >
> > Then we record cpu generation ID at the entry of this function and retry the
> > whole measurement if at some point we found generation ID changed (we need to
> > fetch the gen ID after having the lock, of course). That could avoid us taking
> > the cpu list lock during dirty_stat_wait(), but it'll start to complicate cpu
> > list locking rules.
> >
> > The simpler way is still just to take the lock, imho.
> >
> Hi, Peter, I'm working on this as you suggested, keeping the
> cpu_list_lock held during the dirty page rate calculation. I found a deadlock when
> testing the hotplug scenario; the logic is as follows:
>
> calc thread                        qemu main thread
> 1. take qemu_cpu_list_lock
>                                    1. take the BQL
> 2. collect dirty pages and wait
>                                    2. cpu hotplug
>                                    3. take qemu_cpu_list_lock
> 3. take the BQL
>
> 4. sync dirty log
>
> 5. release the BQL
>
> I just recalled that this is one of the reasons why I handle the plug/unplug
> scenario (another is that a cpu plug may wait a fairly long time when dirtylimit
> is in service).
Ah I should have noticed the bql dependency with cpu list lock before..
I think having the cpu plug waiting for one sec is fine, because the mgmt app
should be aware of both so it shouldn't even happen in practice (not good
timing to plug during pre-migration). However bql is definitely another
story.. which I agree.
>
> It seems that we have two strategies: one is to just keep this logic untouched
> in v12 and add the "cpu_list_generation_id" implementation to the TODO list (once this
> patchset is merged, I'll try that out); the other is to introduce
> "cpu_list_generation_id" right now.
>
> Which strategy do you prefer?
I prefer having the gen_id patch. The thing is, it should be less than 10 lines
and the logic should be fairly straightforward. Without it, it seems
always risky to use this new feature.
I hope I didn't overlook any existing mechanism to block cpu plug/unplug for
some period, though, or we should use it.
>
> Uh... I think the "unmatched_cnt" is also kind of like this too, because once
> we remove the "unmatched count" logic, the throttle algo is more likely to
> oscillate, and I prefer to add the "unmatched_cnt" to the TODO list as above.
Could we tune the differential factor to make it less possible to oscillate?
I still can't say I like "unmatched cnt" idea a lot.. From a PID pov (partial,
integral, differential) you've already got partial + differential, and IMHO
that "unmatched cnt" solution was trying to mimic an "integral" delta. Instead
of doing a mean value calculation (as most integral systems do), the
"unmatched cnt" solution literally made it an array of 2 and it dropped the 1st
element.. Hence a decision was made only from the 2nd data you collected.
From that POV I think it's cleaner if you add a real (but simple) integral algo
into it. It can be, e.g., an array of 3; then when you do the math you use the
average of the three dirty rates. Would that work (and also look a bit
cleaner)?
Thanks,
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-22 3:54 ` Hyman Huang
@ 2022-01-24 3:10 ` Peter Xu
0 siblings, 0 replies; 33+ messages in thread
From: Peter Xu @ 2022-01-24 3:10 UTC (permalink / raw)
To: Hyman Huang
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
On Sat, Jan 22, 2022 at 11:54:07AM +0800, Hyman Huang wrote:
>
> > > +static void *dirtylimit_thread(void *opaque)
> > > +{
> > > + CPUState *cpu;
> > > +
> > > + rcu_register_thread();
> > > +
> > > + while (!qatomic_read(&dirtylimit_quit)) {
> > > + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
> >
> > Sorry to have not mentioned this: I think we probably don't even need this
> > dirtylimit thread.
> >
> > It'll be hard to make the "sleep" right here.. you could read two identical
> > values from the dirty calc thread because the 1sec sleep is not accurate, so
> > even after this sleep() the calc thread may not have provided the latest number
> > yet.
> >
> > It'll be much cleaner (and most importantly, accurate..) to me if we could make
> > this a hook function being passed over to the vcpu_dirty_rate_stat_thread()
> > thread, then after each vcpu_dirty_rate_stat_collect() we call the hook
Another cut-off email? Please try again?.. :)
--
Peter Xu
* Re: [PATCH v11 3/4] softmmu/dirtylimit: implement virtual CPU throttle
2022-01-17 7:32 ` Peter Xu
` (3 preceding siblings ...)
2022-01-22 3:54 ` Hyman Huang
@ 2022-01-24 4:20 ` Hyman Huang
4 siblings, 0 replies; 33+ messages in thread
From: Hyman Huang @ 2022-01-24 4:20 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
在 2022/1/17 15:32, Peter Xu 写道:
> On Wed, Jan 05, 2022 at 01:14:08AM +0800, huangy81@chinatelecom.cn wrote:
>> ##
>> +# @DirtyLimitInfo:
>> +#
>> +# Dirty page rate limit information of virtual CPU.
>> +#
>> +# @cpu-index: index of virtual CPU.
>> +#
>> +# @limit-rate: upper limit of dirty page rate for virtual CPU.
>> +#
>> +# @current-rate: current dirty page rate for virtual CPU.
>
> Please consider spell out the unit too for all these dirty rate fields (MB/s).
>
>> +#
>> +# Since: 7.0
>> +#
>> +##
>> +{ 'struct': 'DirtyLimitInfo',
>> + 'data': { 'cpu-index': 'int',
>> + 'limit-rate': 'int64',
>> + 'current-rate': 'int64' } }
>> +
>> +##
>> # @snapshot-save:
>> #
>> # Save a VM snapshot
>> diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
>> index a10ac6f..c9f5745 100644
>> --- a/softmmu/dirtylimit.c
>> +++ b/softmmu/dirtylimit.c
>> @@ -18,6 +18,26 @@
>> #include "sysemu/dirtylimit.h"
>> #include "exec/memory.h"
>> #include "hw/boards.h"
>> +#include "sysemu/kvm.h"
>> +#include "trace.h"
>> +
>> +/*
>> + * Dirtylimit stop working if dirty page rate error
>> + * value less than DIRTYLIMIT_TOLERANCE_RANGE
>> + */
>> +#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>> +/*
>> + * Plus or minus vcpu sleep time linearly if dirty
>> + * page rate error value percentage over
>> + * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
>> + * Otherwise, plus or minus a fixed vcpu sleep time.
>> + */
>> +#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
>> +/*
>> + * Max vcpu sleep time percentage during a cycle
>> + * composed of dirty ring full and sleep time.
>> + */
>> +#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
>
> (Thanks for the enriched comments)
>
>> +static inline void dirtylimit_vcpu_set_quota(int cpu_index,
>> + uint64_t quota,
>> + bool on)
>> +{
>> + dirtylimit_state->states[cpu_index].quota = quota;
>
> To be clear, we could move this line into the "(on)" if condition, then in !on
> case we reset it.
>
>> + if (on) {
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state->limited_nvcpu++;
>> + }
>> + } else {
>> + if (dirtylimit_state->states[cpu_index].enabled) {
>> + dirtylimit_state->limited_nvcpu--;
>> + }
>> + }
>> +
>> + dirtylimit_state->states[cpu_index].enabled = on;
>> +}
>> +
>> +static inline int64_t dirtylimit_dirty_ring_full_time(uint64_t dirtyrate)
>> +{
>> + static uint64_t max_dirtyrate;
>> + uint32_t dirty_ring_size = kvm_dirty_ring_size();
>> + uint64_t dirty_ring_size_meory_MB =
>> + dirty_ring_size * TARGET_PAGE_SIZE >> 20;
>> +
>> + if (max_dirtyrate < dirtyrate) {
>> + max_dirtyrate = dirtyrate;
>> + }
>> +
>> + return dirty_ring_size_meory_MB * 1000000 / max_dirtyrate;
>> +}
>> +
>> +static inline bool dirtylimit_done(uint64_t quota,
>> + uint64_t current)
>> +{
>> + uint64_t min, max;
>> +
>> + min = MIN(quota, current);
>> + max = MAX(quota, current);
>> +
>> + return ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) ? true : false;
>> +}
>> +
>> +static inline bool
>> +dirtylimit_need_linear_adjustment(uint64_t quota,
>> + uint64_t current)
>> +{
>> + uint64_t min, max, pct;
>> +
>> + min = MIN(quota, current);
>> + max = MAX(quota, current);
>> +
>> + pct = (max - min) * 100 / max;
>> +
>> + return pct > DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT;
>> +}
>> +
>> +static void dirtylimit_set_throttle(CPUState *cpu,
>> + uint64_t quota,
>> + uint64_t current)
>> +{
>> + int64_t ring_full_time_us = 0;
>> + uint64_t sleep_pct = 0;
>> + uint64_t throttle_us = 0;
>> +
>> + ring_full_time_us = dirtylimit_dirty_ring_full_time(current);
>> +
>> + if (dirtylimit_need_linear_adjustment(quota, current)) {
>> + if (quota < current) {
>> + sleep_pct = (current - quota) * 100 / current;
>> + throttle_us =
>> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> + cpu->throttle_us_per_full += throttle_us;
>> + } else {
>> + sleep_pct = (quota - current) * 100 / quota;
>> + throttle_us =
>> + ring_full_time_us * sleep_pct / (double)(100 - sleep_pct);
>> + cpu->throttle_us_per_full -= throttle_us;
>> + }
>> +
>> + trace_dirtylimit_throttle_pct(cpu->cpu_index,
>> + sleep_pct,
>> + throttle_us);
>> + } else {
>> + if (quota < current) {
>> + cpu->throttle_us_per_full += ring_full_time_us / 10;
>> + } else {
>> + cpu->throttle_us_per_full -= ring_full_time_us / 10;
>> + }
>> + }
>> +
>> + cpu->throttle_us_per_full = MIN(cpu->throttle_us_per_full,
>> + ring_full_time_us * DIRTYLIMIT_THROTTLE_PCT_MAX);
>> +
>> + cpu->throttle_us_per_full = MAX(cpu->throttle_us_per_full, 0);
>> +}
>
> This algorithm seems works even worse than the previous version, could you have
> a look on what's wrong?
>
> See how it fluctuates when I set a throttle of 300MB/s:
>
> (QMP) set-vcpu-dirty-limit dirty-rate=300
>
> Dirty rate: 17622 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 17617 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 17611 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 13023 (MB/s), duration: 1153 (ms), load: 100.00%
> Dirty rate: 923 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2853 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1963 (MB/s), duration: 1040 (ms), load: 100.00%
> Dirty rate: 180 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 182 (MB/s), duration: 1007 (ms), load: 100.00%
> Dirty rate: 177 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 181 (MB/s), duration: 1007 (ms), load: 100.00%
> Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 168 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 169 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 2717 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2851 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1773 (MB/s), duration: 1021 (ms), load: 100.00%
> Dirty rate: 177 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 179 (MB/s), duration: 1006 (ms), load: 100.00%
> Dirty rate: 175 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 1973 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 2878 (MB/s), duration: 1000 (ms), load: 100.00%
> Dirty rate: 1690 (MB/s), duration: 1022 (ms), load: 100.00%
> Dirty rate: 174 (MB/s), duration: 1005 (ms), load: 100.00%
> Dirty rate: 184 (MB/s), duration: 1006 (ms), load: 100.00%
>
> This is the tool I'm using:
>
> https://github.com/xzpeter/mig_mon#memory-dirty
>
> Again, I won't ask for a good algorithm as the 1st version, but then I think
> it's nicer we have the simplest algorithm merged first, which should be very
> easy to verify.
>
>> +
>> +static void dirtylimit_adjust_throttle(CPUState *cpu)
>> +{
>> + uint64_t quota = 0;
>> + uint64_t current = 0;
>> + int cpu_index = cpu->cpu_index;
>> +
>> + quota = dirtylimit_vcpu_get_state(cpu_index)->quota;
>> + current = vcpu_dirty_rate_get(cpu_index);
>> +
>> + if (current == 0 &&
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt == 0) {
>> + cpu->throttle_us_per_full = 0;
>> + goto end;
>> + } else if (++dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt
>> + < 2) {
>> + goto end;
>> + } else if (dirtylimit_done(quota, current)) {
>> + goto end;
>> + } else {
>> + dirtylimit_vcpu_get_state(cpu_index)->unmatched_cnt = 0;
>> + dirtylimit_set_throttle(cpu, quota, current);
>> + }
>> +end:
>> + trace_dirtylimit_adjust_throttle(cpu_index,
>> + quota, current,
>> + cpu->throttle_us_per_full);
>> + return;
>> +}
>> +
>> +static void *dirtylimit_thread(void *opaque)
>> +{
>> + CPUState *cpu;
>> +
>> + rcu_register_thread();
>> +
>> + while (!qatomic_read(&dirtylimit_quit)) {
>> + sleep(DIRTYLIMIT_CALC_TIME_MS / 1000);
>
> Sorry to have not mentioned this: I think we probably don't even need this
> dirtylimit thread.
>
> It'll be hard to make the "sleep" right here.. you could read two identical
> values from the dirty calc thread because the 1sec sleep is not accurate, so
> even after this sleep() the calc thread may not have provided the latest number
> yet.
>
> It'll be much cleaner (and most importantly, accurate..) to me if we could make
> this a hook function being passed over to the vcpu_dirty_rate_stat_thread()
> thread, then after each vcpu_dirty_rate_stat_collect() we call the hook.
Ok, I removed the dirtylimit_thread and implemented the throttle in a bottom
half instead; indeed, it becomes more accurate. What do you think of it?
>
>> +
>> + dirtylimit_state_lock();
>> +
>> + if (!dirtylimit_in_service()) {
>> + dirtylimit_state_unlock();
>> + break;
>> + }
>> +
>> + CPU_FOREACH(cpu) {
>> + if (!dirtylimit_vcpu_get_state(cpu->cpu_index)->enabled) {
>> + continue;
>> + }
>> + dirtylimit_adjust_throttle(cpu);
>> + }
>> + dirtylimit_state_unlock();
>> + }
>> +
>> + rcu_unregister_thread();
>> +
>> + return NULL;
>> +}
>> +
>> +static void dirtylimit_thread_start(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 0);
>> + qemu_thread_create(&dirtylimit_thr,
>> + "dirtylimit",
>> + dirtylimit_thread,
>> + NULL,
>> + QEMU_THREAD_JOINABLE);
>> +}
>> +
>> +static void dirtylimit_thread_stop(void)
>> +{
>> + qatomic_set(&dirtylimit_quit, 1);
>> + qemu_mutex_unlock_iothread();
>> + qemu_thread_join(&dirtylimit_thr);
>> + qemu_mutex_lock_iothread();
>> +}
>> +
>> +void dirtylimit_set_vcpu(int cpu_index,
>> + uint64_t quota,
>> + bool enable)
>> +{
>> + trace_dirtylimit_set_vcpu(cpu_index, quota);
>> +
>> + if (enable) {
>> + if (dirtylimit_in_service()) {
>> + /* only set the vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + return;
>> + }
>> +
>> + /* initialize state when set dirtylimit first time */
>> + dirtylimit_state_lock();
>> + dirtylimit_state_initialize();
>> + dirtylimit_vcpu_set_quota(cpu_index, quota, true);
>> + dirtylimit_state_unlock();
>> +
>> + dirtylimit_thread_start();
>
> Can we move dirtylimit global initializations out of dirtylimit_set_vcpu() too?
> We should always keep init/cleanup of dirty_rate_calc and dirtylimit together,
> because they should, imho. We never enable one of them.
>
> I commented similarly in previous version on this.
>
>> + } else {
>> + if (!dirtylimit_in_service()) {
>> + return;
>> + }
>> +
>> + dirtylimit_state_lock();
>> + /* dirty page rate limit is not enabled */
>> + if (!dirtylimit_vcpu_get_state(cpu_index)->enabled) {
>> + dirtylimit_state_unlock();
>> + return;
>> + }
>> +
>> + /* switch off vcpu dirty page rate limit */
>> + dirtylimit_vcpu_set_quota(cpu_index, 0, false);
>> + dirtylimit_state_unlock();
>> +
>> + if (!dirtylimit_state->limited_nvcpu) {
>> + dirtylimit_thread_stop();
>> +
>> + dirtylimit_state_lock();
>> + dirtylimit_state_finalize();
>> + dirtylimit_state_unlock();
>
> We don't need such a fine control of locking, IMHO.. it can be a very big lock
> just to serialize things..
>
> IMHO it could be as simple as:
>
> void dirtylimit_set_vcpu(int cpu_index,
> uint64_t quota,
> bool enable)
> {
> dirtylimit_vcpu_set_quota(cpu_index, quota, enable);
> trace_dirtylimit_set_vcpu(cpu_index, quota);
> }
>
> void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
> uint64_t cpu_index,
> uint64_t dirty_rate,
> Error **errp)
> {
> if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
> error_setg(errp, "dirty page limit feature requires KVM with"
> " accelerator property 'dirty-ring-size' set'");
> return;
> }
>
> if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
> error_setg(errp, "incorrect cpu index specified");
> return;
> }
>
> dirtylimit_state_lock();
>
> if (!dirtylimit_in_service()) {
> /* TODO: we could have one helper to initialize all of them */
> vcpu_dirty_rate_stat_initialize();
> vcpu_dirty_rate_stat_start();
> dirtylimit_state_initialize();
> dirtylimit_vcpu_set_quota(cpu_index, quota, true);
> }
>
> if (has_cpu_index) {
> dirtylimit_set_vcpu(cpu_index, dirty_rate, true);
> } else {
> dirtylimit_set_all(dirty_rate, true);
> }
>
> dirtylimit_state_unlock();
> }
>
> I didn't write the cleanup path, but it's the same: we should only cleanup all
> the global structs in cancel-vcpu-dirty-limit when we found there's zero vcpus
> in track, and it should be done once there.
>
> Thanks,
>
--
Best regard
Hyman Huang(黄勇)
* Re: [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation
2022-01-24 3:08 ` Peter Xu
@ 2022-01-24 9:36 ` Hyman Huang
0 siblings, 0 replies; 33+ messages in thread
From: Hyman Huang @ 2022-01-24 9:36 UTC (permalink / raw)
To: Peter Xu
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, qemu-devel, Markus ArmBruster, Paolo Bonzini,
Philippe Mathieu-Daudé,
Dr. David Alan Gilbert
在 2022/1/24 11:08, Peter Xu 写道:
> On Sat, Jan 22, 2022 at 11:22:37AM +0800, Hyman Huang wrote:
>>
>>
>> 在 2022/1/17 10:19, Peter Xu 写道:
>>> On Wed, Jan 05, 2022 at 01:14:06AM +0800, huangy81@chinatelecom.cn wrote:
>>>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>>>
>>>> +
>>>> +static void vcpu_dirty_stat_collect(VcpuStat *stat,
>>>> + DirtyPageRecord *records,
>>>> + bool start)
>>>> +{
>>>> + CPUState *cpu;
>>>> +
>>>> + CPU_FOREACH(cpu) {
>>>> + if (!start && cpu->cpu_index >= stat->nvcpu) {
>>>> + /*
>>>> + * Never go there unless cpu is hot-plugged,
>>>> + * just ignore in this case.
>>>> + */
>>>> + continue;
>>>> + }
>>>
>>> As commented before, I think the only way to do it right is to not allow cpu
>>> plug/unplug during measurement..
>>>
>>> Say, even if the index didn't get out of range, an unplug event would generate very
>>> strange output for the unplugged cpu. Please see more below.
>>>
>>>> + record_dirtypages(records, cpu, start);
>>>> + }
>>>> +}
>>>> +
>>>> +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
>>>> + int64_t init_time_ms,
>>>> + VcpuStat *stat,
>>>> + unsigned int flag,
>>>> + bool one_shot)
>>>> +{
>>>> + DirtyPageRecord *records;
>>>> + int64_t duration;
>>>> + int64_t dirtyrate;
>>>> + int i = 0;
>>>> +
>>>> + cpu_list_lock();
>>>> + records = vcpu_dirty_stat_alloc(stat);
>>>> + vcpu_dirty_stat_collect(stat, records, true);
>>>> + cpu_list_unlock();
>>>
>>> Continue with above - then I'm wondering whether we should just keep taking the
>>> lock until vcpu_dirty_stat_collect().
>>>
>>> Yes we could be taking the lock for a long time because of the sleep, but the
>>> main thread plug thread will just wait for it to complete and it is at least
>>> not a e.g. deadlock.
>>>
>>> The other solution is we do cpu_list_unlock() like this, but introduce another
>>> cpu_list_generation_id and boost it after any plug/unplug of cpu, aka, when cpu
>>> list changes.
>>>
>>> Then we record cpu generation ID at the entry of this function and retry the
>>> whole measurement if at some point we found generation ID changed (we need to
>>> fetch the gen ID after having the lock, of course). That could avoid us taking
>>> the cpu list lock during dirty_stat_wait(), but it'll start to complicate cpu
>>> list locking rules.
>>>
>>> The simpler way is still just to take the lock, imho.
>>>
>> Hi, Peter, I'm working on this as you suggested, keeping the
>> cpu_list_lock held during the dirty page rate calculation. I found a deadlock when
>> testing the hotplug scenario; the logic is as follows:
>>
>> calc thread                        qemu main thread
>> 1. take qemu_cpu_list_lock
>>                                    1. take the BQL
>> 2. collect dirty pages and wait
>>                                    2. cpu hotplug
>>                                    3. take qemu_cpu_list_lock
>> 3. take the BQL
>>
>> 4. sync dirty log
>>
>> 5. release the BQL
>>
>> I just recalled that this is one of the reasons why I handle the plug/unplug
>> scenario (another is that a cpu plug may wait a fairly long time when dirtylimit
>> is in service).
>
> Ah I should have noticed the bql dependency with cpu list lock before..
>
> I think having the cpu plug waiting for one sec is fine, because the mgmt app
> should be aware of both so it shouldn't even happen in practice (not good
> timing to plug during pre-migration). However bql is definitely another
> story.. which I agree.
>
>>
>> It seems that we have two strategies: one is to just keep this logic untouched
>> in v12 and add the "cpu_list_generation_id" implementation to the TODO list (once this
>> patchset is merged, I'll try that out); the other is to introduce
>> "cpu_list_generation_id" right now.
>>
>> Which strategy do you prefer?
>
> I prefer having the gen_id patch. The thing is, it should be less than 10 lines
> and the logic should be fairly straightforward. Without it, it seems
> always risky to use this new feature.
>
> I hope I didn't overlook any existing mechanism to block cpu plug/unplug for
> some period, though, or we should use it.
>
>>
>> Uh... I think the "unmatched_cnt" is also kind of like this too, because once
>> we remove the "unmatched count" logic, the throttle algo is more likely to
>> oscillate, and I prefer to add the "unmatched_cnt" to the TODO list as above.
>
> Could we tune the differential factor to make it less possible to oscillate?
Uh... From certain angles, yes. When the current dirty page rate is close
to the quota while dirtylimit is in service, the throttle achieves balance.
Once the current dirty page rate shows a slight fluctuation (not much
oscillation), the sleep time is adjusted by an amount that can effectively be ignored.
> I still can't say I like "unmatched cnt" idea a lot.. From a PID pov (partial,
> integral, differential) you've already got partial + differential, and IMHO
> that "unmatched cnt" solution was trying to mimic an "integral" delta. Instead
> of doing a mean value calculation (as most integral systems do), the
> "unmatched cnt" solution literally made it an array of 2 and it dropped the 1st
> element.. Hence a decision was made only from the 2nd data you collected.
>
> From that POV I think it's cleaner if you add a real (but simple) integral algo
> into it. It can be, e.g., an array of 3; then when you do the math you use the
> average of the three dirty rates. Would that work (and also look a bit
> cleaner)?
Yes, IMHO this is a more complete algo and we can try it out. So, let's
see the v12 test results and decide whether the above work should be added to
the TODO list. :)
>
> Thanks,
>
--
Best regard
Hyman Huang(黄勇)