Message-ID: <80aa611e-55da-e76c-d09b-bda3a94f3169@chinatelecom.cn>
Date: Sat, 2 Apr 2022 15:04:00 +0800
Subject: Re: [PATCH v2 1/4] kvm: Dynamically adjust the rate of dirty ring reaper thread
From: Hyman Huang
To: wucy11@chinatelecom.cn, qemu-devel@nongnu.org
Cc: tugy@chinatelecom.cn, David Hildenbrand, yuanmh12@chinatelecom.cn, Juan Quintela, Richard Henderson, "Dr. David Alan Gilbert", Peter Xu, f4bug@amsat.org, dengpc12@chinatelecom.cn, Paolo Bonzini, baiyw2@chinatelecom.cn
In-Reply-To: <7e786b6ab74e0c62661176fa7aec243c7b9bea8d.1648091540.git.wucy11@chinatelecom.cn>

On 2022/3/28 9:32, wucy11@chinatelecom.cn wrote:
> From: Chongyun Wu
>
> Dynamically adjust the dirty ring collection thread to
> reduce the occurrence of ring full, thereby reducing the
> impact on customers, improving the efficiency of dirty
> page collection, and thus improving the migration efficiency.
>
> Implementation:
> 1) Define different collection speeds for the reap thread.
>
> 2) Divide the total number of dirty pages collected each
> time by the ring size to get a ratio which indicates the
> occupancy rate of dirty pages in the ring. The higher the
> ratio, the higher the possibility that the ring will be full.
>
> 3) Different ratios correspond to different running speeds.
> A higher ratio value indicates that a higher running speed
> is required to collect dirty pages as soon as possible to
> ensure that too many ring fulls will not be generated,
> which will affect the customer's business.
>
> This patch can significantly reduce the number of ring full
> occurrences in the case of high memory dirty page pressure,
> and minimize the impact on guests.
>

Increasing the frequency of dirty ring reaping obviously reduces the
time guest vCPUs spend blocked and consequently improves guest memory
performance. But it also lets the vCPUs that write memory run longer
and dirty more memory, so the migration time may become longer. Maybe
we should also look at the migration time and compare it with the
traditional algorithm.

> Using this patch for the qeum guestperf test, the memory
> performance during the migration process is somewhat improved
> compared to the bitmap method, and is significantly improved
> compared to the unoptimized dirty ring method. For detailed
> test data, please refer to the follow-up series of patches.
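For reference, the ratio-to-speed mapping described in steps 2) and 3)
can be sketched as below. This is only an illustration, not the patch's
code: `level_wanted`, `level_sleep_us` and the `LEVEL_*` names are
hypothetical, while the 0.1 threshold step and the sleep times are the
constants used by calcu_sleep_time() in the quoted diff. In the patch
itself the run level moves toward the wanted level one step per
iteration rather than jumping to it.

```c
#include <stdint.h>

/* Hypothetical names; the six levels mirror KVMDirtyRingReaperRunLevel. */
enum reaper_level {
    LEVEL_NORMAL = 0,
    LEVEL_FAST1,
    LEVEL_FAST2,
    LEVEL_FAST3,
    LEVEL_FAST4,
    LEVEL_MAX_SPEED,
};

/*
 * Map ring occupancy (dirty pages reaped / ring size) to the level the
 * reaper would like to run at, in steps of `threshold` (0.1 in the patch):
 * below 1x threshold -> NORMAL, below 2x -> FAST1, ..., otherwise MAX_SPEED.
 */
static enum reaper_level level_wanted(uint64_t dirty_count,
                                      uint32_t ring_size,
                                      float threshold)
{
    float ratio = (float)dirty_count / ring_size;
    int level;

    for (level = LEVEL_NORMAL; level < LEVEL_MAX_SPEED; level++) {
        if (ratio < threshold * (level + 1)) {
            break;
        }
    }
    return (enum reaper_level)level;
}

/* Sleep time in microseconds for each level, as listed in the patch. */
static uint64_t level_sleep_us(enum reaper_level level)
{
    static const uint64_t sleep_us[] = {
        1000000, 500000, 250000, 125000, 100000, 80000,
    };

    return sleep_us[level];
}
```

With a 4096-entry ring, reaping 1000 dirty pages gives a ratio of about
0.24, which falls in the third band and maps to a 250 ms sleep.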
>
> Signed-off-by: Chongyun Wu
> ---
>  accel/kvm/kvm-all.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 144 insertions(+), 5 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 27864df..65a4de8 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -91,6 +91,27 @@ enum KVMDirtyRingReaperState {
>      KVM_DIRTY_RING_REAPER_REAPING,
>  };
>
> +enum KVMDirtyRingReaperRunLevel {
> +    /* The reaper runs at default normal speed */
> +    KVM_DIRTY_RING_REAPER_RUN_NORMAL = 0,
> +    /* The reaper starts to accelerate in different gears */
> +    KVM_DIRTY_RING_REAPER_RUN_FAST1,
> +    KVM_DIRTY_RING_REAPER_RUN_FAST2,
> +    KVM_DIRTY_RING_REAPER_RUN_FAST3,
> +    KVM_DIRTY_RING_REAPER_RUN_FAST4,
> +    /* The reaper runs at the fastest speed */
> +    KVM_DIRTY_RING_REAPER_RUN_MAX_SPEED,
> +};
> +
> +enum KVMDirtyRingReaperSpeedControl {
> +    /* Maintain current speed */
> +    KVM_DIRTY_RING_REAPER_SPEED_CONTROL_KEEP = 0,
> +    /* Accelerate current speed */
> +    KVM_DIRTY_RING_REAPER_SPEED_CONTROL_UP,
> +    /* Decrease current speed */
> +    KVM_DIRTY_RING_REAPER_SPEED_CONTROL_DOWN
> +};
> +
>  /*
>   * KVM reaper instance, responsible for collecting the KVM dirty bits
>   * via the dirty ring.
> @@ -100,6 +121,11 @@ struct KVMDirtyRingReaper {
>      QemuThread reaper_thr;
>      volatile uint64_t reaper_iteration; /* iteration number of reaper thr */
>      volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */
> +    /* Control the running speed of the reaper thread to fit dirty page rate */
> +    enum KVMDirtyRingReaperRunLevel run_level;
> +    uint64_t ring_full_cnt;
> +    float ratio_adjust_threshold;
> +    int stable_count_threshold;

Could you add some comments about the introduced fields?

>  };
>
>  struct KVMState
> @@ -1449,11 +1475,115 @@ out:
>      kvm_slots_unlock();
>  }
>
[...]
> +static uint64_t calcu_sleep_time(KVMState *s,
> +                                       uint64_t dirty_count,
> +                                       uint64_t ring_full_cnt_last,
> +                                       uint32_t *speed_down_cnt)

Code isn't aligned

> +{
> +    float ratio = 0.0;
> +    uint64_t sleep_time = 1000000;
> +    enum KVMDirtyRingReaperRunLevel run_level_want;
> +    enum KVMDirtyRingReaperSpeedControl speed_control;
> +
> +    /*
> +     * When the number of dirty pages collected exceeds
> +     * the given percentage of the ring size,the speed
> +     * up action will be triggered.
> +     */
> +    s->reaper.ratio_adjust_threshold = 0.1;
> +    s->reaper.stable_count_threshold = 5;
> +
> +    ratio = (float)dirty_count / s->kvm_dirty_ring_size;
> +
> +    if (s->reaper.ring_full_cnt > ring_full_cnt_last) {
> +        /* If get a new ring full need speed up reaper thread */
> +        if (s->reaper.run_level != KVM_DIRTY_RING_REAPER_RUN_MAX_SPEED) {
> +            s->reaper.run_level++;
> +        }
> +    } else {
> +        /*
> +         * If get more dirty pages this loop and this status continus
> +         * for many times try to speed up reaper thread.
> +         * If the status is stable and need to decide which speed need
> +         * to use.
> +         */
> +        if (ratio < s->reaper.ratio_adjust_threshold) {
> +            run_level_want = KVM_DIRTY_RING_REAPER_RUN_NORMAL;
> +        } else if (ratio < s->reaper.ratio_adjust_threshold * 2) {
> +            run_level_want = KVM_DIRTY_RING_REAPER_RUN_FAST1;
> +        } else if (ratio < s->reaper.ratio_adjust_threshold * 3) {
> +            run_level_want = KVM_DIRTY_RING_REAPER_RUN_FAST2;
> +        } else if (ratio < s->reaper.ratio_adjust_threshold * 4) {
> +            run_level_want = KVM_DIRTY_RING_REAPER_RUN_FAST3;
> +        } else if (ratio < s->reaper.ratio_adjust_threshold * 5) {
> +            run_level_want = KVM_DIRTY_RING_REAPER_RUN_FAST4;
> +        } else {
> +            run_level_want = KVM_DIRTY_RING_REAPER_RUN_MAX_SPEED;
> +        }
> +
> +        /* Get if need speed up or slow down */
> +        if (run_level_want > s->reaper.run_level) {
> +            speed_control = KVM_DIRTY_RING_REAPER_SPEED_CONTROL_UP;
> +            *speed_down_cnt = 0;
> +        } else if (run_level_want < s->reaper.run_level) {
> +            speed_control = KVM_DIRTY_RING_REAPER_SPEED_CONTROL_DOWN;
> +            *speed_down_cnt++;
> +        } else {
> +            speed_control = KVM_DIRTY_RING_REAPER_SPEED_CONTROL_KEEP;
> +        }
> +
> +        /* Control reaper thread run in sutiable run speed level */
> +        if (speed_control == KVM_DIRTY_RING_REAPER_SPEED_CONTROL_UP) {
> +            /* If need speed up do not check its stable just do it */
> +            s->reaper.run_level++;
> +        } else if (speed_control ==
> +            KVM_DIRTY_RING_REAPER_SPEED_CONTROL_DOWN) {
> +            /* If need speed down we should filter this status */
> +            if (*speed_down_cnt > s->reaper.stable_count_threshold) {
> +                s->reaper.run_level--;
> +            }
> +        }
> +    }
> +
> +    /* Set the actual running rate of the reaper */
> +    switch (s->reaper.run_level) {
> +    case KVM_DIRTY_RING_REAPER_RUN_NORMAL:
> +        sleep_time = 1000000;
> +        break;
> +    case KVM_DIRTY_RING_REAPER_RUN_FAST1:
> +        sleep_time = 500000;
> +        break;
> +    case KVM_DIRTY_RING_REAPER_RUN_FAST2:
> +        sleep_time = 250000;
> +        break;
> +    case KVM_DIRTY_RING_REAPER_RUN_FAST3:
> +        sleep_time = 125000;
> +        break;
> +    case KVM_DIRTY_RING_REAPER_RUN_FAST4:
> +        sleep_time = 100000;
> +        break;
> +    case KVM_DIRTY_RING_REAPER_RUN_MAX_SPEED:
> +        sleep_time = 80000;
> +        break;
> +    default:
> +        sleep_time = 1000000;
> +        error_report("Bad reaper thread run level, use default");
> +    }
> +
> +    return sleep_time;
> +}
> +

I think how the sleep time is calculated needs discussion, including
why we define 5 levels, why these time constants were chosen, and in
which scenarios this algorithm works well. The other thing is that I
still think it would be nicer to start with the simplest algorithm,
which should be very easy to verify.

>  static void *kvm_dirty_ring_reaper_thread(void *data)
>  {
>      KVMState *s = data;
>      struct KVMDirtyRingReaper *r = &s->reaper;
>
> +    uint64_t count = 0;
> +    uint64_t sleep_time = 1000000;
> +    uint64_t ring_full_cnt_last = 0;
> +    /* Filter speed jitter */
> +    uint32_t speed_down_cnt = 0;
> +
>      rcu_register_thread();
>
>      trace_kvm_dirty_ring_reaper("init");
> @@ -1461,18 +1591,26 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
>      while (true) {
>          r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT;
>          trace_kvm_dirty_ring_reaper("wait");
> -        /*
> -         * TODO: provide a smarter timeout rather than a constant?
> -         */
> -        sleep(1);
> +
> +        ring_full_cnt_last = s->reaper.ring_full_cnt;
> +
> +        usleep(sleep_time);
>
>          trace_kvm_dirty_ring_reaper("wakeup");
>          r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
>
>          qemu_mutex_lock_iothread();
> -        kvm_dirty_ring_reap(s);
> +        count = kvm_dirty_ring_reap(s);
>          qemu_mutex_unlock_iothread();
>
> +        /*
> +         * Calculate the appropriate sleep time according to
> +         * the speed of the current dirty page.
> +         */
> +        sleep_time = calcu_sleep_time(s, count,
> +                                      ring_full_cnt_last,
> +                                      &speed_down_cnt);
> +
>          r->reaper_iteration++;
>      }
>
> @@ -2958,6 +3096,7 @@ int kvm_cpu_exec(CPUState *cpu)
>              trace_kvm_dirty_ring_full(cpu->cpu_index);
>              qemu_mutex_lock_iothread();
>              kvm_dirty_ring_reap(kvm_state);
> +            kvm_state->reaper.ring_full_cnt++;
>              qemu_mutex_unlock_iothread();
>              ret = 0;
>              break;

Thanks.
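The slow-down filter in calcu_sleep_time() is easier to check in
isolation. Note that in the quoted hunk `*speed_down_cnt++;` parses as
`*(speed_down_cnt++)`, which advances the local pointer and leaves the
value untouched, so the counter in the caller appears to stay at 0 and
the run level would never step down. A minimal sketch of what seems to
be the intended behaviour follows, with the increment written as
`(*down_cnt)++`. `step_level` and its parameters are hypothetical names,
and resetting the counter after a step down or on a stable iteration is
an assumption that the patch does not make.

```c
#include <stdint.h>

/*
 * Move `current` one step toward `wanted` (both assumed to lie in the
 * same valid level range). Speed-ups are applied immediately; a
 * slow-down is applied only after more than `stable_threshold`
 * consecutive slow-down requests. `down_cnt` is caller-owned state,
 * like speed_down_cnt in the patch.
 */
static int step_level(int current, int wanted,
                      uint32_t stable_threshold, uint32_t *down_cnt)
{
    if (wanted > current) {
        *down_cnt = 0;
        return current + 1;
    }

    if (wanted < current) {
        /* Parentheses matter: *down_cnt++ would advance the pointer. */
        (*down_cnt)++;
        if (*down_cnt > stable_threshold) {
            /* Assumption: restart the filter after each step down. */
            *down_cnt = 0;
            return current - 1;
        }
        return current;
    }

    /* Assumption: a stable iteration breaks the run of slow-down requests. */
    *down_cnt = 0;
    return current;
}
```

With a threshold of 5, as set in the patch, the level drops by one only
on the sixth consecutive slow-down request, while a single speed-up
request takes effect at once.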
-- 
Best regards

Hyman Huang(黄勇)