Subject: Re: [PATCH] bcache: consider the fragmentation when update the writeback rate
From: Dongsheng Yang
To: Dongdong Tao
Cc: Dongdong Tao, Coly Li, Gavin Guo, Gerald Yang, Trent Lloyd,
    Kent Overstreet, "open list:BCACHE (BLOCK LAYER CACHE)", open list
Date: Wed, 9 Dec 2020 15:49:26 +0800
References: <20201103124235.14440-1-tdd21151186@gmail.com>
    <89b83c00-1117-d114-2c23-7b03fc22966e@easystack.cn>

On Wed, Dec 9, 2020 at 12:48 PM, Dongdong Tao wrote:
> Hi Dongsheng,
>
> I'm working on it; as a next step I'm gathering some testing data to
> upload (very sorry for the delay...).
> Thanks for the comment.
> One of the main concerns in alleviating this issue through the
> writeback process is that we need to minimize the impact on client IO
> performance.
> writeback_percent is 10 by default; starting writeback when dirty
> buckets reach 10 percent might be a bit too aggressive, since
> writeback_cutoff_sync is 70 percent.
> So I chose to start the writeback when dirty buckets reach 50
> percent, so that this patch only takes effect once the dirty bucket
> percentage is above that.

Agreed, it would be too aggressive to reuse writeback_percent, and it
would be less flexible. Okay, let's wait for your testing results.
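To make the fragmentation effect concrete, here is a rough standalone
sketch of the patch's arithmetic. All numbers are hypothetical, chosen
only for illustration; 40 matches the usual bcache default for
writeback_rate_p_term_inverse:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical cache: 100000 buckets of 1024 sectors (512 KiB),
	 * 50% of buckets dirty, but only 8 dirty sectors per bucket. */
	int64_t nbuckets       = 100000;
	int64_t bucket_size    = 1024;      /* sectors per bucket */
	int64_t in_use         = 50;        /* dirty bucket percent */
	int64_t dirty          = 50000 * 8; /* 400000 dirty sectors */
	int64_t p_term_inverse = 40;        /* bcache default */

	int64_t dirty_buckets = (in_use * nbuckets) / 100;        /* 50000 */
	int64_t fragment = (dirty_buckets * bucket_size) / dirty; /* 128 */

	/* Proportional term without and with the fragment scaling:
	 * 400000 / 40 = 10000 vs. 10000 * 128 = 1280000. */
	printf("plain:  %lld\n", (long long)(dirty / p_term_inverse));
	printf("scaled: %lld\n",
	       (long long)(dirty / p_term_inverse * fragment));
	return 0;
}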
Thanx

>
> Thanks,
> Dongdong
>
> On Wed, Dec 9, 2020 at 10:27 AM Dongsheng Yang wrote:
>>
>> On Tue, Nov 3, 2020 at 8:42 PM, Dongdong Tao wrote:
>>> From: dongdong tao
>>>
>>> The current way of calculating the writeback rate only considers
>>> dirty sectors. This usually works fine when the fragmentation is
>>> not high, but it gives an unreasonably small rate when very few
>>> dirty sectors consume a lot of dirty buckets. In some cases the
>>> dirty buckets can reach CUTOFF_WRITEBACK_SYNC while the dirty data
>>> (sectors) has not even reached writeback_percent; the writeback
>>> rate will still be the minimum value (4k), so all writes get stuck
>>> in a non-writeback mode because of the slow writeback.
>>>
>>> This patch tries to accelerate the writeback rate when the
>>> fragmentation is high. It calculates the proportional_scaled value
>>> as below:
>>> (dirty_sectors / writeback_rate_p_term_inverse) * fragment
>>> As we can see, higher fragmentation results in a larger
>>> proportional_scaled value, and thus a larger writeback rate.
>>> The fragment value is calculated as below:
>>> (dirty_buckets * bucket_size) / dirty_sectors
>>> If you think about it, the value of fragment will always be
>>> inside [1, bucket_size].
>>>
>>> This patch only considers the fragmentation once the number of
>>> dirty buckets has reached a dirty threshold (configurable via
>>> writeback_fragment_percent, default 50), so bcache keeps its
>>> original behaviour before the dirty buckets reach the threshold.
>>>
>>> Signed-off-by: dongdong tao
>>> ---
>>>  drivers/md/bcache/bcache.h    |  1 +
>>>  drivers/md/bcache/sysfs.c     |  6 ++++++
>>>  drivers/md/bcache/writeback.c | 21 +++++++++++++++++++++
>>>  3 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>> index 1d57f48307e6..87632f7032b6 100644
>>> --- a/drivers/md/bcache/bcache.h
>>> +++ b/drivers/md/bcache/bcache.h
>>> @@ -374,6 +374,7 @@ struct cached_dev {
>>>  	unsigned int		writeback_metadata:1;
>>>  	unsigned int		writeback_running:1;
>>>  	unsigned char		writeback_percent;
>>> +	unsigned char		writeback_fragment_percent;
>>>  	unsigned int		writeback_delay;
>>>
>>>  	uint64_t		writeback_rate_target;
>>> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
>>> index 554e3afc9b68..69499113aef8 100644
>>> --- a/drivers/md/bcache/sysfs.c
>>> +++ b/drivers/md/bcache/sysfs.c
>>> @@ -115,6 +115,7 @@ rw_attribute(stop_when_cache_set_failed);
>>>  rw_attribute(writeback_metadata);
>>>  rw_attribute(writeback_running);
>>>  rw_attribute(writeback_percent);
>>> +rw_attribute(writeback_fragment_percent);
>>
>> Hi Dongdong and Coly,
>>
>> What is the status of this patch? In my opinion, it is a problem we
>> need to solve, but can we just reuse the parameter writeback_percent,
>> rather than introduce a new writeback_fragment_percent?
>>
>> That would mean the semantics of writeback_percent act on both the
>> dirty data percent and the dirty bucket percent: when we find there
>> are more dirty buckets than (c->nbuckets * writeback_percent), start
>> the writeback.
>>
>> Thanx
>>
>> Yang
>>
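For reference, the reuse suggested above would have looked roughly like
the following untested sketch; it simply swaps writeback_percent into
the patch's gating condition (since in_use is already a percentage,
comparing it against writeback_percent is equivalent to comparing the
bucket counts), with everything else taken verbatim from the patch:

	struct cache_set *c = dc->disk.c;

	/* Untested sketch: reuse writeback_percent as the dirty-bucket
	 * threshold instead of introducing a new knob. */
	if (c->gc_stats.in_use > dc->writeback_percent && dirty > 0) {
		int64_t dirty_buckets = (c->gc_stats.in_use * c->nbuckets) / 100;
		int64_t fragment = (dirty_buckets * c->cache->sb.bucket_size) / dirty;

		proportional_scaled =
			div_s64(dirty, dc->writeback_rate_p_term_inverse) * fragment;
	}
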
>>>  rw_attribute(writeback_delay);
>>>  rw_attribute(writeback_rate);
>>>
>>> @@ -197,6 +198,7 @@ SHOW(__bch_cached_dev)
>>>  	var_printf(writeback_running,	"%i");
>>>  	var_print(writeback_delay);
>>>  	var_print(writeback_percent);
>>> +	var_print(writeback_fragment_percent);
>>>  	sysfs_hprint(writeback_rate,
>>>  		     wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
>>>  	sysfs_printf(io_errors,		"%i", atomic_read(&dc->io_errors));
>>> @@ -308,6 +310,9 @@ STORE(__cached_dev)
>>>  	sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent,
>>>  			    0, bch_cutoff_writeback);
>>>
>>> +	sysfs_strtoul_clamp(writeback_fragment_percent, dc->writeback_fragment_percent,
>>> +			    0, bch_cutoff_writeback_sync);
>>> +
>>>  	if (attr == &sysfs_writeback_rate) {
>>>  		ssize_t ret;
>>>  		long int v = atomic_long_read(&dc->writeback_rate.rate);
>>> @@ -498,6 +503,7 @@ static struct attribute *bch_cached_dev_files[] = {
>>>  	&sysfs_writeback_running,
>>>  	&sysfs_writeback_delay,
>>>  	&sysfs_writeback_percent,
>>> +	&sysfs_writeback_fragment_percent,
>>>  	&sysfs_writeback_rate,
>>>  	&sysfs_writeback_rate_update_seconds,
>>>  	&sysfs_writeback_rate_i_term_inverse,
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 3c74996978da..34babc89fdf3 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -88,6 +88,26 @@ static void __update_writeback_rate(struct cached_dev *dc)
>>>  	int64_t integral_scaled;
>>>  	uint32_t new_rate;
>>>
>>> +	/*
>>> +	 * We need to consider the number of dirty buckets as well
>>> +	 * when calculating proportional_scaled; otherwise we might
>>> +	 * end up with an unreasonably small writeback rate in a highly
>>> +	 * fragmented situation where very few dirty sectors consume a
>>> +	 * lot of dirty buckets. The worst case is when dirty_data has
>>> +	 * reached writeback_percent and the dirty buckets have reached
>>> +	 * cutoff_writeback_sync, but the rate is still at the minimum
>>> +	 * value, which leaves all writes stuck in a non-writeback mode.
>>> +	 */
>>> +	struct cache_set *c = dc->disk.c;
>>> +
>>> +	if (c->gc_stats.in_use > dc->writeback_fragment_percent && dirty > 0) {
>>> +		int64_t dirty_buckets = (c->gc_stats.in_use * c->nbuckets) / 100;
>>> +		int64_t fragment = (dirty_buckets * c->cache->sb.bucket_size) / dirty;
>>> +
>>> +		proportional_scaled =
>>> +			div_s64(dirty, dc->writeback_rate_p_term_inverse) * fragment;
>>> +	}
>>> +
>>>  	if ((error < 0 && dc->writeback_rate_integral > 0) ||
>>>  	    (error > 0 && time_before64(local_clock(),
>>>  					dc->writeback_rate.next + NSEC_PER_MSEC))) {
>>> @@ -969,6 +989,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
>>>  	dc->writeback_metadata = true;
>>>  	dc->writeback_running = false;
>>>  	dc->writeback_percent = 10;
>>> +	dc->writeback_fragment_percent = 50;
>>>  	dc->writeback_delay = 30;
>>>  	atomic_long_set(&dc->writeback_rate.rate, 1024);
>>>  	dc->writeback_rate_minimum = 8;
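One brief note on the commit message's claim that fragment always lies
in [1, bucket_size]: every dirty bucket contributes at least 1 and at
most bucket_size dirty sectors, so dirty_sectors is bounded by
[dirty_buckets, dirty_buckets * bucket_size], which pins the ratio to
that range. A tiny standalone check with hypothetical numbers:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	const int64_t bucket_size   = 1024;  /* sectors, hypothetical */
	const int64_t dirty_buckets = 50000; /* hypothetical */

	/* Worst fragmentation: one dirty sector per dirty bucket. */
	int64_t dirty = dirty_buckets * 1;
	assert((dirty_buckets * bucket_size) / dirty == bucket_size);

	/* No fragmentation: every dirty bucket completely full. */
	dirty = dirty_buckets * bucket_size;
	assert((dirty_buckets * bucket_size) / dirty == 1);

	return 0;
}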