linux-kernel.vger.kernel.org archive mirror
From: Kemeng Shi <shikemeng@huaweicloud.com>
To: Jan Kara <jack@suse.cz>
Cc: willy@infradead.org, akpm@linux-foundation.org, tj@kernel.org,
	hcochran@kernelspring.com, axboe@kernel.dk, mszeredi@redhat.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/4] mm: correct calculation of wb's bg_thresh in cgroup domain
Date: Tue, 7 May 2024 09:16:39 +0800
Message-ID: <12bf104e-aeac-67a5-6e5a-bc7bdbfe4d79@huaweicloud.com>
In-Reply-To: <20240503093056.6povgn2shvqzpedj@quack3>


Hi Jan,
On 5/3/2024 5:30 PM, Jan Kara wrote:
> On Thu 25-04-24 21:17:22, Kemeng Shi wrote:
>> wb_calc_thresh() is supposed to calculate wb's share of bg_thresh in the
>> global domain. To calculate wb's share of bg_thresh in the cgroup domain,
>> it's more reasonable to use __wb_calc_thresh(), which is how we calculate
>> dirty_thresh in the cgroup domain in balance_dirty_pages().
>>
>> Consider the following domain hierarchy:
>>                 global domain (> 20G)
>>                 /                 \
>>         cgroup domain1(10G)     cgroup domain2(10G)
>>                 |                 |
>> bdi            wb1               wb2
>> Assume wb1 and wb2 have the same bandwidth.
>> We have global domain bg_thresh > 2G, cgroup domain bg_thresh 1G.
>> Then we have:
>> wb's thresh in global domain = 2G * (wb bandwidth) / (system bandwidth)
>> = 2G * 1/2 = 1G
>> wb's thresh in cgroup domain = 1G * (wb bandwidth) / (system bandwidth)
>> = 1G * 1/2 = 0.5G
>> As a result, wb1 and wb2 will each be limited to 0.5G, so the system will
>> be limited to 1G, which is less than the global domain bg_thresh of 2G.
> 
> This was a bit hard to understand for me so I'd rephrase it as:
> 
> wb_calc_thresh() is calculating wb's share of bg_thresh in the global
> domain. However in case of cgroup writeback this is not the right thing to
> do. Consider the following domain hierarchy:
> 
>                 global domain (> 20G)
>                 /                 \
>           cgroup1 (10G)     cgroup2 (10G)
>                 |                 |
> bdi            wb1               wb2
> 
> and assume wb1 and wb2 have the same bandwidth and the background threshold
> is set at 10%. The bg_thresh of cgroup1 and cgroup2 is going to be 1G. Now
> because wb_calc_thresh(mdtc->wb, mdtc->bg_thresh) calculates per-wb
> threshold in the global domain as (wb bandwidth) / (domain bandwidth) it
> returns bg_thresh for wb1 as 0.5G although it has nobody to compete against
> in cgroup1.
> 
> Fix the problem by calculating wb's share of bg_thresh in the cgroup
> domain.
Thanks for improving the changelog. As this was already merged into the -mm
and mm-unstable trees, I'm not sure whether a new patch is needed. If there
is anything I should do, please let me know. Thanks.
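
To make the arithmetic in the rephrased changelog concrete, here is a
minimal sketch. It is not the kernel code: wb_share_mb() is a hypothetical
helper, thresholds are in MB for readability, and the bdi min/max ratio
handling that the real __wb_calc_thresh() does is left out. It only
models "share = thresh * (wb bandwidth) / (domain bandwidth)".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of a wb's share of a domain threshold.
 * thresh_mb:  the domain's bg_thresh, in MB
 * wb_bw:      writeout bandwidth of this wb (any consistent unit)
 * domain_bw:  total writeout bandwidth seen in the domain (same unit)
 */
static uint64_t wb_share_mb(uint64_t thresh_mb, uint64_t wb_bw,
			    uint64_t domain_bw)
{
	/* A domain with no writeout bandwidth has no meaningful share. */
	assert(domain_bw != 0);
	return thresh_mb * wb_bw / domain_bw;
}
```

With cgroup1's bg_thresh at 1G (1024 MB), measuring wb1 against the global
domain, where wb2 contributes equal bandwidth, is wb_share_mb(1024, 1, 2),
i.e. 512 MB. Measuring it against cgroup1, where wb1 is the only writer, is
wb_share_mb(1024, 1, 1), i.e. the full 1024 MB.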

> 
>> Test as follows:
>> # make it easier to observe the issue
>> echo 300000 > /proc/sys/vm/dirty_expire_centisecs
>> echo 100 > /proc/sys/vm/dirty_writeback_centisecs
>>
>> # run fio in wb1
>> cd /sys/fs/cgroup
>> echo "+memory +io" > cgroup.subtree_control
>> mkdir group1
>> cd group1
>> echo 10G > memory.high
>> echo 10G > memory.max
>> echo $$ > cgroup.procs
>> mkfs.ext4 -F /dev/vdb
>> mount /dev/vdb /bdi1/
>> fio -name test -filename=/bdi1/file -size=600M -ioengine=libaio -bs=4K \
>> -iodepth=1 -rw=write -direct=0 --time_based -runtime=600 -invalidate=0
>>
>> # run fio in wb2 from a new shell
>> cd /sys/fs/cgroup
>> mkdir group2
>> cd group2
>> echo 10G > memory.high
>> echo 10G > memory.max
>> echo $$ > cgroup.procs
>> mkfs.ext4 -F /dev/vdc
>> mount /dev/vdc /bdi2/
>> fio -name test -filename=/bdi2/file -size=600M -ioengine=libaio -bs=4K \
>> -iodepth=1 -rw=write -direct=0 --time_based -runtime=600 -invalidate=0
>>
>> Before the fix, the written pages of wb1 and wb2 reported by
>> tools/writeback/wb_monitor.py keep growing. After the fix, few written
>> pages accumulate.
>> There is no obvious change in the fio results.
>>
>> Fixes: 74d369443325 ("writeback: Fix performance regression in wb_over_bg_thresh()")
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> 
> Besides the changelog rephrasing the change looks good. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 								Honza
> 
>> ---
>>  mm/page-writeback.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index 2a3b68aae336..14893b20d38c 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -2137,7 +2137,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
>>  		if (mdtc->dirty > mdtc->bg_thresh)
>>  			return true;
>>  
>> -		thresh = wb_calc_thresh(mdtc->wb, mdtc->bg_thresh);
>> +		thresh = __wb_calc_thresh(mdtc, mdtc->bg_thresh);
>>  		if (thresh < 2 * wb_stat_error())
>>  			reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
>>  		else
>> -- 
>> 2.30.0
>>


Thread overview: 13+ messages
2024-04-25 13:17 [PATCH v2 0/4] Fix and cleanups to page-writeback Kemeng Shi
2024-04-25 13:17 ` [PATCH v2 1/4] mm: enable __wb_calc_thresh to calculate dirty background threshold Kemeng Shi
2024-05-03  9:11   ` Jan Kara
2024-04-25 13:17 ` [PATCH v2 2/4] mm: correct calculation of wb's bg_thresh in cgroup domain Kemeng Shi
2024-05-03  9:30   ` Jan Kara
2024-05-07  1:16     ` Kemeng Shi [this message]
2024-05-07 13:28       ` Jan Kara
2024-04-25 13:17 ` [PATCH v2 3/4] mm: call __wb_calc_thresh instead of wb_calc_thresh in wb_over_bg_thresh Kemeng Shi
2024-05-03  9:31   ` Jan Kara
2024-04-25 13:17 ` [PATCH v2 4/4] mm: remove stale comment __folio_mark_dirty Kemeng Shi
2024-05-03  9:31   ` Jan Kara
2024-05-01 16:16 ` [PATCH v2 0/4] Fix and cleanups to page-writeback Tejun Heo
2024-05-06  1:25   ` Kemeng Shi
