linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux-MM <linux-mm@kvack.org>, NeilBrown <neilb@suse.de>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested
Date: Thu, 14 Oct 2021 17:42:51 +0200	[thread overview]
Message-ID: <66e8e0cc-1abb-a283-1e0d-068124a84790@suse.cz> (raw)
In-Reply-To: <20211014104744.GY3959@techsingularity.net>

On 10/14/21 12:47, Mel Gorman wrote:
> Thanks Vlastimil
> 
> On Wed, Oct 13, 2021 at 05:39:36PM +0200, Vlastimil Babka wrote:
>> > +/*
>> > + * Account for pages written if tasks are throttled waiting on dirty
>> > + * pages to clean. If enough pages have been cleaned since throttling
>> > + * started then wakeup the throttled tasks.
>> > + */
>> > +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>> > +							int nr_throttled)
>> > +{
>> > +	unsigned long nr_written;
>> > +
>> > +	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
>> 
>> Is this intentionally using the __ version that normally expects irqs to be
>> disabled (AFAIK they are not in this path)? I think this is rarely used cold
>> path so it doesn't seem worth to trade off speed for accuracy.
>> 
> 
> It was intentional because IRQs can be disabled and if it's race-prone,
> it's not overly problematic but you're right, better to be safe.  I changed
> it to the safe type as it's mostly free on x86, arm64 and s390 and for
> other architectures, this is a slow path.

Great, thanks.

>> > +	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
>> > +		READ_ONCE(pgdat->nr_reclaim_start);
>> 
>> Even if the inc above was safe, node_page_state() will return only the
>> global counter, so the value we read here will only actually increment when
>> some cpu's counter overflows, so it will be "bursty". Maybe it's ok, just
>> worth documenting?
>> 
> 
> I didn't think the penalty of doing an accurate read while writeback
> throttled is worth it. I'll add a comment.
> 
>> > +
>> > +	if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
>> > +		wake_up_all(&pgdat->reclaim_wait);
>> 
>> Hm it seems a bit weird that the more tasks are throttled, the more we wait,
>> and then wake up all. Theoretically this will lead to even more
>> bursty/staggering herd behavior. Could be better to wake up single task each
>> SWAP_CLUSTER_MAX, and bump nr_reclaim_start? But maybe it's not a problem in
>> practice due to HZ/10 timeouts being short enough?
>> 
> 
> Yes, the more tasks are throttled the longer tasks wait because tasks are
> allocating faster than writeback can complete so I wanted to reduce the
> allocation pressure. I considered waking one task at a time but there is
> no prioritisation of tasks on the waitqueue and it's not clear that the
> additional complexity is justified. With inaccurate counters, a light
> allocator could get throttled for the full timeout unnecessarily.
> 
> Even if we were to wake one task at a time, I would prefer it was done
> as a potential optimisation on top.

Fair enough.

> Diff on top based on review feedback;

Thanks, with that you can add

Acked-by: Vlastimil Babka <vbabka@suse.cz>

to the updated version

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bcd22e53795f..735b1f2b5d9e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1048,7 +1048,15 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>  {
>  	unsigned long nr_written;
>  
> -	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
> +	inc_node_page_state(page, NR_THROTTLED_WRITTEN);
> +
> +	/*
> +	 * This is an inaccurate read as the per-cpu deltas may not
> +	 * be synchronised. However, given that the system is
> +	 * writeback throttled, it is not worth taking the penalty
> +	 * of getting an accurate count. At worst, the throttle
> +	 * timeout guarantees forward progress.
> +	 */
>  	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
>  		READ_ONCE(pgdat->nr_reclaim_start);
> 



  reply	other threads:[~2021-10-14 15:42 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 13:53 [PATCH v3 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-08 13:53 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-10-13 15:39   ` Vlastimil Babka
2021-10-14 10:47     ` Mel Gorman
2021-10-14 15:42       ` Vlastimil Babka [this message]
2021-10-08 13:53 ` [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-10-14  8:06   ` Vlastimil Babka
2021-10-14 11:56     ` Mel Gorman
2021-10-14 15:44       ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 3/8] mm/vmscan: Throttle reclaim when no progress is being made Mel Gorman
2021-10-14 12:31   ` Vlastimil Babka
2021-10-14 13:03     ` Mel Gorman
2021-10-14 15:45       ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 4/8] mm/writeback: Throttle based on page writeback instead of congestion Mel Gorman
2021-10-14 15:34   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 5/8] mm/page_alloc: Remove the throttling logic from the page allocator Mel Gorman
2021-10-14 15:36   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 6/8] mm/vmscan: Centralise timeout values for reclaim_throttle Mel Gorman
2021-10-14 15:38   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 7/8] mm/vmscan: Increase the timeout if page reclaim is not making progress Mel Gorman
2021-10-14 15:39   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 8/8] mm/vmscan: Delay waking of tasks throttled on NOPROGRESS Mel Gorman
2021-10-14 15:41   ` Vlastimil Babka
2021-10-19  9:01 [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-19  9:01 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-10-22 14:46 [PATCH v5 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-22 14:46 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66e8e0cc-1abb-a283-1e0d-068124a84790@suse.cz \
    --to=vbabka@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=corbet@lwn.net \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=neilb@suse.de \
    --cc=riel@surriel.com \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).