All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux-MM <linux-mm@kvack.org>, NeilBrown <neilb@suse.de>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested
Date: Thu, 14 Oct 2021 17:42:51 +0200	[thread overview]
Message-ID: <66e8e0cc-1abb-a283-1e0d-068124a84790@suse.cz> (raw)
In-Reply-To: <20211014104744.GY3959@techsingularity.net>

On 10/14/21 12:47, Mel Gorman wrote:
> Thanks Vlastimil
> 
> On Wed, Oct 13, 2021 at 05:39:36PM +0200, Vlastimil Babka wrote:
>> > +/*
>> > + * Account for pages written if tasks are throttled waiting on dirty
>> > + * pages to clean. If enough pages have been cleaned since throttling
>> > + * started then wakeup the throttled tasks.
>> > + */
>> > +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>> > +							int nr_throttled)
>> > +{
>> > +	unsigned long nr_written;
>> > +
>> > +	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
>> 
>> Is this intentionally using the __ version that normally expects irqs to be
>> disabled (AFAIK they are not in this path)? I think this is rarely used cold
>> path so it doesn't seem worth to trade off speed for accuracy.
>> 
> 
> It was intentional because IRQs can be disabled and if it's race-prone,
> it's not overly problematic but you're right, better to be safe.  I changed
> it to the safe type as it's mostly free on x86, arm64 and s390 and for
> other architectures, this is a slow path.

Great, thanks.

>> > +	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
>> > +		READ_ONCE(pgdat->nr_reclaim_start);
>> 
>> Even if the inc above was safe, node_page_state() will return only the
>> global counter, so the value we read here will only actually increment when
>> some cpu's counter overflows, so it will be "bursty". Maybe it's ok, just
>> worth documenting?
>> 
> 
> I didn't think the penalty of doing an accurate read while writeback
> throttled is worth it. I'll add a comment.
> 
>> > +
>> > +	if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
>> > +		wake_up_all(&pgdat->reclaim_wait);
>> 
>> Hm it seems a bit weird that the more tasks are throttled, the more we wait,
>> and then wake up all. Theoretically this will lead to even more
>> bursty/staggering herd behavior. Could be better to wake up single task each
>> SWAP_CLUSTER_MAX, and bump nr_reclaim_start? But maybe it's not a problem in
>> practice due to HZ/10 timeouts being short enough?
>> 
> 
> Yes, the more tasks are throttled the longer tasks wait because tasks are
> allocating faster than writeback can complete so I wanted to reduce the
> allocation pressure. I considered waking one task at a time but there is
> no prioritisation of tasks on the waitqueue and it's not clear that the
> additional complexity is justified. With inaccurate counters, a light
> allocator could get throttled for the full timeout unnecessarily.
> 
> Even if we were to wake one task at a time, I would prefer it was done
> as a potential optimisation on top.

Fair enough.

> Diff on top based on review feedback;

Thanks, with that you can add

Acked-by: Vlastimil Babka <vbabka@suse.cz>

to the updated version

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bcd22e53795f..735b1f2b5d9e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1048,7 +1048,15 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>  {
>  	unsigned long nr_written;
>  
> -	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
> +	inc_node_page_state(page, NR_THROTTLED_WRITTEN);
> +
> +	/*
> +	 * This is an inaccurate read as the per-cpu deltas may not
> +	 * be synchronised. However, given that the system is
> +	 * writeback throttled, it is not worth taking the penalty
> +	 * of getting an accurate count. At worst, the throttle
> +	 * timeout guarantees forward progress.
> +	 */
>  	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
>  		READ_ONCE(pgdat->nr_reclaim_start);
> 


  reply	other threads:[~2021-10-14 15:42 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 13:53 [PATCH v3 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-08 13:53 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-10-13 15:39   ` Vlastimil Babka
2021-10-14 10:47     ` Mel Gorman
2021-10-14 15:42       ` Vlastimil Babka [this message]
2021-10-08 13:53 ` [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-10-14  8:06   ` Vlastimil Babka
2021-10-14 11:56     ` Mel Gorman
2021-10-14 15:44       ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 3/8] mm/vmscan: Throttle reclaim when no progress is being made Mel Gorman
2021-10-14 12:31   ` Vlastimil Babka
2021-10-14 13:03     ` Mel Gorman
2021-10-14 15:45       ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 4/8] mm/writeback: Throttle based on page writeback instead of congestion Mel Gorman
2021-10-14 15:34   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 5/8] mm/page_alloc: Remove the throttling logic from the page allocator Mel Gorman
2021-10-14 15:36   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 6/8] mm/vmscan: Centralise timeout values for reclaim_throttle Mel Gorman
2021-10-14 15:38   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 7/8] mm/vmscan: Increase the timeout if page reclaim is not making progress Mel Gorman
2021-10-14 15:39   ` Vlastimil Babka
2021-10-08 13:53 ` [PATCH 8/8] mm/vmscan: Delay waking of tasks throttled on NOPROGRESS Mel Gorman
2021-10-14 15:41   ` Vlastimil Babka
2021-10-19  9:01 [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-19  9:01 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-10-22 14:46 [PATCH v5 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-22 14:46 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66e8e0cc-1abb-a283-1e0d-068124a84790@suse.cz \
    --to=vbabka@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=corbet@lwn.net \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=neilb@suse.de \
    --cc=riel@surriel.com \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.