From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux-MM <linux-mm@kvack.org>, NeilBrown <neilb@suse.de>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated
Date: Thu, 14 Oct 2021 17:44:50 +0200
Message-ID: <1953635e-a97a-eff3-8019-3d012b065938@suse.cz>
In-Reply-To: <20211014115632.GZ3959@techsingularity.net>

On 10/14/21 13:56, Mel Gorman wrote:
> On Thu, Oct 14, 2021 at 10:06:25AM +0200, Vlastimil Babka wrote:
>> On 10/8/21 15:53, Mel Gorman wrote:
>> > Page reclaim throttles on congestion if too many parallel reclaim instances
>> > have isolated too many pages. This makes no sense, excessive parallelisation
>> > has nothing to do with writeback or congestion.
>> > 
>> > This patch creates an additional waitqueue to sleep on when too many
>> > pages are isolated. The throttled tasks are woken when the number
>> > of isolated pages is reduced or a timeout occurs. There may be
>> > some false positive wakeups for GFP_NOIO/GFP_NOFS callers but
>> > the tasks will throttle again if necessary.
>> > 
>> > [shy828301@gmail.com: Wake up from compaction context]
>> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> 
>> ...
>> 
>> > diff --git a/mm/internal.h b/mm/internal.h
>> > index 90764d646e02..06d0c376efcd 100644
>> > --- a/mm/internal.h
>> > +++ b/mm/internal.h
>> > @@ -45,6 +45,15 @@ static inline void acct_reclaim_writeback(struct page *page)
>> >  		__acct_reclaim_writeback(pgdat, page, nr_throttled);
>> >  }
>> >  
>> > +static inline void wake_throttle_isolated(pg_data_t *pgdat)
>> > +{
>> > +	wait_queue_head_t *wqh;
>> > +
>> > +	wqh = &pgdat->reclaim_wait[VMSCAN_THROTTLE_ISOLATED];
>> > +	if (waitqueue_active(wqh))
>> > +		wake_up_all(wqh);
>> 
>> Again, would it be better to wake up just one task to prevent possible
>> thundering herd? We can assume that that task will call too_many_isolated()
>> eventually to wake up the next one?
> 
> Same problem as the writeback throttling, there is no prioritisation of
> light vs heavy allocators.
> 
>> Although it seems strange that
>> too_many_isolated() is the place where we detect the situation for wake up.
>> Simpler than to hook into NR_ISOLATED decrementing I guess.
>> 
> 
> Simpler but more costly. Every decrement would have to check
> too_many_isolated(). I think the cost of that is too high given that
> VMSCAN_THROTTLE_ISOLATED is relatively hard to trigger and accounts for
> a minority of throttling events.

Agreed.
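
For anyone following along, the trade-off can be sketched in userspace C.
This is only an illustration, not the kernel code: struct node_sketch, its
fields and both helpers are hypothetical names, and the "isolated >
inactive" test is an assumed simplification of the real threshold. The
point is that putting pages back stays a plain counter update, while the
threshold check and the wake-all happen only when a reclaimer polls.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-node counters; not kernel code. */
struct node_sketch {
	long nr_isolated;		/* pages currently isolated from the LRU */
	long nr_inactive;		/* pages on the inactive list */
	int nr_waiters;			/* tasks sleeping on the ISOLATED wait queue */
	unsigned long nr_checks;	/* how often the threshold was evaluated */
	unsigned long nr_wakeups;	/* how often a wake-all would have run */
};

/* Cheap path: putting pages back only adjusts the counter, no check. */
static void putback_sketch(struct node_sketch *n, long nr)
{
	n->nr_isolated -= nr;
}

/*
 * Reclaimer path: evaluate the threshold and, once it no longer holds,
 * wake every waiter (standing in for waitqueue_active() + wake_up_all()).
 * The "isolated > inactive" comparison is an assumption of this sketch.
 */
static bool too_many_isolated_sketch(struct node_sketch *n)
{
	bool too_many;

	n->nr_checks++;
	too_many = n->nr_isolated > n->nr_inactive;
	if (!too_many && n->nr_waiters > 0) {
		n->nr_wakeups++;
		n->nr_waiters = 0;	/* all waiters released at once */
	}
	return too_many;
}
```

With this shape, any number of putback_sketch() calls costs zero threshold
evaluations; the price is that waiters are only released the next time some
task calls too_many_isolated_sketch(), or when their timeout expires.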

>> > +}
>> > +
>> >  vm_fault_t do_swap_page(struct vm_fault *vmf);
>> >  
>> >  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>> ...
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1006,11 +1006,10 @@ static void handle_write_error(struct address_space *mapping,
>> >  	unlock_page(page);
>> >  }
>> >  
>> > -static void
>> > -reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>> > +void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>> >  							long timeout)
>> >  {
>> > -	wait_queue_head_t *wqh = &pgdat->reclaim_wait;
>> > +	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
>> 
>> It seems weird that later in this function we increase nr_reclaim_throttled
>> without distinguishing the reason, so effectively throttling for isolated
>> pages will trigger acct_reclaim_writeback() doing the NR_THROTTLED_WRITTEN
>> counting, although it's not related at all? Maybe either have separate
>> nr_reclaim_throttled counters per vmscan_throttle_state (if counter of
>> isolated is useful, I haven't seen the rest of series yet), or count only
>> VMSCAN_THROTTLE_WRITEBACK tasks?
>> 
> 
> Very good point, it would be more appropriate to only count the
> writeback reason.
> 
> Diff on top is below. It'll cause minor conflicts later in the series.

Looks good, for the updated version:

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ca65d6a64bdd..58a25d42c31c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -849,7 +849,7 @@ typedef struct pglist_data {
>  	wait_queue_head_t kswapd_wait;
>  	wait_queue_head_t pfmemalloc_wait;
>  	wait_queue_head_t reclaim_wait[NR_VMSCAN_THROTTLE];
> -	atomic_t nr_reclaim_throttled;	/* nr of throtted tasks */
> +	atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */
>  	unsigned long nr_reclaim_start;	/* nr pages written while throttled
>  					 * when throttling started. */
>  	struct task_struct *kswapd;	/* Protected by
> diff --git a/mm/internal.h b/mm/internal.h
> index 06d0c376efcd..3461a1055975 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -39,7 +39,7 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>  static inline void acct_reclaim_writeback(struct page *page)
>  {
>  	pg_data_t *pgdat = page_pgdat(page);
> -	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
> +	int nr_throttled = atomic_read(&pgdat->nr_writeback_throttled);
>  
>  	if (nr_throttled)
>  		__acct_reclaim_writeback(pgdat, page, nr_throttled);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6e198bbbd86a..29434d4fc1c7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1011,6 +1011,7 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>  {
>  	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
>  	long ret;
> +	bool acct_writeback = (reason == VMSCAN_THROTTLE_WRITEBACK);
>  	DEFINE_WAIT(wait);
>  
>  	/*
> @@ -1022,7 +1023,8 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>  	    current->flags & (PF_IO_WORKER|PF_KTHREAD))
>  		return;
>  
> -	if (atomic_inc_return(&pgdat->nr_reclaim_throttled) == 1) {
> +	if (acct_writeback &&
> +	    atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) {
>  		WRITE_ONCE(pgdat->nr_reclaim_start,
>  			node_page_state(pgdat, NR_THROTTLED_WRITTEN));
>  	}
> @@ -1030,7 +1032,9 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>  	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
>  	ret = schedule_timeout(timeout);
>  	finish_wait(wqh, &wait);
> -	atomic_dec(&pgdat->nr_reclaim_throttled);
> +
> +	if (acct_writeback)
> +		atomic_dec(&pgdat->nr_writeback_throttled);
>  
>  	trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout),
>  				jiffies_to_usecs(timeout - ret),
> @@ -4349,7 +4353,7 @@ static int kswapd(void *p)
>  
>  	WRITE_ONCE(pgdat->kswapd_order, 0);
>  	WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
> -	atomic_set(&pgdat->nr_reclaim_throttled, 0);
> +	atomic_set(&pgdat->nr_writeback_throttled, 0);
>  	for ( ; ; ) {
>  		bool ret;
>  
> 
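
The accounting change in the diff above can be modelled in userspace with
C11 atomics. This is a minimal sketch under stated assumptions, not the
kernel implementation: pgdat_sketch, the SKETCH_* reasons and both helpers
are hypothetical names, and atomic_fetch_add() stands in for the kernel's
atomic_inc_return() (it returns the old value, so the "first throttled
task" test compares against 0 rather than 1).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum vmscan_throttle_state and pg_data_t. */
enum throttle_reason_sketch { SKETCH_WRITEBACK, SKETCH_ISOLATED };

struct pgdat_sketch {
	atomic_int nr_writeback_throttled;	/* writeback-throttled tasks only */
	unsigned long nr_reclaim_start;		/* snapshot taken by the first task */
	unsigned long nr_throttled_written;	/* models NR_THROTTLED_WRITTEN */
};

/* Returns whether the caller was counted, so the exit path can mirror it. */
static bool throttle_enter_sketch(struct pgdat_sketch *p,
				  enum throttle_reason_sketch reason)
{
	bool acct_writeback = (reason == SKETCH_WRITEBACK);

	/* Old value 0 means this is the first writeback-throttled task. */
	if (acct_writeback &&
	    atomic_fetch_add(&p->nr_writeback_throttled, 1) == 0)
		p->nr_reclaim_start = p->nr_throttled_written;

	return acct_writeback;
}

static void throttle_exit_sketch(struct pgdat_sketch *p, bool acct_writeback)
{
	if (acct_writeback) {
		int prev = atomic_fetch_sub(&p->nr_writeback_throttled, 1);

		assert(prev > 0);	/* an unbalanced exit would be a bug */
		(void)prev;		/* silence unused warning under NDEBUG */
	}
}
```

A task throttled for the isolated reason leaves the counter untouched on
both entry and exit, so a reader of the counter (acct_reclaim_writeback()
in the patch) only sees tasks that are actually waiting on writeback.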

