All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: NeilBrown <neilb@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>, Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/5] mm/vmscan: Throttle reclaim until some writeback completes if congested
Date: Tue, 21 Sep 2021 12:12:34 +0100	[thread overview]
Message-ID: <20210921111234.GQ3959@techsingularity.net> (raw)
In-Reply-To: <163217994752.3992.5443677201798473600@noble.neil.brown.name>

On Tue, Sep 21, 2021 at 09:19:07AM +1000, NeilBrown wrote:
> On Mon, 20 Sep 2021, Mel Gorman wrote:
> >  
> > +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page);
> > +static inline void acct_reclaim_writeback(struct page *page)
> > +{
> > +	pg_data_t *pgdat = page_pgdat(page);
> > +
> > +	if (atomic_read(&pgdat->nr_reclaim_throttled))
> > +		__acct_reclaim_writeback(pgdat, page);
> 
> The first thing __acct_reclaim_writeback() does is repeat that
> atomic_read().
> Should we read it once and pass the value in to
> __acct_reclaim_writeback(), or is that an unnecessary
> micro-optimisation?
> 

I think it's a micro-optimisation but I can still do it.

> 
> > +/*
> > + * Account for pages written if tasks are throttled waiting on dirty
> > + * pages to clean. If enough pages have been cleaned since throttling
> > + * started then wakeup the throttled tasks.
> > + */
> > +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page)
> > +{
> > +	unsigned long nr_written;
> > +	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
> > +
> > +	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
> > +	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
> > +		READ_ONCE(pgdat->nr_reclaim_start);
> > +
> > +	if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
> > +		wake_up_interruptible_all(&pgdat->reclaim_wait);
> 
> A simple wake_up() could be used here.  "interruptible" is only needed
> if non-interruptible waiters should be left alone.  "_all" is only needed
> if there are some exclusive waiters.  Neither of these apply, so I think
> the simpler interface is best.
> 

You're right.

> 
> > +}
> > +
> >  /* possible outcome of pageout() */
> >  typedef enum {
> >  	/* failed to write page out, page is locked */
> > @@ -1412,9 +1453,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> >  
> >  		/*
> >  		 * The number of dirty pages determines if a node is marked
> > -		 * reclaim_congested which affects wait_iff_congested. kswapd
> > -		 * will stall and start writing pages if the tail of the LRU
> > -		 * is all dirty unqueued pages.
> > +		 * reclaim_congested. kswapd will stall and start writing
> > +		 * pages if the tail of the LRU is all dirty unqueued pages.
> >  		 */
> >  		page_check_dirty_writeback(page, &dirty, &writeback);
> >  		if (dirty || writeback)
> > @@ -3180,19 +3220,20 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  		 * If kswapd scans pages marked for immediate
> >  		 * reclaim and under writeback (nr_immediate), it
> >  		 * implies that pages are cycling through the LRU
> > -		 * faster than they are written so also forcibly stall.
> > +		 * faster than they are written so forcibly stall
> > +		 * until some pages complete writeback.
> >  		 */
> >  		if (sc->nr.immediate)
> > -			congestion_wait(BLK_RW_ASYNC, HZ/10);
> > +			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10);
> >  	}
> >  
> >  	/*
> >  	 * Tag a node/memcg as congested if all the dirty pages
> >  	 * scanned were backed by a congested BDI and
> 
> "congested BDI" doesn't mean anything any more.  Is this a good time to
> correct that comment.
> This comment seems to refer to the test
> 
>       sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> 
> a few lines down.  But nr.congested is set from nr_congested which
> counts when inode_write_congested() is true - almost never - and when 
> "writeback and PageReclaim()".
> 
> Is that last test the sign that we are cycling through the LRU to fast?
> So the comment could become:
> 
>    Tag a node/memcg as congested if all the dirty page were
>    already marked for writeback and immediate reclaim (counted in
>    nr.congested).
> 
> ??
> 
> Patch seems to make sense to me, but I'm not expert in this area.
> 

Comments updated.

Diff on top looks like

diff --git a/mm/internal.h b/mm/internal.h
index e25b3686bfab..90764d646e02 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -34,13 +34,15 @@
 
 void page_writeback_init(void);
 
-void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page);
+void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
+						int nr_throttled);
 static inline void acct_reclaim_writeback(struct page *page)
 {
 	pg_data_t *pgdat = page_pgdat(page);
+	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
 
-	if (atomic_read(&pgdat->nr_reclaim_throttled))
-		__acct_reclaim_writeback(pgdat, page);
+	if (nr_throttled)
+		__acct_reclaim_writeback(pgdat, page, nr_throttled);
 }
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b58ea0b13286..2dc17de91d32 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1034,10 +1034,10 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
  * pages to clean. If enough pages have been cleaned since throttling
  * started then wakeup the throttled tasks.
  */
-void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page)
+void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
+							int nr_throttled)
 {
 	unsigned long nr_written;
-	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
 
 	__inc_node_page_state(page, NR_THROTTLED_WRITTEN);
 	nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) -
@@ -3228,9 +3228,8 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	}
 
 	/*
-	 * Tag a node/memcg as congested if all the dirty pages
-	 * scanned were backed by a congested BDI and
-	 * non-kswapd tasks will stall on reclaim_throttle.
+	 * Tag a node/memcg as congested if all the dirty pages were marked
+	 * for writeback and immediate reclaim (counted in nr.congested).
 	 *
 	 * Legacy memcg will stall in page writeback so avoid forcibly
 	 * stalling in reclaim_throttle().
@@ -3241,8 +3240,8 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 		set_bit(LRUVEC_CONGESTED, &target_lruvec->flags);
 
 	/*
-	 * Stall direct reclaim for IO completions if underlying BDIs
-	 * and node is congested. Allow kswapd to continue until it
+	 * Stall direct reclaim for IO completions if the lruvec is
+	 * node is congested. Allow kswapd to continue until it
 	 * starts encountering unqueued dirty pages or cycling through
 	 * the LRU too quickly.
 	 */
@@ -4427,7 +4426,7 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, highest_zoneidx, order,
 				      gfp_flags);
-	wake_up_interruptible(&pgdat->kswapd_wait);
+	wake_up_all(&pgdat->kswapd_wait);
 }
 
 #ifdef CONFIG_HIBERNATION

  reply	other threads:[~2021-09-21 11:12 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-20  8:54 [RFC PATCH 0/5] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-09-20  8:54 ` [PATCH 1/5] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-09-20 23:19   ` NeilBrown
2021-09-21 11:12     ` Mel Gorman [this message]
2021-09-21 21:27       ` NeilBrown
2021-09-21  0:13   ` NeilBrown
2021-09-21 10:58     ` Mel Gorman
2021-09-21 21:40       ` NeilBrown
2021-09-22  6:04       ` Dave Chinner
2021-09-22  8:03         ` Mel Gorman
2021-09-22 12:16   ` Hillf Danton
2021-09-22 14:13     ` Mel Gorman
2021-09-20  8:54 ` [PATCH 2/5] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-09-20 23:27   ` NeilBrown
2021-09-21 11:03     ` Mel Gorman
2021-09-21 18:45   ` Yang Shi
2021-09-21 18:45     ` Yang Shi
2021-09-22  8:11     ` Mel Gorman
2021-09-20  8:54 ` [PATCH 3/5] mm/vmscan: Throttle reclaim when no progress is being made Mel Gorman
2021-09-20 23:31   ` NeilBrown
2021-09-21 11:16     ` Mel Gorman
2021-09-21 21:46       ` NeilBrown
2021-09-22  9:21         ` Mel Gorman
2021-09-20  8:54 ` [PATCH 4/5] mm/writeback: Throttle based on page writeback instead of congestion Mel Gorman
2021-09-20  8:54 ` [PATCH 5/5] mm/page_alloc: Remove the throttling logic from the page allocator Mel Gorman
2021-09-20 11:42 ` [RFC PATCH 0/5] Remove dependency on congestion_wait in mm/ Matthew Wilcox
2021-09-20 12:50   ` Mel Gorman
2021-09-20 14:11     ` David Sterba
2021-09-21 11:18       ` Mel Gorman
2021-09-20 19:51   ` Mel Gorman
2021-09-21 20:46 ` Dave Chinner
2021-09-22 17:52   ` Mel Gorman
2021-09-29 10:09 [PATCH 0/5] Remove dependency on congestion_wait in mm/ v2 Mel Gorman
2021-09-29 10:09 ` [PATCH 1/5] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210921111234.GQ3959@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=adilger.kernel@dilger.ca \
    --cc=corbet@lwn.net \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=neilb@suse.de \
    --cc=riel@surriel.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --subject='Re: [PATCH 1/5] mm/vmscan: Throttle reclaim until some writeback completes if congested' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.