From: Mel Gorman <mgorman@techsingularity.net>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	NeilBrown <neilb@suse.de>, Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>, Linux-MM <linux-mm@kvack.org>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/8] mm/vmscan: Throttle reclaim when no progress is being made
Date: Wed, 24 Nov 2021 14:35:59 +0000
Message-ID: <20211124143559.GI3366@techsingularity.net>
In-Reply-To: <20211124014914.GB265983@magnolia>

On Tue, Nov 23, 2021 at 05:49:14PM -0800, Darrick J. Wong wrote:
> > Ever since Christoph broke swapfiles, I've been carrying around a little
> > fstest in my dev tree[1] that tries to exercise paging things in and out
> > of a swapfile.  Sadly I've been trapped in about three dozen customer
> > escalations for over a month, which means I haven't been able to do much
> > upstream in weeks.  Like submit this test upstream. :(
> > 
> > Now that I've finally gotten around to trying out a 5.16-rc2 build, I
> > notice that the runtime of this test has gone from ~5s to 2 hours.
> > Among other things that it does, the test sets up a cgroup with a memory
> > controller limiting the memory usage to 25MB, then runs a program that
> > tries to dirty 50MB of memory.  There's 2GB of memory in the VM, so
> > we're not running reclaim globally, but the cgroup gets throttled very
> > severely.
> > 
> > AFAICT the system is mostly idle, but it's difficult to tell because ps
> > and top also get stuck waiting for this cgroup for whatever reason.  My
> > uninformed speculation is that usemem_and_swapoff takes a page fault
> > while dirtying the 50MB memory buffer, prepares to pull a page in from
> > swap, tries to evict another page to stay under the memcg limit, but
> > that decides that it's making no progress and calls
> > reclaim_throttle(..., VMSCAN_THROTTLE_NOPROGRESS).
> > 
> > The sleep is uninterruptible, so I can't even kill -9 fstests to shut it
> > down.  Eventually we either finish the test or (for the mlock part) the
> > OOM killer actually kills the process, but this takes a very long time.
> > 
> > Any thoughts?  For now I can just hack around this by skipping
> > reclaim_throttle if cgroup_reclaim() == true, but that's probably not
> > the correct fix. :)
> 
> Update: after adding timing information to usemem_and_swapoff, it looks
> like dirtying the 50MB buffer takes ~22s (up from 0.06s on 5.15).  The
> mlock call stalls for ~280s until the OOM killer kills it (up from
> nearly instantaneous on 5.15), and the swapon/swapoff variant takes
> 20 minutes to hours depending on the run.
> 

Can you try the patch below, please? I think I'm running the test
correctly, and it finishes for me in 16 seconds with this applied.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 07db03883062..d9166e94eb95 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 
 		break;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		timeout = 1;
+
+		/*
+		 * If kswapd is disabled, reschedule if necessary but do not
+		 * throttle as the system is likely near OOM.
+		 */
+		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
+			cond_resched();
+			return;
+		}
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.
