All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thorsten Leemhuis <regressions@leemhuis.info>
To: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Alexey Avramov <hakavlad@inbox.lv>,
	Rik van Riel <riel@surriel.com>, Mike Galbraith <efault@gmx.de>,
	Darrick Wong <djwong@kernel.org>,
	regressions@lists.linux.dev,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 1/1] mm: vmscan: Reduce throttling due to a failure to make progress
Date: Tue, 28 Dec 2021 11:04:18 +0100	[thread overview]
Message-ID: <caf247ab-f6fe-a3b9-c4b5-7ce17d1d5e43@leemhuis.info> (raw)
In-Reply-To: <20211202150614.22440-1-mgorman@techsingularity.net>

Hi, this is your Linux kernel regression tracker speaking.

On 02.12.21 16:06, Mel Gorman wrote:
> Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
> problems due to reclaim throttling for excessive lengths of time.
> In Alexey's case, a memory hog that should go OOM quickly stalls for
> several minutes before stalling. In Mike and Darrick's cases, a small
> memcg environment stalled excessively even though the system had enough
> memory overall.

Just wondering: this patch afaics is now in -mm and  Linux next for
nearly two weeks. Is that intentional? I had expected it to be mainlined
with the batch of patches Andrew mailed to Linus last week, but it
wasn't among them.

Or am I missing something?

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.


> Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being
> made") introduced the problem although commit a19594ca4a8b ("mm/vmscan:
> increase the timeout if page reclaim is not making progress") made it
> worse. Systems at or near an OOM state that cannot be recovered must
> reach OOM quickly and memcg should kill tasks if a memcg is near OOM.
> 
> To address this, only stall for the first zone in the zonelist, reduce
> the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
> the scan control nr_reclaimed is 0, kswapd is still active and there were
> excessive pages pending for writeback. If kswapd has stopped reclaiming due
> to excessive failures, do not stall at all so that OOM triggers relatively
> quickly. Similarly, if an LRU is simply congested, only lightly throttle
> similar to NOPROGRESS.
> 
> Alexey's original case was the most straight forward
> 
> 	for i in {1..3}; do tail /dev/zero; done
> 
> On vanilla 5.16-rc1, this test stalled heavily, after the patch the test
> completes in a few seconds similar to 5.15.
> 
> Alexey's second test case added watching a youtube video while tail runs
> 10 times. On 5.15, playback only jitters slightly, 5.16-rc1 stalls a lot
> with lots of frames missing and numerous audio glitches. With this patch
> applies, the video plays similarly to 5.15.
> 
> Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
> Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
> Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
> 
> [lkp@intel.com: Fix W=1 build warning]
> Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
> Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
> Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
> Reported-by: kernel test robot <lkp@intel.com>
> Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  include/linux/mmzone.h        |  1 +
>  include/trace/events/vmscan.h |  4 ++-
>  mm/vmscan.c                   | 64 ++++++++++++++++++++++++++++++-----
>  3 files changed, 59 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 58e744b78c2c..936dc0b6c226 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -277,6 +277,7 @@ enum vmscan_throttle_state {
>  	VMSCAN_THROTTLE_WRITEBACK,
>  	VMSCAN_THROTTLE_ISOLATED,
>  	VMSCAN_THROTTLE_NOPROGRESS,
> +	VMSCAN_THROTTLE_CONGESTED,
>  	NR_VMSCAN_THROTTLE,
>  };
>  
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index f25a6149d3ba..ca2e9009a651 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -30,12 +30,14 @@
>  #define _VMSCAN_THROTTLE_WRITEBACK	(1 << VMSCAN_THROTTLE_WRITEBACK)
>  #define _VMSCAN_THROTTLE_ISOLATED	(1 << VMSCAN_THROTTLE_ISOLATED)
>  #define _VMSCAN_THROTTLE_NOPROGRESS	(1 << VMSCAN_THROTTLE_NOPROGRESS)
> +#define _VMSCAN_THROTTLE_CONGESTED	(1 << VMSCAN_THROTTLE_CONGESTED)
>  
>  #define show_throttle_flags(flags)						\
>  	(flags) ? __print_flags(flags, "|",					\
>  		{_VMSCAN_THROTTLE_WRITEBACK,	"VMSCAN_THROTTLE_WRITEBACK"},	\
>  		{_VMSCAN_THROTTLE_ISOLATED,	"VMSCAN_THROTTLE_ISOLATED"},	\
> -		{_VMSCAN_THROTTLE_NOPROGRESS,	"VMSCAN_THROTTLE_NOPROGRESS"}	\
> +		{_VMSCAN_THROTTLE_NOPROGRESS,	"VMSCAN_THROTTLE_NOPROGRESS"},	\
> +		{_VMSCAN_THROTTLE_CONGESTED,	"VMSCAN_THROTTLE_CONGESTED"}	\
>  		) : "VMSCAN_THROTTLE_NONE"
>  
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fb9584641ac7..4c4d5f6cd8a3 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1021,6 +1021,39 @@ static void handle_write_error(struct address_space *mapping,
>  	unlock_page(page);
>  }
>  
> +static bool skip_throttle_noprogress(pg_data_t *pgdat)
> +{
> +	int reclaimable = 0, write_pending = 0;
> +	int i;
> +
> +	/*
> +	 * If kswapd is disabled, reschedule if necessary but do not
> +	 * throttle as the system is likely near OOM.
> +	 */
> +	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
> +		return true;
> +
> +	/*
> +	 * If there are a lot of dirty/writeback pages then do not
> +	 * throttle as throttling will occur when the pages cycle
> +	 * towards the end of the LRU if still under writeback.
> +	 */
> +	for (i = 0; i < MAX_NR_ZONES; i++) {
> +		struct zone *zone = pgdat->node_zones + i;
> +
> +		if (!populated_zone(zone))
> +			continue;
> +
> +		reclaimable += zone_reclaimable_pages(zone);
> +		write_pending += zone_page_state_snapshot(zone,
> +						  NR_ZONE_WRITE_PENDING);
> +	}
> +	if (2 * write_pending <= reclaimable)
> +		return true;
> +
> +	return false;
> +}
> +
>  void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
>  {
>  	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
> @@ -1056,8 +1089,16 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
>  		}
>  
>  		break;
> +	case VMSCAN_THROTTLE_CONGESTED:
> +		fallthrough;
>  	case VMSCAN_THROTTLE_NOPROGRESS:
> -		timeout = HZ/2;
> +		if (skip_throttle_noprogress(pgdat)) {
> +			cond_resched();
> +			return;
> +		}
> +
> +		timeout = 1;
> +
>  		break;
>  	case VMSCAN_THROTTLE_ISOLATED:
>  		timeout = HZ/50;
> @@ -3321,7 +3362,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  	if (!current_is_kswapd() && current_may_throttle() &&
>  	    !sc->hibernation_mode &&
>  	    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
> -		reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
> +		reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
>  
>  	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
>  				    sc))
> @@ -3386,16 +3427,16 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
>  	}
>  
>  	/*
> -	 * Do not throttle kswapd on NOPROGRESS as it will throttle on
> -	 * VMSCAN_THROTTLE_WRITEBACK if there are too many pages under
> -	 * writeback and marked for immediate reclaim at the tail of
> -	 * the LRU.
> +	 * Do not throttle kswapd or cgroup reclaim on NOPROGRESS as it will
> +	 * throttle on VMSCAN_THROTTLE_WRITEBACK if there are too many pages
> +	 * under writeback and marked for immediate reclaim at the tail of the
> +	 * LRU.
>  	 */
> -	if (current_is_kswapd())
> +	if (current_is_kswapd() || cgroup_reclaim(sc))
>  		return;
>  
>  	/* Throttle if making no progress at high prioities. */
> -	if (sc->priority < DEF_PRIORITY - 2)
> +	if (sc->priority == 1 && !sc->nr_reclaimed)
>  		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
>  }
>  
> @@ -3415,6 +3456,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  	unsigned long nr_soft_scanned;
>  	gfp_t orig_mask;
>  	pg_data_t *last_pgdat = NULL;
> +	pg_data_t *first_pgdat = NULL;
>  
>  	/*
>  	 * If the number of buffer_heads in the machine exceeds the maximum
> @@ -3478,14 +3520,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  			/* need some check for avoid more shrink_zone() */
>  		}
>  
> +		if (!first_pgdat)
> +			first_pgdat = zone->zone_pgdat;
> +
>  		/* See comment about same check for global reclaim above */
>  		if (zone->zone_pgdat == last_pgdat)
>  			continue;
>  		last_pgdat = zone->zone_pgdat;
>  		shrink_node(zone->zone_pgdat, sc);
> -		consider_reclaim_throttle(zone->zone_pgdat, sc);
>  	}
>  
> +	consider_reclaim_throttle(first_pgdat, sc);
> +
>  	/*
>  	 * Restore to original mask to avoid the impact on the caller if we
>  	 * promoted it to __GFP_HIGHMEM.


  parent reply	other threads:[~2021-12-28 10:04 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-02 15:06 [PATCH v4 1/1] mm: vmscan: Reduce throttling due to a failure to make progress Mel Gorman
2021-12-02 16:30 ` Shakeel Butt
2021-12-02 16:52   ` Mel Gorman
2021-12-02 17:41     ` Shakeel Butt
2021-12-03  9:01       ` Mel Gorman
2021-12-03 17:50         ` Shakeel Butt
2021-12-03 19:08           ` Mel Gorman
2021-12-06  6:06             ` Shakeel Butt
2021-12-06 11:25               ` Mel Gorman
2021-12-07  7:14                 ` Shakeel Butt
2021-12-07  9:28                   ` Mel Gorman
2021-12-09  6:20 ` Hugh Dickins
2021-12-09  9:53   ` Mel Gorman
2021-12-28 10:04 ` Thorsten Leemhuis [this message]
2021-12-29 23:45   ` Andrew Morton
2021-12-31 14:24     ` Thorsten Leemhuis
2021-12-31 18:33       ` Hugh Dickins
2021-12-31 19:14       ` Linus Torvalds
2021-12-31 19:21         ` Linus Torvalds
2021-12-31 19:22           ` Linus Torvalds
2022-01-01 10:52             ` Thorsten Leemhuis
2021-12-31 21:04           ` Andrew Morton
2021-12-31 21:18             ` Linus Torvalds
2022-02-14 21:10 ` Shuang Zhai
2022-02-15 14:49   ` Mel Gorman
2022-02-22 17:27     ` [PATCH v4 1/1] mm: vmscan: Reduce throttling due to a failure to make progress' Shuang Zhai
2022-02-23 12:50       ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=caf247ab-f6fe-a3b9-c4b5-7ce17d1d5e43@leemhuis.info \
    --to=regressions@leemhuis.info \
    --cc=akpm@linux-foundation.org \
    --cc=broonie@kernel.org \
    --cc=djwong@kernel.org \
    --cc=efault@gmx.de \
    --cc=hakavlad@inbox.lv \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=regressions@lists.linux.dev \
    --cc=riel@surriel.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.