linux-mm.kvack.org archive mirror
* Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions
       [not found] <20211124011954.7cab9bb4@mail.inbox.lv>
@ 2021-11-24  7:40 ` Thorsten Leemhuis
  2021-11-24 10:35 ` Mel Gorman
  1 sibling, 0 replies; 6+ messages in thread
From: Thorsten Leemhuis @ 2021-11-24  7:40 UTC (permalink / raw)
  To: Alexey Avramov, linux-mm
  Cc: linux-kernel, mgorman, mhocko, vbabka, neilb, akpm, corbet, riel,
	hannes, david, willy, hdanton, penguin-kernel, oleksandr, kernel,
	michael, aros, hakavlad, regressions

Hi, this is your Linux kernel regression tracker speaking.

CCing regression mailing list, which should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 23.11.21 17:19, Alexey Avramov wrote:
> I found stalls in near-OOM conditions with Linux 5.16. This is not the
> hang-up that was reported by Artem S. Tashkinov in 2019 [1]. It's a *new* 
> regression. I will demonstrate this with one simple experiment, which I
> will reproduce with different kernels or settings.
> 
> With older versions of the kernel, running the `tail /dev/zero` command
> usually leads quickly to an OOM condition.
> 
> I will run the command `for i in {1..3}; do tail /dev/zero; done` and log
> PSI metrics (using the psi2log script from nohang v0.2.0 [2]) and some values
> from `/proc/meminfo` (using mem2log v0.1.0 [3]) while this command is
> running. During the experiment, a browser with a single tab playing a video
> will be kept open.
> [...]
TWIMC: To be sure this issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.15..v5.16-rc1
#regzbot title mm: reclaim_throttle leads to stall in near-OOM conditions
#regzbot ignore-activity

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: If you want to know more about regzbot, check out its
web-interface, the getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for the reporter: when reporting a regression it's in your interest
to tell #regzbot about it in the report, as that will ensure the
regression gets on the radar of regzbot and the regression tracker.
That's in your interest, as they will make sure the report won't fall
through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot, just
fix the issue as you normally would. Just remember to include a 'Link:'
tag to the report in the commit message, as explained in
Documentation/process/submitting-patches.rst
That aspect was recently made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c

P.P.S.: As a Linux kernel regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them.
Unfortunately, I will therefore sometimes get things wrong or miss
something important. I hope that's not the case here; if you think it
is, don't hesitate to tell me about it in a public reply. That's in
everyone's interest, as what I wrote above might be misleading to
everyone reading this; any suggestion I gave might thus send
someone reading this down the wrong rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.



* Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions
       [not found] <20211124011954.7cab9bb4@mail.inbox.lv>
  2021-11-24  7:40 ` mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions Thorsten Leemhuis
@ 2021-11-24 10:35 ` Mel Gorman
       [not found]   ` <20211124195449.33f31e7f@mail.inbox.lv>
  1 sibling, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2021-11-24 10:35 UTC (permalink / raw)
  To: Alexey Avramov
  Cc: linux-mm, linux-kernel, mhocko, vbabka, neilb, akpm, corbet,
	riel, hannes, david, willy, hdanton, penguin-kernel, oleksandr,
	kernel, michael, aros, hakavlad

On Wed, Nov 24, 2021 at 01:19:54AM +0900, Alexey Avramov wrote:
> I found stalls in near-OOM conditions with Linux 5.16. This is not the
> hang-up that was reported by Artem S. Tashkinov in 2019 [1]. It's a *new* 
> regression. I will demonstrate this with one simple experiment, which I
> will reproduce with different kernels or settings.
> 
> With older versions of the kernel, running the `tail /dev/zero` command
> usually leads quickly to an OOM condition.
> 
> I will run the command `for i in {1..3}; do tail /dev/zero; done` and log
> PSI metrics (using the psi2log script from nohang v0.2.0 [2]) and some values
> from `/proc/meminfo` (using mem2log v0.1.0 [3]) while this command is
> running. During the experiment, a browser with a single tab playing a video
> will be kept open.
> 

Ok, I can reproduce this. However, it does eventually get killed OOM so
the system makes progress but maybe the throttling should be for very
short intervals if failing to make progress and there have been multiple
reclaim failures recently. Disabling the throttling entirely just results
in cases where 100% CPU is used spinning through lru lists.

Thanks for the report

-- 
Mel Gorman
SUSE Labs



* Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions
       [not found]   ` <20211124195449.33f31e7f@mail.inbox.lv>
@ 2021-11-24 11:50     ` Mel Gorman
       [not found]       ` <20211124214443.5c179d34@mail.inbox.lv>
  0 siblings, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2021-11-24 11:50 UTC (permalink / raw)
  To: Alexey Avramov
  Cc: linux-mm, linux-kernel, mhocko, vbabka, neilb, akpm, corbet,
	riel, hannes, david, willy, hdanton, penguin-kernel, oleksandr,
	kernel, michael, aros, hakavlad

On Wed, Nov 24, 2021 at 07:54:49PM +0900, Alexey Avramov wrote:
> > it does eventually get killed OOM
> 
> However, a full minute freeze can be a great evil in many situations - 
> during such a freeze, the system is completely unresponsive. 
> 
> So my next question is: How reasonable is the value MAX_RECLAIM_RETRIES?
> Was it also pulled "out of thin air"?
> 

The value is out of thin air but adjusting it may reintroduce issues
with kswapd running at 100% CPU.

> And would it make sense to have buttons to adjust the timeouts?

I don't think we should introduce a tunable for something like this;
it would be impossible to use properly. But can you test this?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 07db03883062..aa72c0f39dcc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1058,6 +1058,14 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 		break;
 	case VMSCAN_THROTTLE_NOPROGRESS:
 		timeout = HZ/2;
+
+		/*
+		 * If kswapd is disabled, use the minimum timeout as the
+		 * system may be at or near OOM.
+		 */
+		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+			timeout = 1;
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3395,7 +3403,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3423,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3487,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.



* Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions
       [not found]       ` <20211124214443.5c179d34@mail.inbox.lv>
@ 2021-11-24 14:33         ` Mel Gorman
       [not found]           ` <20211127010631.4e33a432@mail.inbox.lv>
  0 siblings, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2021-11-24 14:33 UTC (permalink / raw)
  To: Alexey Avramov
  Cc: linux-mm, linux-kernel, mhocko, vbabka, neilb, akpm, corbet,
	riel, hannes, david, willy, hdanton, penguin-kernel, oleksandr,
	kernel, michael, aros, hakavlad

On Wed, Nov 24, 2021 at 09:44:43PM +0900, Alexey Avramov wrote:
> > can you test this?
> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> 
> Sorry, I didn't notice the diff you provided right away.
> 
> Now I've tested it and the result is the same: 1 min stall:
> 
> $ mem2log
> Starting mem2log with interval 2s, mode: 1
> Process memory locked with MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT
> All values are in mebibytes
> MemTotal: 11798.5, SwapTotal: 0.0

Curious that it's the same; it reduced the time to OOM for me quite a
bit. Another version is in the diff below. It special-cases NOPROGRESS
to not stall at all if kswapd is disabled and otherwise to stall for the
shortest possible duration. In my tests, it almost always hits OOM in
the same time as 5.15, with one corner case, but OOM may still be delayed if
kswapd is active or there are a lot of pages under writeback, as there is the
possibility the system can make forward progress when writeback completes.

From another mail, you wrote

> My dissatisfaction is caused by the fact that the scale has now
> tipped sharply in favor of stall.

Understandable but the old throttling mechanism was functionally broken and
without some sort of throttling, CPU usage due to excessive LRU scanning
causes a different class of bugs.

> Although even before this change, users complained about the inability
> to wait for OOM:
> https://lore.kernel.org/lkml/d9802b6a-949b-b327-c4a6-3dbca485ec20@gmx.com/

I think there might be an unwritten mm law now that someone is always
unhappy with OOM behaviour :(

Please let me know if this version works any better

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 07db03883062..d9166e94eb95 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 
 		break;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		timeout = 1;
+
+		/*
+		 * If kswapd is disabled, reschedule if necessary but do not
+		 * throttle as the system is likely near OOM.
+		 */
+		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
+			cond_resched();
+			return;
+		}
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.
-- 
Mel Gorman
SUSE Labs



* Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions
       [not found]           ` <20211127010631.4e33a432@mail.inbox.lv>
@ 2021-11-26 16:24             ` Mel Gorman
  2021-12-20  8:50               ` Sultan Alsawaf
  0 siblings, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2021-11-26 16:24 UTC (permalink / raw)
  To: Alexey Avramov
  Cc: linux-mm, linux-kernel, mhocko, vbabka, neilb, akpm, corbet,
	riel, hannes, david, willy, hdanton, penguin-kernel, oleksandr,
	kernel, michael, aros, hakavlad

On Sat, Nov 27, 2021 at 01:06:31AM +0900, Alexey Avramov wrote:
> >Please let me know if this version works any better
> 
> It's better, but not the same as 5.15.
> 
> Sometimes the stall is short, sometimes it is long (three `tail /dev/zero` runs):
> 

It's somewhat expected. If the system is able to make some sort of
progress and kswapd is active, it'll throttle until progress is
impossible. It'll be somewhat variable how long it can keep making
progress be it discarding page cache or writing to swap but it'll only
OOM when the system is truly OOM.

Might be worth trying the patch below on top. It will delay throttling
for longer with the caveat that CPU usage due to reclaim when very low
on memory may be excessive.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 176ddd28df21..167ea4f324a8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3404,8 +3404,8 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 	if (current_is_kswapd())
 		return;
 
-	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
+	/* Throttle if making no progress at high priority. */
+	if (sc->priority == 1 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }




* Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions
  2021-11-26 16:24             ` Mel Gorman
@ 2021-12-20  8:50               ` Sultan Alsawaf
  0 siblings, 0 replies; 6+ messages in thread
From: Sultan Alsawaf @ 2021-12-20  8:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexey Avramov, linux-mm, linux-kernel, mhocko, vbabka, neilb,
	akpm, corbet, riel, hannes, david, willy, hdanton,
	penguin-kernel, oleksandr, kernel, michael, aros, hakavlad

On Fri, Nov 26, 2021 at 04:24:16PM +0000, Mel Gorman wrote:
> It's somewhat expected. If the system is able to make some sort of
> progress and kswapd is active, it'll throttle until progress is
> impossible. It'll be somewhat variable how long it can keep making
> progress be it discarding page cache or writing to swap but it'll only
> OOM when the system is truly OOM.
> 
> Might be worth trying the patch below on top. It will delay throttling
> for longer with the caveat that CPU usage due to reclaim when very low
> on memory may be excessive.

Mel,

Perhaps my old submission [1] could be helpful here? I could send a refreshed
version if you're interested. Using wall time to throttle reclaim seems quite
catastrophic IMO, given the inherent assumptions it makes about the running
system's performance characteristics and its workloads.

My patch tackles the issue from the opposite direction: rather than throttling
when there's no reclaim progress to be made, my approach stops kswapd early when
there is no longer any need for reclaim, which conveniently doesn't require any
sort of tunable or heuristic since kswapd can just be immediately woken up again
right after if needed.

Looking back, it seems your chief complaint was that my patch may stop kswapd
before it could reclaim up to the high watermark, which could thereby introduce
stalls; however, I've never run into any such issue in my testing, and neither
have the several people who use my patch under a wide range of setups.

[1] https://lore.kernel.org/linux-mm/20200219182522.1960-1-sultan@kerneltoast.com/

Sultan


