From: Hillf Danton <hdanton@sina.com>
To: Ivan Babrou <ivan@cloudflare.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: Reclaim regression after 1c30844d2dfe
Date: Sat,  8 Feb 2020 19:11:33 +0800
Message-ID: <20200208111133.16808-1-hdanton@sina.com>
In-Reply-To: <CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@mail.gmail.com>


On Fri, 7 Feb 2020 14:54:43 -0800 Ivan Babrou wrote:
> This change from 5.0 times:
> 
> * https://github.com/torvalds/linux/commit/1c30844d2dfe
> 
> > mm: reclaim small amounts of memory when an external fragmentation event occurs
> 
> Introduced undesired effects in our environment.
> 
> * NUMA with 2 x CPU
> * 128GB of RAM
> * THP disabled
> * Upgraded from 4.19 to 5.4
> 
> Before we saw free memory hover at around 1.4GB with no spikes. After
> the upgrade we saw some machines decide that they need a lot more than
> that, with frequent spikes above 10GB, often only on a single numa
> node.
> 
> We can see kswapd quite active in balance_pgdat (it didn't look like
> it slept at all):
> 
> $ ps uax | fgrep kswapd
> root       1850 23.0  0.0      0     0 ?        R    Jan30 1902:24 [kswapd0]
> root       1851  1.8  0.0      0     0 ?        S    Jan30 152:16 [kswapd1]
> 
> This in turn massively increased pressure on page cache, which did not
> go well for services that depend on a quick response from a
> local cache backed by solid storage.
> 
> Here's what it looked like when I zeroed vm.watermark_boost_factor:
> 
> * https://imgur.com/a/6IZWicU
> 
> IO subsided from 100% busy in page cache population at 300MB/s on a
> single SATA drive down to under 100MB/s.
> 
> This sort of regression doesn't seem like a good thing.
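
For anyone who wants to check the knob mentioned above before trying
anything else, here is a minimal userspace sketch. It assumes the sysctl
is exposed at /proc/sys/vm/watermark_boost_factor, which is where vm.*
sysctls normally live; the helper names read_ulong_file() and
report_boost_factor() are made up for illustration and are not kernel or
libc APIs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single unsigned integer from a procfs/sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, for illustration only.)
 */
int read_ulong_file(const char *path, unsigned long *val)
{
	FILE *f;
	int ok;

	assert(path && val);

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%lu", val) == 1);
	fclose(f);

	return ok ? 0 : -1;
}

/*
 * Print the current boost factor. A value of 0 means watermark
 * boosting is switched off, as in the report above.
 * Returns 0 on success, -1 if the sysctl file is not readable.
 */
int report_boost_factor(void)
{
	unsigned long factor;

	if (read_ulong_file("/proc/sys/vm/watermark_boost_factor", &factor))
		return -1;

	printf("vm.watermark_boost_factor = %lu%s\n", factor,
	       factor ? "" : " (boosting disabled)");
	return 0;
}
```

Calling report_boost_factor() from a small main() on a test box should
be enough to confirm whether boosting is active. On kernels without the
sysctl the call just returns -1.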

Here are two small diffs :P

[1] cleanup: stop reclaiming pages once balanced.

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3641,6 +3641,9 @@ restart:
 		 * re-evaluate if boosting is required when kswapd next wakes.
 		 */
 		balanced = pgdat_balanced(pgdat, sc.order, classzone_idx);
+		if (balanced)
+			break;
+
 		if (!balanced && nr_boost_reclaim) {
 			nr_boost_reclaim = 0;
 			goto restart;
--

[2] restore the old behavior by ignoring the boost before landing in hot water.

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3572,7 +3572,7 @@ static int balance_pgdat(pg_data_t *pgda
 	unsigned long pflags;
 	unsigned long nr_boost_reclaim;
 	unsigned long zone_boosts[MAX_NR_ZONES] = { 0, };
-	bool boosted;
+	bool boosted = false;
 	struct zone *zone;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
@@ -3591,18 +3591,22 @@ static int balance_pgdat(pg_data_t *pgda
 	 * place so that parallel allocations that are near the watermark will
 	 * stall or direct reclaim until kswapd is finished.
 	 */
+restart:
 	nr_boost_reclaim = 0;
 	for (i = 0; i <= classzone_idx; i++) {
 		zone = pgdat->node_zones + i;
 		if (!managed_zone(zone))
 			continue;
 
+		if (boosted) {
+			zone->watermark_boost = 0;
+			continue;
+		}
 		nr_boost_reclaim += zone->watermark_boost;
 		zone_boosts[i] = zone->watermark_boost;
 	}
 	boosted = nr_boost_reclaim;
 
-restart:
 	sc.priority = DEF_PRIORITY;
 	do {
 		unsigned long nr_reclaimed = sc.nr_reclaimed;
@@ -3644,10 +3648,9 @@ restart:
 		if (balanced)
 			break;
 
-		if (!balanced && nr_boost_reclaim) {
-			nr_boost_reclaim = 0;
+		/* Limit the priority of boosting to avoid reclaim writeback */
+		if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
 			goto restart;
-		}
 
 		/*
 		 * If boosting is not active then only reclaim if there are no
--
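
To make the intended control flow of the two diffs easier to follow,
here is a toy userspace model. It is only a sketch under loud
assumptions: struct toy_zone, struct toy_result and toy_balance() are
hypothetical names, "balanced" is faked by counting passes, priority
drops on every pass, and none of the real reclaim work or the other exit
paths of balance_pgdat() are modelled. The only value taken from the
kernel is DEF_PRIORITY being 12.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY	12	/* same value as the kernel's DEF_PRIORITY */
#define TOY_MAX_ZONES	8	/* arbitrary limit for this toy model */

struct toy_zone {
	unsigned long watermark_boost;
};

struct toy_result {
	int passes;	/* iterations of the priority loop */
	int restarts;	/* times the boost was given up */
	bool balanced;	/* did we leave the loop because of "balanced"? */
};

/*
 * Toy model: the node is assumed to become balanced on pass number
 * passes_to_balance (a stand-in for pgdat_balanced()).
 */
struct toy_result toy_balance(struct toy_zone *zones, int nr_zones,
			      int passes_to_balance)
{
	struct toy_result res = { 0, 0, false };
	unsigned long nr_boost_reclaim;
	bool boosted = false;
	int priority, i;

	assert(zones && nr_zones > 0 && nr_zones <= TOY_MAX_ZONES);

restart:
	nr_boost_reclaim = 0;
	for (i = 0; i < nr_zones; i++) {
		if (boosted) {
			/* second time around: drop the boost entirely */
			zones[i].watermark_boost = 0;
			continue;
		}
		nr_boost_reclaim += zones[i].watermark_boost;
	}
	boosted = nr_boost_reclaim != 0;

	priority = DEF_PRIORITY;
	do {
		res.passes++;

		/* diff [1]: stop as soon as the node is balanced */
		if (res.passes >= passes_to_balance) {
			res.balanced = true;
			break;
		}

		/* diff [2]: give up boosting before priority gets low */
		if (nr_boost_reclaim && priority == DEF_PRIORITY - 2) {
			res.restarts++;
			goto restart;
		}

		priority--;
	} while (priority >= 1);

	return res;
}
```

With boosts of 0, 100 and 50 on three zones and passes_to_balance set to
10, the model restarts once on the third pass, zeroes every boost, and
then runs as plain unboosted reclaim until the faked balance point. If
the node balances before priority reaches DEF_PRIORITY - 2, the loop
exits early and this toy leaves the boosts untouched.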


