From: Hillf Danton <hdanton@sina.com>
To: Ivan Babrou <ivan@cloudflare.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
mm <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>
Subject: Re: Reclaim regression after 1c30844d2dfe
Date: Sat, 8 Feb 2020 19:11:33 +0800
Message-ID: <20200208111133.16808-1-hdanton@sina.com>
In-Reply-To: <CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@mail.gmail.com>
On Fri, 7 Feb 2020 14:54:43 -0800 Ivan Babrou wrote:
> This change from 5.5 times:
>
> * https://github.com/torvalds/linux/commit/1c30844d2dfe
>
> > mm: reclaim small amounts of memory when an external fragmentation event occurs
>
> introduced undesired effects in our environment:
>
> * NUMA with 2 x CPU
> * 128GB of RAM
> * THP disabled
> * Upgraded from 4.19 to 5.4
>
> Before, we saw free memory hover at around 1.4GB with no spikes. After
> the upgrade we saw some machines decide that they need a lot more than
> that, with frequent spikes above 10GB, often only on a single NUMA
> node.
>
> We can see that kswapd is quite active in balance_pgdat (it didn't look
> like it slept at all):
>
> $ ps uax | fgrep kswapd
> root 1850 23.0 0.0 0 0 ? R Jan30 1902:24 [kswapd0]
> root 1851 1.8 0.0 0 0 ? S Jan30 152:16 [kswapd1]
>
> This in turn massively increased pressure on the page cache, which did
> not go well for services that depend on having a quick response from a
> local cache backed by solid storage.
>
> Here's what it looked like when I zeroed vm.watermark_boost_factor:
>
> * https://imgur.com/a/6IZWicU
>
> IO subsided from 100% busy populating the page cache at 300MB/s on a
> single SATA drive to under 100MB/s.
>
> This sort of regression doesn't seem like a good thing.
Here are two small diffs :P
[1] cleanup: stop reclaiming pages once balanced.
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3641,6 +3641,9 @@ restart:
 		 * re-evaluate if boosting is required when kswapd next wakes.
 		 */
 		balanced = pgdat_balanced(pgdat, sc.order, classzone_idx);
+		if (balanced)
+			break;
+
 		if (!balanced && nr_boost_reclaim) {
 			nr_boost_reclaim = 0;
 			goto restart;
--
[2] restore the old behavior by dropping the watermark boost before boosted reclaim escalates into writeback.
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3572,7 +3572,7 @@ static int balance_pgdat(pg_data_t *pgda
 	unsigned long pflags;
 	unsigned long nr_boost_reclaim;
 	unsigned long zone_boosts[MAX_NR_ZONES] = { 0, };
-	bool boosted;
+	bool boosted = false;
 	struct zone *zone;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
@@ -3591,18 +3591,22 @@ static int balance_pgdat(pg_data_t *pgda
 	 * place so that parallel allocations that are near the watermark will
 	 * stall or direct reclaim until kswapd is finished.
 	 */
+restart:
 	nr_boost_reclaim = 0;
 	for (i = 0; i <= classzone_idx; i++) {
 		zone = pgdat->node_zones + i;
 		if (!managed_zone(zone))
 			continue;
 
+		if (boosted) {
+			zone->watermark_boost = 0;
+			continue;
+		}
 		nr_boost_reclaim += zone->watermark_boost;
 		zone_boosts[i] = zone->watermark_boost;
 	}
 	boosted = nr_boost_reclaim;
 
-restart:
 	sc.priority = DEF_PRIORITY;
 	do {
 		unsigned long nr_reclaimed = sc.nr_reclaimed;
@@ -3644,10 +3648,9 @@ restart:
 		if (balanced)
 			break;
 
-		if (!balanced && nr_boost_reclaim) {
-			nr_boost_reclaim = 0;
+		/* Limit the priority of boosting to avoid reclaim writeback */
+		if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
 			goto restart;
-		}
 
 		/*
 		 * If boosting is not active then only reclaim if there are no
--
Thread overview: 8+ messages
2020-02-07 22:54 Reclaim regression after 1c30844d2dfe Ivan Babrou
2020-02-07 23:05 ` Rik van Riel
2020-02-08 9:08 ` Vlastimil Babka
2020-02-08 11:11 ` Hillf Danton [this message]
2020-02-11 10:16 ` Mel Gorman
2020-02-12 22:45 ` Ivan Babrou
2020-02-12 23:55 ` Mel Gorman
2020-02-18 22:07 ` Ivan Babrou