From: Mel Gorman <mgorman@techsingularity.net>
To: Alexey Avramov <hakavlad@inbox.lv>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Rik van Riel <riel@surriel.com>, Mike Galbraith <efault@gmx.de>,
Darrick Wong <djwong@kernel.org>,
regressions@lists.linux.dev,
Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/1] mm: vmscan: Reduce throttling due to a failure to make progress
Date: Wed, 1 Dec 2021 14:00:05 +0000
Message-ID: <20211201140005.GU3366@techsingularity.net>
In-Reply-To: <20211201033836.4382a474@mail.inbox.lv>
On Wed, Dec 01, 2021 at 03:38:36AM +0900, Alexey Avramov wrote:
> >due to the
> >underlying storage.
>
> I agree.
>
> >and send me the trace.out file please?
>
> https://drive.google.com/file/d/1FBjAmXwXakWPPjcn6K-B50S04vyySQO6/view
>
> Typical entries:
>
> Timer-10841 [006] ..... 14341.639496: mm_vmscan_throttled: nid=0 usec_timeout=100000 usect_delayed=100000 reason=VMSCAN_THROTTLE_WRITEBACK
> gmain-1246 [004] ..... 14341.639498: mm_vmscan_throttled: nid=0 usec_timeout=100000 usect_delayed=100000 reason=VMSCAN_THROTTLE_WRITEBACK
Ok, the bulk of the stalls were for the same reason. Using a laptop with
slower storage (an SSD, but slower than the first laptop's), the stalls were
almost all for the same reason and from one callsite -- shrink_node.
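These per-reason counts can be pulled out of a trace.out like the one above
with a short userspace helper. This is only an illustrative sketch: the
"reason=" field name comes from the mm_vmscan_throttled tracepoint output
quoted above, while the function name and the fixed reason list are my own.

```c
#include <string.h>

/*
 * Classify one mm_vmscan_throttled trace line by its reason= field.
 * Returns an index into vmscan_reasons[], or -1 if no reason is found.
 */
static const char * const vmscan_reasons[] = {
	"VMSCAN_THROTTLE_WRITEBACK",
	"VMSCAN_THROTTLE_ISOLATED",
	"VMSCAN_THROTTLE_NOPROGRESS",
};

int classify_throttle_line(const char *line)
{
	const char *r = strstr(line, "reason=");
	int i;

	if (!r)
		return -1;
	r += strlen("reason=");
	for (i = 0; i < 3; i++)
		if (strncmp(r, vmscan_reasons[i],
			    strlen(vmscan_reasons[i])) == 0)
			return i;
	return -1;
}
```

Feeding every line of trace.out through this and counting the return values
gives the per-reason breakdown.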
I've included another patch below against 5.16-rc1, but it'll apply to
5.16-rc3. Using the same test, I get:
Swap off
--------
Kernel: 5.15
2021-12-01 13:06:23,432: Peak values: avg10 avg60 avg300
2021-12-01 13:06:23,432: ----------- ------ ------ ------
2021-12-01 13:06:23,433: some cpu 4.72 2.68 1.25
2021-12-01 13:06:23,433: ----------- ------ ------ ------
2021-12-01 13:06:23,433: some io 7.63 3.29 1.03
2021-12-01 13:06:23,433: full io 4.28 1.91 0.63
2021-12-01 13:06:23,433: ----------- ------ ------ ------
2021-12-01 13:06:23,433: some memory 13.28 4.80 1.20
2021-12-01 13:06:23,434: full memory 12.22 4.44 1.11
2021-12-01 13:06:23,434: Stall times for the last 45.5s:
2021-12-01 13:06:23,434: -----------
2021-12-01 13:06:23,434: some cpu 1.8s, avg 4.0%
2021-12-01 13:06:23,435: -----------
2021-12-01 13:06:23,435: some io 2.9s, avg 6.5%
2021-12-01 13:06:23,435: full io 1.8s, avg 4.0%
2021-12-01 13:06:23,435: -----------
2021-12-01 13:06:23,435: some memory 4.4s, avg 9.6%
2021-12-01 13:06:23,435: full memory 4.0s, avg 8.8%
2021-12-01 13:06:23,435:
Kernel: 5.16-rc1
2021-12-01 13:13:33,662: Peak values: avg10 avg60 avg300
2021-12-01 13:13:33,663: ----------- ------ ------ ------
2021-12-01 13:13:33,663: some cpu 1.97 1.84 0.80
2021-12-01 13:13:33,663: ----------- ------ ------ ------
2021-12-01 13:13:33,663: some io 49.60 26.97 13.26
2021-12-01 13:13:33,663: full io 45.91 24.56 12.13
2021-12-01 13:13:33,663: ----------- ------ ------ ------
2021-12-01 13:13:33,663: some memory 98.49 93.94 54.09
2021-12-01 13:13:33,663: full memory 98.49 93.75 53.82
2021-12-01 13:13:33,664: Stall times for the last 291.0s:
2021-12-01 13:13:33,664: -----------
2021-12-01 13:13:33,664: some cpu 2.3s, avg 0.8%
2021-12-01 13:13:33,664: -----------
2021-12-01 13:13:33,664: some io 57.2s, avg 19.7%
2021-12-01 13:13:33,664: full io 52.9s, avg 18.2%
2021-12-01 13:13:33,664: -----------
2021-12-01 13:13:33,664: some memory 244.8s, avg 84.1%
2021-12-01 13:13:33,664: full memory 242.9s, avg 83.5%
2021-12-01 13:13:33,664:
Kernel: 5.16-rc1-v3r4
2021-12-01 13:16:46,910: Peak values: avg10 avg60 avg300
2021-12-01 13:16:46,910: ----------- ------ ------ ------
2021-12-01 13:16:46,910: some cpu 2.91 2.04 0.97
2021-12-01 13:16:46,910: ----------- ------ ------ ------
2021-12-01 13:16:46,910: some io 6.59 2.94 1.00
2021-12-01 13:16:46,910: full io 3.95 1.82 0.62
2021-12-01 13:16:46,910: ----------- ------ ------ ------
2021-12-01 13:16:46,910: some memory 10.29 3.67 0.90
2021-12-01 13:16:46,911: full memory 9.82 3.46 0.84
2021-12-01 13:16:46,911: Stall times for the last 44.4s:
2021-12-01 13:16:46,911: -----------
2021-12-01 13:16:46,911: some cpu 1.2s, avg 2.7%
2021-12-01 13:16:46,911: -----------
2021-12-01 13:16:46,911: some io 2.4s, avg 5.4%
2021-12-01 13:16:46,911: full io 1.5s, avg 3.5%
2021-12-01 13:16:46,911: -----------
2021-12-01 13:16:46,911: some memory 3.2s, avg 7.1%
2021-12-01 13:16:46,911: full memory 3.0s, avg 6.7%
2021-12-01 13:16:46,911:
So stall times with v5.15 and with the patch are vaguely similar, with
stall times for 5.16-rc1 being terrible.
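For reference, the "avg N%" figures in these summaries are just stall
seconds divided by the monitoring window. The monitor script that produced
the logs isn't shown in this thread, but the arithmetic can be sketched as:

```c
/*
 * Stall time as a percentage of the monitoring window. E.g. the
 * 5.16-rc1 "some io" line above: 57.2s of a 291.0s window is ~19.7%.
 */
double stall_pct(double stall_seconds, double window_seconds)
{
	return 100.0 * stall_seconds / window_seconds;
}
```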
Swap on
-------
Kernel: 5.15
2021-12-01 13:23:04,392: Peak values: avg10 avg60 avg300
2021-12-01 13:23:04,392: ----------- ------ ------ ------
2021-12-01 13:23:04,393: some cpu 8.02 5.19 2.93
2021-12-01 13:23:04,393: ----------- ------ ------ ------
2021-12-01 13:23:04,393: some io 77.30 61.75 37.00
2021-12-01 13:23:04,393: full io 62.33 50.53 29.98
2021-12-01 13:23:04,393: ----------- ------ ------ ------
2021-12-01 13:23:04,393: some memory 82.05 66.42 39.74
2021-12-01 13:23:04,393: full memory 73.27 57.79 34.39
2021-12-01 13:23:04,393: Stall times for the last 272.6s:
2021-12-01 13:23:04,393: -----------
2021-12-01 13:23:04,393: some cpu 13.7s, avg 5.0%
2021-12-01 13:23:04,393: -----------
2021-12-01 13:23:04,393: some io 167.4s, avg 61.4%
2021-12-01 13:23:04,393: full io 135.0s, avg 49.5%
2021-12-01 13:23:04,394: -----------
2021-12-01 13:23:04,394: some memory 180.4s, avg 66.2%
2021-12-01 13:23:04,394: full memory 155.9s, avg 57.2%
2021-12-01 13:23:04,394:
Kernel: 5.16-rc1
2021-12-01 13:35:01,025: Peak values: avg10 avg60 avg300
2021-12-01 13:35:01,025: ----------- ------ ------ ------
2021-12-01 13:35:01,025: some cpu 4.03 2.51 1.46
2021-12-01 13:35:01,025: ----------- ------ ------ ------
2021-12-01 13:35:01,025: some io 61.02 38.02 25.97
2021-12-01 13:35:01,025: full io 53.30 32.69 22.18
2021-12-01 13:35:01,025: ----------- ------ ------ ------
2021-12-01 13:35:01,025: some memory 99.02 87.77 68.67
2021-12-01 13:35:01,025: full memory 98.66 85.18 65.45
2021-12-01 13:35:01,026: Stall times for the last 620.8s:
2021-12-01 13:35:01,026: -----------
2021-12-01 13:35:01,026: some cpu 11.9s, avg 1.9%
2021-12-01 13:35:01,026: -----------
2021-12-01 13:35:01,026: some io 182.9s, avg 29.5%
2021-12-01 13:35:01,026: full io 156.0s, avg 25.1%
2021-12-01 13:35:01,026: -----------
2021-12-01 13:35:01,026: some memory 463.1s, avg 74.6%
2021-12-01 13:35:01,026: full memory 437.0s, avg 70.4%
2021-12-01 13:35:01,026:
Kernel: 5.16-rc1-v3r4
2021-12-01 13:42:43,066: Peak values: avg10 avg60 avg300
2021-12-01 13:42:43,066: ----------- ------ ------ ------
2021-12-01 13:42:43,066: some cpu 6.45 3.62 2.21
2021-12-01 13:42:43,066: ----------- ------ ------ ------
2021-12-01 13:42:43,066: some io 74.16 58.74 34.14
2021-12-01 13:42:43,066: full io 62.77 48.21 27.97
2021-12-01 13:42:43,066: ----------- ------ ------ ------
2021-12-01 13:42:43,066: some memory 78.82 62.85 36.39
2021-12-01 13:42:43,067: full memory 71.42 55.12 31.92
2021-12-01 13:42:43,067: Stall times for the last 257.2s:
2021-12-01 13:42:43,067: -----------
2021-12-01 13:42:43,067: some cpu 9.9s, avg 3.9%
2021-12-01 13:42:43,067: -----------
2021-12-01 13:42:43,067: some io 150.9s, avg 58.7%
2021-12-01 13:42:43,067: full io 123.2s, avg 47.9%
2021-12-01 13:42:43,067: -----------
2021-12-01 13:42:43,067: some memory 161.5s, avg 62.8%
2021-12-01 13:42:43,067: full memory 141.6s, avg 55.1%
2021-12-01 13:42:43,067:
Again, 5.16-rc1 stuttered badly but the new patch was comparable to 5.15.
As my baseline figures are very different to yours due to differences in
storage, can you test the following please?
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..936dc0b6c226 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -277,6 +277,7 @@ enum vmscan_throttle_state {
VMSCAN_THROTTLE_WRITEBACK,
VMSCAN_THROTTLE_ISOLATED,
VMSCAN_THROTTLE_NOPROGRESS,
+ VMSCAN_THROTTLE_CONGESTED,
NR_VMSCAN_THROTTLE,
};
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index f25a6149d3ba..ca2e9009a651 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -30,12 +30,14 @@
#define _VMSCAN_THROTTLE_WRITEBACK (1 << VMSCAN_THROTTLE_WRITEBACK)
#define _VMSCAN_THROTTLE_ISOLATED (1 << VMSCAN_THROTTLE_ISOLATED)
#define _VMSCAN_THROTTLE_NOPROGRESS (1 << VMSCAN_THROTTLE_NOPROGRESS)
+#define _VMSCAN_THROTTLE_CONGESTED (1 << VMSCAN_THROTTLE_CONGESTED)
#define show_throttle_flags(flags) \
(flags) ? __print_flags(flags, "|", \
{_VMSCAN_THROTTLE_WRITEBACK, "VMSCAN_THROTTLE_WRITEBACK"}, \
{_VMSCAN_THROTTLE_ISOLATED, "VMSCAN_THROTTLE_ISOLATED"}, \
- {_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"} \
+ {_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"}, \
+ {_VMSCAN_THROTTLE_CONGESTED, "VMSCAN_THROTTLE_CONGESTED"} \
) : "VMSCAN_THROTTLE_NONE"
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..e3f2dd1e8cd9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1021,6 +1021,39 @@ static void handle_write_error(struct address_space *mapping,
unlock_page(page);
}
+bool skip_throttle_noprogress(pg_data_t *pgdat)
+{
+ int reclaimable = 0, write_pending = 0;
+ int i;
+
+ /*
+ * If kswapd is disabled, reschedule if necessary but do not
+ * throttle as the system is likely near OOM.
+ */
+ if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+ return true;
+
+ /*
+ * If there are a lot of dirty/writeback pages then do not
+ * throttle as throttling will occur when the pages cycle
+ * towards the end of the LRU if still under writeback.
+ */
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (!populated_zone(zone))
+ continue;
+
+ reclaimable += zone_reclaimable_pages(zone);
+ write_pending += zone_page_state_snapshot(zone,
+ NR_ZONE_WRITE_PENDING);
+ }
+ if (2 * write_pending <= reclaimable)
+ return true;
+
+ return false;
+}
+
void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
{
wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
@@ -1056,8 +1089,16 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
}
break;
+ case VMSCAN_THROTTLE_CONGESTED:
+ fallthrough;
case VMSCAN_THROTTLE_NOPROGRESS:
- timeout = HZ/2;
+ if (skip_throttle_noprogress(pgdat)) {
+ cond_resched();
+ return;
+ }
+
+ timeout = 1;
+
break;
case VMSCAN_THROTTLE_ISOLATED:
timeout = HZ/50;
@@ -3321,7 +3362,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
if (!current_is_kswapd() && current_may_throttle() &&
!sc->hibernation_mode &&
test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
- reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+ reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
sc))
@@ -3386,16 +3427,16 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
}
/*
- * Do not throttle kswapd on NOPROGRESS as it will throttle on
- * VMSCAN_THROTTLE_WRITEBACK if there are too many pages under
- * writeback and marked for immediate reclaim at the tail of
- * the LRU.
+ * Do not throttle kswapd or cgroup reclaim on NOPROGRESS as it will
+ * throttle on VMSCAN_THROTTLE_WRITEBACK if there are too many pages
+ * under writeback and marked for immediate reclaim at the tail of the
+ * LRU.
*/
- if (current_is_kswapd())
+ if (current_is_kswapd() || cgroup_reclaim(sc))
return;
/* Throttle if making no progress at high prioities. */
- if (sc->priority < DEF_PRIORITY - 2)
+ if (sc->priority == 1 && !sc->nr_reclaimed)
reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
}
@@ -3415,6 +3456,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
unsigned long nr_soft_scanned;
gfp_t orig_mask;
pg_data_t *last_pgdat = NULL;
+ pg_data_t *first_pgdat = NULL;
/*
* If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3520,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
/* need some check for avoid more shrink_zone() */
}
+ if (!first_pgdat)
+ first_pgdat = zone->zone_pgdat;
+
/* See comment about same check for global reclaim above */
if (zone->zone_pgdat == last_pgdat)
continue;
last_pgdat = zone->zone_pgdat;
shrink_node(zone->zone_pgdat, sc);
- consider_reclaim_throttle(zone->zone_pgdat, sc);
}
+ consider_reclaim_throttle(first_pgdat, sc);
+
/*
* Restore to original mask to avoid the impact on the caller if we
* promoted it to __GFP_HIGHMEM.
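To make the new heuristic easier to review in isolation: stripped of the
pgdat/zone walking, skip_throttle_noprogress() in the patch above boils
down to the decision below. This is a userspace sketch with plain integers
standing in for the kernel state, not kernel code; MAX_RECLAIM_RETRIES is
16 in mm/internal.h.

```c
#define MAX_RECLAIM_RETRIES 16	/* matches mm/internal.h */

/*
 * Sketch of the skip_throttle_noprogress() decision: skip the
 * NOPROGRESS/CONGESTED throttle if kswapd has already given up
 * (the system is likely near OOM), or if write-pending pages make
 * up at most half of reclaimable memory, in which case reclaim
 * should be able to make progress without sleeping.
 */
int should_skip_throttle(int kswapd_failures,
			 long reclaimable, long write_pending)
{
	if (kswapd_failures >= MAX_RECLAIM_RETRIES)
		return 1;
	if (2 * write_pending <= reclaimable)
		return 1;
	return 0;
}
```

When throttling does happen for these reasons, the patch also drops the
sleep from HZ/2 (half a second) to a single jiffy.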
--
Mel Gorman
SUSE Labs