From: Mel Gorman
To: Andrew Morton
Cc: NeilBrown, Theodore Ts'o, Andreas Dilger, "Darrick J. Wong", Matthew Wilcox, Michal Hocko, Dave Chinner, Rik van Riel, Vlastimil Babka, Johannes Weiner, Jonathan Corbet, Linux-MM, Linux-fsdevel, LKML, Mel Gorman
Subject: [PATCH 6/8] mm/vmscan: Centralise timeout values for reclaim_throttle
Date: Fri, 22 Oct 2021 15:46:49 +0100
Message-Id: <20211022144651.19914-7-mgorman@techsingularity.net>
In-Reply-To: <20211022144651.19914-1-mgorman@techsingularity.net>
References: <20211022144651.19914-1-mgorman@techsingularity.net>

Neil Brown raised concerns about callers of reclaim_throttle specifying
a timeout value. The original timeout values to congestion_wait() were
probably pulled out of thin air or copy&pasted from somewhere else.
This patch centralises the timeout values and selects a timeout based
on the reason for reclaim throttling. These figures are also pulled
out of the same thin air but better values may be derived.

Running a workload that is throttling for inappropriate periods
and tracing mm_vmscan_throttled can be used to pick a more appropriate
value.
Excessive throttling would pick a lower timeout whereas excessive
CPU usage in reclaim context would select a larger timeout. Ideally a
large value would always be used and the wakeups would occur before a
timeout but that requires careful testing.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/compaction.c     |  2 +-
 mm/internal.h       |  3 +--
 mm/page-writeback.c |  2 +-
 mm/vmscan.c         | 50 +++++++++++++++++++++++++++++++++------------
 4 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7359093d8ac0..151b04c4dab3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -828,7 +828,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (cc->mode == MIGRATE_ASYNC)
 			return -EAGAIN;
 
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED);
 
 		if (fatal_signal_pending(current))
 			return -EINTR;
diff --git a/mm/internal.h b/mm/internal.h
index c72d3383ef34..383d9b7e7991 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -129,8 +129,7 @@ extern unsigned long highest_memmap_pfn;
  */
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
-extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
-							long timeout);
+extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
 
 /*
  * in mm/rmap.c:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index f34f54fcd5b4..4b01a6872f9e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2374,7 +2374,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
 		 * guess as any.
 		 */
 		reclaim_throttle(NODE_DATA(numa_node_id()),
-			VMSCAN_THROTTLE_WRITEBACK, HZ/50);
+			VMSCAN_THROTTLE_WRITEBACK);
 	}
 	/*
 	 * Usually few pages are written by now from those we've just submitted
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0450f6867d61..66da45084af4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1006,12 +1006,10 @@ static void handle_write_error(struct address_space *mapping,
 	unlock_page(page);
 }
 
-void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
-							long timeout)
+void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 {
 	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
-	long ret;
-	bool acct_writeback = (reason == VMSCAN_THROTTLE_WRITEBACK);
+	long timeout, ret;
 	DEFINE_WAIT(wait);
 
 	/*
@@ -1023,17 +1021,43 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
 	    current->flags & (PF_IO_WORKER|PF_KTHREAD))
 		return;
 
-	if (acct_writeback &&
-	    atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) {
-		WRITE_ONCE(pgdat->nr_reclaim_start,
-			node_page_state(pgdat, NR_THROTTLED_WRITTEN));
+	/*
+	 * These figures are pulled out of thin air.
+	 * VMSCAN_THROTTLE_ISOLATED is a transient condition based on too many
+	 * parallel reclaimers which is a short-lived event so the timeout is
+	 * short. Failing to make progress or waiting on writeback are
+	 * potentially long-lived events so use a longer timeout. This is shaky
+	 * logic as a failure to make progress could be due to anything from
+	 * writeback to a slow device to excessive references pages at the tail
+	 * of the inactive LRU.
+	 */
+	switch(reason) {
+	case VMSCAN_THROTTLE_WRITEBACK:
+		timeout = HZ/10;
+
+		if (atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) {
+			WRITE_ONCE(pgdat->nr_reclaim_start,
+				node_page_state(pgdat, NR_THROTTLED_WRITTEN));
+		}
+
+		break;
+	case VMSCAN_THROTTLE_NOPROGRESS:
+		timeout = HZ/10;
+		break;
+	case VMSCAN_THROTTLE_ISOLATED:
+		timeout = HZ/50;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		timeout = HZ;
+		break;
 	}
 
 	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
 	ret = schedule_timeout(timeout);
 	finish_wait(wqh, &wait);
 
-	if (acct_writeback)
+	if (reason == VMSCAN_THROTTLE_WRITEBACK)
 		atomic_dec(&pgdat->nr_writeback_throttled);
 
 	trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout),
@@ -2319,7 +2343,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 		/* wait a bit for the reclaimer. */
 		stalled = true;
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED);
 
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
@@ -3251,7 +3275,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 		 * until some pages complete writeback.
 		 */
 		if (sc->nr.immediate)
-			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10);
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
 	}
 
 	/*
@@ -3275,7 +3299,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	if (!current_is_kswapd() && current_may_throttle() &&
 	    !sc->hibernation_mode &&
 	    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
 
 	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
 				    sc))
@@ -3347,7 +3371,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 
 	/* Throttle if making no progress at high prioities. */
 	if (sc->priority < DEF_PRIORITY - 2)
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS, HZ/10);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
 /*
-- 
2.31.1
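
As a rough illustration of the mapping this patch introduces, the
following userspace sketch reproduces the per-reason timeout selection
and converts the result to milliseconds. It is not kernel code: HZ is
fixed at 250 purely as an example, the enum is a local copy of the
reason names, and throttle_timeout() and throttle_timeout_ms() are
hypothetical helpers (the patch performs the selection inline in
reclaim_throttle()).

```c
#include <assert.h>

/* Assumption: HZ is a kernel build-time constant; 250 is only an example. */
#define HZ 250

/* HZ/50 must not truncate to zero jiffies for the sketch to be meaningful. */
static_assert(HZ >= 50, "HZ too small for an HZ/50 timeout");

/* Local copy of the reason names used by the patch; values are illustrative. */
enum vmscan_throttle_state {
	VMSCAN_THROTTLE_WRITEBACK,
	VMSCAN_THROTTLE_ISOLATED,
	VMSCAN_THROTTLE_NOPROGRESS,
	NR_VMSCAN_THROTTLE,
};

/*
 * Hypothetical helper: returns the timeout in jiffies that the patched
 * reclaim_throttle() would select for a given reason.
 */
long throttle_timeout(enum vmscan_throttle_state reason)
{
	switch (reason) {
	case VMSCAN_THROTTLE_WRITEBACK:
	case VMSCAN_THROTTLE_NOPROGRESS:
		/* Potentially long-lived conditions: longer timeout. */
		return HZ / 10;
	case VMSCAN_THROTTLE_ISOLATED:
		/* Transient condition: short timeout. */
		return HZ / 50;
	default:
		/* The patch also warns here; the sketch only falls back. */
		return HZ;
	}
}

/* Hypothetical helper: the same timeout expressed in milliseconds. */
long throttle_timeout_ms(enum vmscan_throttle_state reason)
{
	return throttle_timeout(reason) * 1000 / HZ;
}
```

With these values VMSCAN_THROTTLE_ISOLATED waits for at most about 20ms
while the writeback and no-progress cases wait for at most about 100ms.
The same millisecond figures result for HZ settings of 100, 250, 300
and 1000.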