Linux-mm Archive on lore.kernel.org
From: Vlastimil Babka <vbabka@suse.cz>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>,
	Michal Hocko <mhocko@kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	l.roehrs@profihost.ag, cgroups@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: lot of MemAvailable but falling cache and raising PSI
Date: Fri, 27 Sep 2019 14:45:46 +0200
Message-ID: <4f6f1bc9-08f4-d53a-8788-a761be769757@suse.cz> (raw)
In-Reply-To: <2fe81a9e-5d29-79cf-f747-c66ae35defd0@profihost.ag>


[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote:
> Kernel 5.2.14 has now been running for exactly 7 days, and we can easily
> see a trend. I'm not sure whether I should post graphs.
> 
> Cache size is continuously shrinking while memfree is rising.
> 
> While there was 4.5GB free on average in the beginning, we now have an
> average of 8GB free memory.
> 
> Cache has shrunk from an average of 24G to an average of 18G.
> 
> Memory pressure has risen from an average of 0% to an average of 0.1%.
> That is not much, but the graphs show it rising continuously while cache
> is shrinking and memfree is rising.

Hi, could you try the patch below? I suspect you're hitting a corner
case where compaction_suitable() returns COMPACT_SKIPPED for ZONE_DMA,
triggering reclaim even if other zones have plenty of free memory.
should_continue_reclaim() then keeps returning true until twice the
requested allocation size has been reclaimed (compact_gap()). For a THP
that means 4MB reclaimed per allocation attempt, which roughly matches
the trace data you provided previously.
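
The 4MB figure is plain arithmetic: compact_gap() in include/linux/compaction.h returns 2UL << order pages, and a THP is an order-9 allocation. A minimal C sketch, assuming 4 KiB base pages; SKETCH_PAGE_SIZE and reclaim_bytes_per_attempt() are hypothetical names, and the model ignores every other condition that can end reclaim early:

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as on x86_64. */
#define SKETCH_PAGE_SIZE 4096UL

/* Same formula as the kernel's compact_gap(): twice the request, in pages. */
static unsigned long compact_gap(unsigned int order)
{
	/* keep the shift well defined for the width of unsigned long */
	assert(order < 8 * sizeof(unsigned long) - 1);
	return 2UL << order;
}

/*
 * Hypothetical helper: bytes reclaimed before should_continue_reclaim()
 * stops in the corner case above (simplified, other exits ignored).
 */
static unsigned long reclaim_bytes_per_attempt(unsigned int order)
{
	return compact_gap(order) * SKETCH_PAGE_SIZE;
}
```

With order 9 this gives 1024 pages, i.e. 4 MiB per attempt.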

The amplification to 4MB should be removed by patches merged for 5.4,
leaving only 32 pages reclaimed per THP allocation. The patch below
tries to remove this corner case completely; its effect should be more
visible on your 5.2.x, so please apply it there.
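
To put the 5.2 and 5.4 numbers side by side, here is a rough C sketch. It assumes 4 KiB pages and models the 5.4 figure of 32 pages as a single SWAP_CLUSTER_MAX batch; that mapping, and all SKETCH_* and helper names, are assumptions for illustration, not taken from the 5.4 patches:

```c
#include <assert.h>

/* Assumptions: 4 KiB pages; 5.4 modelled as one 32-page reclaim batch. */
#define SKETCH_PAGE_SIZE        4096ULL
#define SKETCH_SWAP_CLUSTER_MAX 32ULL

/* 5.2 corner case: reclaim continues until compact_gap(order) pages. */
static unsigned long long pages_per_attempt_52(unsigned int order)
{
	assert(order < 63);	/* unsigned long long is at least 64 bits */
	return 2ULL << order;
}

/* 5.4 as described in the message: only 32 pages per attempt. */
static unsigned long long pages_per_attempt_54(void)
{
	return SKETCH_SWAP_CLUSTER_MAX;
}

/* Total bytes reclaimed over a number of THP allocation attempts. */
static unsigned long long total_bytes(unsigned long long attempts,
				      unsigned long long pages_per_attempt)
{
	return attempts * pages_per_attempt * SKETCH_PAGE_SIZE;
}
```

Over 1000 failed THP attempts the first model reclaims 4,194,304,000 bytes (about 3.9 GiB) and the second 131,072,000 bytes (125 MiB), a 32x difference.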

[-- Attachment #2: 0001-mm-compaction-distinguish-when-compaction-is-impossi.patch --]
[-- Type: text/x-patch, Size: 6381 bytes --]

From 565008042b759835d51703f1da9b335dc0404546 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 12 Sep 2019 13:40:46 +0200
Subject: [PATCH] mm, compaction: distinguish when compaction is impossible

---
 include/linux/compaction.h     |  7 ++++++-
 include/trace/events/mmflags.h |  1 +
 mm/compaction.c                | 16 +++++++++++++--
 mm/vmscan.c                    | 36 ++++++++++++++++++++++++----------
 4 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 9569e7c786d3..6e624f482a08 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -17,8 +17,13 @@ enum compact_priority {
 };
 
 /* Return values for compact_zone() and try_to_compact_pages() */
-/* When adding new states, please adjust include/trace/events/compaction.h */
+/* When adding new states, please adjust include/trace/events/mmflags.h */
 enum compact_result {
+	/*
+	 * The zone is too small to provide the requested allocation even if
+	 * fully freed (i.e. ZONE_DMA for THP allocation due to lowmem reserves)
+	 */
+	COMPACT_IMPOSSIBLE,
 	/* For more detailed tracepoint output - internal to compaction */
 	COMPACT_NOT_SUITABLE_ZONE,
 	/*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d43777e..557dad69a9db 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -170,6 +170,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
 
 #ifdef CONFIG_COMPACTION
 #define COMPACTION_STATUS					\
+	EM( COMPACT_IMPOSSIBLE,		"impossible")		\
 	EM( COMPACT_SKIPPED,		"skipped")		\
 	EM( COMPACT_DEFERRED,		"deferred")		\
 	EM( COMPACT_CONTINUE,		"continue")		\
diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..50a3dd2e2b6e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1948,6 +1948,7 @@ static enum compact_result compact_finished(struct compact_control *cc)
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  * Returns
+ *   COMPACT_IMPOSSIBLE - If the allocation would fail even with all pages free
  *   COMPACT_SKIPPED  - If there are too few free pages for compaction
  *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
  *   COMPACT_CONTINUE - If compaction should run now
@@ -1971,6 +1972,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 								alloc_flags))
 		return COMPACT_SUCCESS;
 
+	/*
+	 * If the allocation would not succeed even with a fully free zone
+	 * due to e.g. lowmem reserves, indicate that compaction can't possibly
+	 * help and it would be pointless to reclaim.
+	 */
+	watermark += 1UL << order;
+	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
+				 alloc_flags, zone_managed_pages(zone)))
+		return COMPACT_IMPOSSIBLE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
@@ -2058,7 +2069,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		compact_result = __compaction_suitable(zone, order, alloc_flags,
 				ac_classzone_idx(ac), available);
-		if (compact_result != COMPACT_SKIPPED)
+		if (compact_result > COMPACT_SKIPPED)
 			return true;
 	}
 
@@ -2079,7 +2090,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED
+	    || ret == COMPACT_IMPOSSIBLE)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 910e02c793ff..20ba471a8454 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2778,11 +2778,12 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 }
 
 /*
- * Returns true if compaction should go ahead for a costly-order request, or
- * the allocation would already succeed without compaction. Return false if we
- * should reclaim first.
+ * Returns 1 if compaction should go ahead for a costly-order request, or the
+ * allocation would already succeed without compaction. Return 0 if we should
+ * reclaim first. Return -1 when compaction can't help at all due to zone being
+ * too small, which means there's no point in reclaim nor compaction.
  */
-static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+static inline int compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 	enum compact_result suitable;
@@ -2790,10 +2791,16 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
 	if (suitable == COMPACT_SUCCESS)
 		/* Allocation should succeed already. Don't reclaim. */
-		return true;
+		return 1;
 	if (suitable == COMPACT_SKIPPED)
 		/* Compaction cannot yet proceed. Do reclaim. */
-		return false;
+		return 0;
+	if (suitable == COMPACT_IMPOSSIBLE)
+		/*
+		 * Compaction can't possibly help. So don't reclaim, but keep
+		 * checking other zones.
+		 */
+		return -1;
 
 	/*
 	 * Compaction is already possible, but it takes time to run and there
@@ -2839,6 +2846,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					sc->reclaim_idx, sc->nodemask) {
+		int compact_ready;
 		/*
 		 * Take care memory controller reclaiming has small influence
 		 * to global LRU.
@@ -2858,10 +2866,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 * page allocations.
 			 */
 			if (IS_ENABLED(CONFIG_COMPACTION) &&
-			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-			    compaction_ready(zone, sc)) {
-				sc->compaction_ready = true;
-				continue;
+			    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
+				compact_ready = compaction_ready(zone, sc);
+				if (compact_ready == 1) {
+					sc->compaction_ready = true;
+					continue;
+				} else if (compact_ready == -1) {
+					/*
+					 * In this zone, neither reclaim nor
+					 * compaction can help.
+					 */
+					continue;
+				}
 			}
 
 			/*
-- 
2.23.0
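
The new COMPACT_IMPOSSIBLE test in __compaction_suitable() asks whether the zone would fail an order-0 watermark check raised by 1UL << order even if every managed page were free. The C sketch below reduces __zone_watermark_ok() to its order-0 core (free pages must exceed the mark plus lowmem_reserve[classzone_idx]) and ignores CMA pages and the ALLOC_HIGH/ALLOC_HARDER adjustments; struct sketch_zone, both helpers, and the numbers used with them are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced zone: only the values the new check reads. */
struct sketch_zone {
	unsigned long managed_pages;  /* zone_managed_pages(zone) */
	unsigned long wmark;          /* watermark picked via alloc_flags */
	unsigned long lowmem_reserve; /* lowmem_reserve[classzone_idx] */
};

/* Simplified order-0 __zone_watermark_ok(): no CMA, no ALLOC_* tweaks. */
static bool sketch_order0_watermark_ok(unsigned long free_pages,
				       unsigned long mark,
				       unsigned long reserve)
{
	return free_pages > mark + reserve;
}

/*
 * Models the patched __compaction_suitable(): treat every managed page as
 * free and ask whether watermark plus the request still does not fit.
 */
static bool sketch_compaction_impossible(const struct sketch_zone *z,
					 unsigned int order)
{
	unsigned long mark = z->wmark + (1UL << order);

	return !sketch_order0_watermark_ok(z->managed_pages, mark,
					   z->lowmem_reserve);
}
```

With a hypothetical ZONE_DMA of 3977 managed pages and a lowmem reserve of 20000 pages against a higher classzone, an order-9 request can never fit, so reclaiming there is pointless; the same zone with no reserve passes.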

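compaction_zonelist_suitable() changes its test from != COMPACT_SKIPPED to > COMPACT_SKIPPED. That only works because the patch puts COMPACT_IMPOSSIBLE first in enum compact_result, so both "cannot help" values sort at or below COMPACT_SKIPPED. A self-contained C sketch with a reduced copy of the enum (the COMPACT_INACTIVE alias is omitted) and hypothetical helper names zonelist_check_old() and zonelist_check_new():

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced copy of enum compact_result as ordered after the patch. */
enum compact_result {
	COMPACT_IMPOSSIBLE,
	COMPACT_NOT_SUITABLE_ZONE,
	COMPACT_SKIPPED,
	COMPACT_DEFERRED,
	COMPACT_NO_SUITABLE_PAGE,
	COMPACT_CONTINUE,
	COMPACT_COMPLETE,
	COMPACT_PARTIAL_SKIPPED,
	COMPACT_CONTENDED,
	COMPACT_SUCCESS,
};

/* The new comparison is only correct if IMPOSSIBLE sorts before SKIPPED. */
_Static_assert(COMPACT_IMPOSSIBLE < COMPACT_SKIPPED,
	       "COMPACT_IMPOSSIBLE must precede COMPACT_SKIPPED");

/* Pre-patch per-zone test in compaction_zonelist_suitable(). */
static bool zonelist_check_old(enum compact_result r)
{
	return r != COMPACT_SKIPPED;
}

/* Patched per-zone test. */
static bool zonelist_check_new(enum compact_result r)
{
	return r > COMPACT_SKIPPED;
}
```

Had the comparison stayed as !=, an impossible zone would still count as suitable; with > it does not, while COMPACT_CONTINUE and COMPACT_SUCCESS are unaffected.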

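In shrink_zones(), a zone for which compaction_ready() reports "reclaim first" falls through to shrink_node() for its whole node. The C sketch below models that costly-order branch for a single node with the patch's tri-state result; the function name, the int-array input and the zonelist order Normal, DMA32, DMA are assumptions made for illustration, and the real code's cpuset, memcg and last_pgdat handling is left out:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Per-zone result of the patched compaction_ready(), in zonelist order:
 *   1  compaction can proceed, or the allocation would already succeed
 *   0  reclaim first
 *  -1  zone too small: neither reclaim nor compaction can help
 */

/*
 * Simplified single-node model of the costly-order branch of shrink_zones().
 * Returns true if shrink_node() would be called for the node.
 */
static bool sketch_node_reclaimed(const int *ready, int nr_zones,
				  bool *compaction_ready)
{
	bool shrink = false;

	*compaction_ready = false;
	for (int i = 0; i < nr_zones; i++) {
		if (ready[i] == 1) {
			*compaction_ready = true;
			continue;
		} else if (ready[i] == -1) {
			continue;	/* skip the zone, do not reclaim for it */
		}
		shrink = true;	/* the real code shrinks each node once */
	}
	return shrink;
}
```

With results {1, 1, 0} (ZONE_DMA reporting COMPACT_SKIPPED, as before the patch) the node is reclaimed even though the two larger zones are ready for compaction; with {1, 1, -1} it is not.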

Thread overview: 61+ messages
2019-09-05 11:27 Stefan Priebe - Profihost AG
2019-09-05 11:40 ` Michal Hocko
2019-09-05 11:56   ` Stefan Priebe - Profihost AG
2019-09-05 16:28     ` Yang Shi
2019-09-05 17:26       ` Stefan Priebe - Profihost AG
2019-09-05 18:46         ` Yang Shi
2019-09-05 19:31           ` Stefan Priebe - Profihost AG
2019-09-06 10:08     ` Stefan Priebe - Profihost AG
2019-09-06 10:25       ` Vlastimil Babka
2019-09-06 18:52       ` Yang Shi
2019-09-07  7:32         ` Stefan Priebe - Profihost AG
2019-09-09  8:27       ` Michal Hocko
2019-09-09  8:54         ` Stefan Priebe - Profihost AG
2019-09-09 11:01           ` Michal Hocko
2019-09-09 12:08             ` Michal Hocko
2019-09-09 12:10               ` Stefan Priebe - Profihost AG
2019-09-09 12:28                 ` Michal Hocko
2019-09-09 12:37                   ` Stefan Priebe - Profihost AG
2019-09-09 12:49                     ` Michal Hocko
2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
     [not found]                         ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
2019-09-10  5:58                           ` Stefan Priebe - Profihost AG
2019-09-10  8:29                           ` Michal Hocko
2019-09-10  8:38                             ` Stefan Priebe - Profihost AG
2019-09-10  9:02                               ` Michal Hocko
2019-09-10  9:37                                 ` Stefan Priebe - Profihost AG
2019-09-10 11:07                                   ` Michal Hocko
2019-09-10 12:45                                     ` Stefan Priebe - Profihost AG
2019-09-10 12:57                                       ` Michal Hocko
2019-09-10 13:05                                         ` Stefan Priebe - Profihost AG
2019-09-10 13:14                                           ` Stefan Priebe - Profihost AG
2019-09-10 13:24                                             ` Michal Hocko
2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
2019-09-11 13:59                                                   ` Stefan Priebe - Profihost AG
2019-09-12 10:53                                                     ` Stefan Priebe - Profihost AG
2019-09-12 11:06                                                       ` Stefan Priebe - Profihost AG
2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
2019-09-11 14:09                                                   ` Stefan Priebe - Profihost AG
2019-09-11 14:56                                                   ` Filipe Manana
2019-09-11 15:39                                                     ` Stefan Priebe - Profihost AG
2019-09-11 15:56                                                       ` Filipe Manana
2019-09-11 16:15                                                         ` Stefan Priebe - Profihost AG
2019-09-11 16:19                                                           ` Filipe Manana
2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
2019-09-23 12:08                                                   ` Michal Hocko
2019-09-27 12:45                                                   ` Vlastimil Babka [this message]
2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
2019-09-30  7:21                                                       ` Vlastimil Babka
2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
2019-10-22  7:48                                                       ` Vlastimil Babka
2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
2019-10-22 10:20                                                           ` Oscar Salvador
2019-10-22 10:21                                                           ` Vlastimil Babka
2019-10-22 11:08                                                             ` Stefan Priebe - Profihost AG
2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
2019-09-09 11:49           ` Vlastimil Babka
2019-09-09 12:09             ` Stefan Priebe - Profihost AG
2019-09-09 12:21               ` Vlastimil Babka
2019-09-09 12:31                 ` Stefan Priebe - Profihost AG
2019-09-05 12:15 ` Vlastimil Babka
2019-09-05 12:27   ` Stefan Priebe - Profihost AG
