From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>, Leon Yang <lnyng@fb.com>,
	Rik van Riel <riel@surriel.com>,
	Shakeel Butt <shakeelb@google.com>, Roman Gushchin <guro@fb.com>,
	Chris Down <chris@chrisdown.name>, Michal Hocko <mhocko@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 58/61] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
Date: Tue, 24 Aug 2021 13:01:03 -0400
Message-ID: <20210824170106.710221-59-sashal@kernel.org>
In-Reply-To: <20210824170106.710221-1-sashal@kernel.org>

From: Johannes Weiner <hannes@cmpxchg.org>

[ Upstream commit f56ce412a59d7d938b81de8878faef128812482c ]

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups.  This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim.  When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slightly above their memory.low setting, page scan
force is scaled down in proportion to the overage, to the point where it
can cause reclaim to fail as well.  Only in that case we currently don't
retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely.  This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.

[akpm@linux-foundation.org: coding-style fixes]

Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Leon Yang <lnyng@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>		[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
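Illustration only, not part of the patch: a minimal userspace sketch of
the behavior described in the commit message, under simplifying
assumptions.  struct scan_state, pick_protection(), scan_target(),
reclaim_with_retry() and nr_needed are hypothetical names made up for
this sketch.  Only the two flag names and the low/min selection mirror
the mm/vmscan.c hunk below.  The scaling formula is a simplified model
of proportional reclaim, and "reclaim failed" is approximated as "the
scan target is smaller than the number of pages needed".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two scan_control bits used by the patch. */
struct scan_state {
	bool memcg_low_reclaim;	/* retry pass: memory.low no longer honored */
	bool memcg_low_skipped;	/* some cgroup was spared due to memory.low */
};

/* Mirrors the selection added in the vmscan.c hunk of this patch. */
static unsigned long pick_protection(struct scan_state *sc,
				     unsigned long min, unsigned long low)
{
	if (!sc->memcg_low_reclaim && low > min) {
		/* memory.low scaling in effect: remember to retry before OOM */
		sc->memcg_low_skipped = true;
		return low;
	}
	return min;
}

/* Simplified model: scan less the closer usage is to the protection. */
static unsigned long scan_target(unsigned long lru_size, unsigned long usage,
				 unsigned long protection)
{
	if (!protection)
		return lru_size;
	if (usage < protection)
		usage = protection;
	return lru_size - lru_size * protection / usage;
}

/*
 * One reclaim attempt plus the retry this patch enables.  Assumption:
 * the attempt "fails" when fewer than nr_needed pages would be scanned.
 */
static unsigned long reclaim_with_retry(unsigned long lru_size,
					unsigned long usage,
					unsigned long min, unsigned long low,
					unsigned long nr_needed)
{
	struct scan_state sc = { false, false };
	unsigned long scan;

	scan = scan_target(lru_size, usage, pick_protection(&sc, min, low));
	if (scan >= nr_needed || !sc.memcg_low_skipped)
		return scan;

	/* Nothing useful reclaimed and memory.low held us back: go again. */
	sc.memcg_low_reclaim = true;
	sc.memcg_low_skipped = false;
	scan = scan_target(lru_size, usage, pick_protection(&sc, min, low));
	assert(!sc.memcg_low_skipped);	/* the retry pass never defers to low */
	return scan;
}
```

With a cgroup using 1010 pages against memory.low = 1000 and no
memory.min, reclaim_with_retry(1000, 1010, 0, 1000, 32) gets a first-pass
scan target of only 10 pages, then retries with min as the protection and
returns the full 1000.  Before this patch the scaled-down pass did not
set memcg_low_skipped, so no retry would have happened in this case.
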
 include/linux/memcontrol.h | 29 +++++++++++++++--------------
 mm/vmscan.c                | 27 +++++++++++++++++++--------
 2 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 059f55841cc8..b6d0b68f5503 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -356,12 +356,15 @@ static inline bool mem_cgroup_disabled(void)
 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
-						  struct mem_cgroup *memcg,
-						  bool in_low_reclaim)
+static inline void mem_cgroup_protection(struct mem_cgroup *root,
+					 struct mem_cgroup *memcg,
+					 unsigned long *min,
+					 unsigned long *low)
 {
+	*min = *low = 0;
+
 	if (mem_cgroup_disabled())
-		return 0;
+		return;
 
 	/*
 	 * There is no reclaim protection applied to a targeted reclaim.
@@ -397,13 +400,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
 	 *
 	 */
 	if (root == memcg)
-		return 0;
-
-	if (in_low_reclaim)
-		return READ_ONCE(memcg->memory.emin);
+		return;
 
-	return max(READ_ONCE(memcg->memory.emin),
-		   READ_ONCE(memcg->memory.elow));
+	*min = READ_ONCE(memcg->memory.emin);
+	*low = READ_ONCE(memcg->memory.elow);
 }
 
 enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
@@ -884,11 +884,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 {
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
-						  struct mem_cgroup *memcg,
-						  bool in_low_reclaim)
+static inline void mem_cgroup_protection(struct mem_cgroup *root,
+					 struct mem_cgroup *memcg,
+					 unsigned long *min,
+					 unsigned long *low)
 {
-	return 0;
+	*min = *low = 0;
 }
 
 static inline enum mem_cgroup_protection mem_cgroup_protected(
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dc44da27673d..fad9be4703ec 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -89,9 +89,12 @@ struct scan_control {
 	unsigned int may_swap:1;
 
 	/*
-	 * Cgroups are not reclaimed below their configured memory.low,
-	 * unless we threaten to OOM. If any cgroups are skipped due to
-	 * memory.low and nothing was reclaimed, go back for memory.low.
+	 * Cgroup memory below memory.low is protected as long as we
+	 * don't threaten to OOM. If any cgroup is reclaimed at
+	 * reduced force or passed over entirely due to its memory.low
+	 * setting (memcg_low_skipped), and nothing is reclaimed as a
+	 * result, then go back for one more cycle that reclaims the protected
+	 * memory (memcg_low_reclaim) to avert OOM.
 	 */
 	unsigned int memcg_low_reclaim:1;
 	unsigned int memcg_low_skipped:1;
@@ -2458,15 +2461,14 @@ out:
 	for_each_evictable_lru(lru) {
 		int file = is_file_lru(lru);
 		unsigned long lruvec_size;
+		unsigned long low, min;
 		unsigned long scan;
-		unsigned long protection;
 
 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
-		protection = mem_cgroup_protection(sc->target_mem_cgroup,
-						   memcg,
-						   sc->memcg_low_reclaim);
+		mem_cgroup_protection(sc->target_mem_cgroup, memcg,
+				      &min, &low);
 
-		if (protection) {
+		if (min || low) {
 			/*
 			 * Scale a cgroup's reclaim pressure by proportioning
 			 * its current usage to its memory.low or memory.min
@@ -2497,6 +2499,15 @@ out:
 			 * hard protection.
 			 */
 			unsigned long cgroup_size = mem_cgroup_size(memcg);
+			unsigned long protection;
+
+			/* memory.low scaling, make sure we retry before OOM */
+			if (!sc->memcg_low_reclaim && low > min) {
+				protection = low;
+				sc->memcg_low_skipped = 1;
+			} else {
+				protection = min;
+			}
 
 			/* Avoid TOCTOU with earlier protection check */
 			cgroup_size = max(cgroup_size, protection);
-- 
2.30.2

