[PATCH] shmem, memcg: enable memcg aware shrinker
From: Greg Thelen @ 2020-06-01  3:22 UTC
  To: Hugh Dickins, Andrew Morton, Kirill Tkhai
  Cc: linux-mm, linux-kernel, Greg Thelen, stable

Since v4.19 commit b0dedc49a2da ("mm/vmscan.c: iterate only over charged
shrinkers during memcg shrink_slab()") a memcg aware shrinker is only
called when the per-memcg per-node shrinker_map indicates that the
shrinker may have objects to release to the memcg and node.
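
For context (not part of this patch), a minimal sketch of how a memcg
aware shrinker is registered and flagged against the ~v4.19 API.  The
my_count(), my_scan() and my_note_reclaimable() names are made up for
illustration, and shrinker->id / memcg_set_shrinker_bit() only exist
with CONFIG_MEMCG:

  #include <linux/shrinker.h>
  #include <linux/memcontrol.h>

  static unsigned long my_count(struct shrinker *sh, struct shrink_control *sc)
  {
  	/* sc->memcg and sc->nid identify the memcg/node being shrunk */
  	return 0;	/* nothing reclaimable in this sketch */
  }

  static unsigned long my_scan(struct shrinker *sh, struct shrink_control *sc)
  {
  	return SHRINK_STOP;
  }

  static struct shrinker my_shrinker = {
  	.count_objects	= my_count,
  	.scan_objects	= my_scan,
  	.seeks		= DEFAULT_SEEKS,
  	.flags		= SHRINKER_MEMCG_AWARE | SHRINKER_NUMA_AWARE,
  };

  /* registration assigns my_shrinker.id, the index into shrinker_map */
  static int my_init(void)
  {
  	return register_shrinker(&my_shrinker);
  }

  /*
   * Call when an object charged to @memcg becomes reclaimable on @nid,
   * so that shrink_slab() will invoke the shrinker for that memcg/node.
   */
  static void my_note_reclaimable(struct mem_cgroup *memcg, int nid)
  {
  	memcg_set_shrinker_bit(memcg, nid, my_shrinker.id);
  }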

shmem_unused_huge_count() and shmem_unused_huge_scan() back the
per-tmpfs shrinker, which advertises memcg and NUMA awareness.  The
shmem shrinker releases memory by splitting hugepages that extend
beyond i_size.
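
For example, on x86_64 with 4 KiB base pages and 2 MiB hugepages
(HPAGE_PMD_NR = 512), a file holding a single 4 KiB page of data but
backed by a hugepage keeps 511 pages (~2 MiB) beyond i_size resident;
splitting the hugepage lets reclaim free those 511 tail pages.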

Shmem does not currently set bits in shrinker_map.  So, starting with
b0dedc49a2da, memcg reclaim avoids calling the shmem shrinker under
pressure.  This leads to undeserved memcg OOM kills.
Example that reliably triggers a memcg OOM kill on an unpatched kernel:
  FS=/tmp/fs
  CONTAINER=/cgroup/memory/tmpfs_shrinker
  mkdir -p $FS
  mount -t tmpfs -o huge=always nodev $FS
  # Create a 1000 MB container, which shouldn't suffer OOM.
  mkdir $CONTAINER
  echo 1000M > $CONTAINER/memory.limit_in_bytes
  echo $BASHPID >> $CONTAINER/cgroup.procs
  # Create 4000 files.  Ideally each file uses a 4k data page + a little
  # metadata.  Assuming 8k total per file, 32MB (4000*8k) should easily
  # fit within the container's 1000 MB.  But if the data pages are 2MB
  # hugepages (due to the aggressive huge=always) then the files consume
  # 8GB, which blows past the memcg's 1000 MB limit.
  for i in {1..4000}; do
    echo . > $FS/$i
  done
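
With huge=always each of these tiny files ends up backed by a 2 MiB
hugepage, so the 4000 files pin roughly 4000 * 2 MiB ~= 7.8 GiB, far
beyond the 1000 MB limit, even though nearly all of that memory lies
beyond i_size and could be reclaimed if the shmem shrinker were invoked
for this memcg.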

v5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
aware") maintains the per-node per-memcg shrinker bitmap for THP
shrinker.  But there's no such logic in shmem.  Make shmem set the
per-memcg per-node shrinker bits when it modifies inodes to have
shrinkable pages.

Fixes: b0dedc49a2da ("mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Greg Thelen <gthelen@google.com>
---
 mm/shmem.c | 61 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index bd8840082c94..e11090f78cb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1002,6 +1002,33 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
+/*
+ * Expose inode and optional page to shrinker as having a possibly splittable
+ * hugepage that reaches beyond i_size.
+ */
+static void shmem_shrinker_add(struct shmem_sb_info *sbinfo,
+			       struct inode *inode, struct page *page)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	spin_lock(&sbinfo->shrinklist_lock);
+	/*
+	 * _careful to defend against unlocked access to ->shrink_list in
+	 * shmem_unused_huge_shrink()
+	 */
+	if (list_empty_careful(&info->shrinklist)) {
+		list_add_tail(&info->shrinklist, &sbinfo->shrinklist);
+		sbinfo->shrinklist_len++;
+	}
+	spin_unlock(&sbinfo->shrinklist_lock);
+
+#ifdef CONFIG_MEMCG
+	if (page && PageTransHuge(page))
+		memcg_set_shrinker_bit(page->mem_cgroup, page_to_nid(page),
+				       inode->i_sb->s_shrink.id);
+#endif
+}
+
 static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
@@ -1048,17 +1075,13 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			 * to shrink under memory pressure.
 			 */
 			if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-				spin_lock(&sbinfo->shrinklist_lock);
-				/*
-				 * _careful to defend against unlocked access to
-				 * ->shrink_list in shmem_unused_huge_shrink()
-				 */
-				if (list_empty_careful(&info->shrinklist)) {
-					list_add_tail(&info->shrinklist,
-							&sbinfo->shrinklist);
-					sbinfo->shrinklist_len++;
-				}
-				spin_unlock(&sbinfo->shrinklist_lock);
+				struct page *page;
+
+				page = find_get_page(inode->i_mapping,
+					(newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
+				shmem_shrinker_add(sbinfo, inode, page);
+				if (page)
+					put_page(page);
 			}
 		}
 	}
@@ -1889,21 +1912,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	if (PageTransHuge(page) &&
 	    DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
 			hindex + HPAGE_PMD_NR - 1) {
-		/*
-		 * Part of the huge page is beyond i_size: subject
-		 * to shrink under memory pressure.
-		 */
-		spin_lock(&sbinfo->shrinklist_lock);
-		/*
-		 * _careful to defend against unlocked access to
-		 * ->shrink_list in shmem_unused_huge_shrink()
-		 */
-		if (list_empty_careful(&info->shrinklist)) {
-			list_add_tail(&info->shrinklist,
-				      &sbinfo->shrinklist);
-			sbinfo->shrinklist_len++;
-		}
-		spin_unlock(&sbinfo->shrinklist_lock);
+		shmem_shrinker_add(sbinfo, inode, page);
 	}
 
 	/*
-- 
2.27.0.rc0.183.gde8f92d652-goog

