From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: Greg Thelen <gthelen@google.com>, Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] shmem, memcg: enable memcg aware shrinker
Date: Mon, 1 Jun 2020 14:05:10 +0300 [thread overview]
Message-ID: <ffdff0be-f2b6-c7c0-debc-9c5e8a33ae4e@virtuozzo.com> (raw)
In-Reply-To: <20200601032204.124624-1-gthelen@google.com>
Hi Greg,

good catch. See my comments below.
On 01.06.2020 06:22, Greg Thelen wrote:
> Since v4.19 commit b0dedc49a2da ("mm/vmscan.c: iterate only over charged
> shrinkers during memcg shrink_slab()") a memcg aware shrinker is only
> called when the per-memcg per-node shrinker_map indicates that the
> shrinker may have objects to release to the memcg and node.
>
> shmem_unused_huge_count and shmem_unused_huge_scan support the per-tmpfs
> shrinker which advertises per memcg and numa awareness. The shmem
> shrinker releases memory by splitting hugepages that extend beyond
> i_size.
>
> Shmem does not currently set bits in shrinker_map. So, starting with
> b0dedc49a2da, memcg reclaim avoids calling the shmem shrinker under
> pressure. This leads to undeserved memcg OOM kills.
> Example that reliably sees memcg OOM kill in unpatched kernel:
> FS=/tmp/fs
> CONTAINER=/cgroup/memory/tmpfs_shrinker
> mkdir -p $FS
> mount -t tmpfs -o huge=always nodev $FS
> # Create 1000 MB container, which shouldn't suffer OOM.
> mkdir $CONTAINER
> echo 1000M > $CONTAINER/memory.limit_in_bytes
> echo $BASHPID >> $CONTAINER/cgroup.procs
> # Create 4000 files. Ideally each file uses 4k data page + a little
> # metadata. Assume 8k total per-file, 32MB (4000*8k) should easily
> # fit within container's 1000 MB. But if data pages use 2MB
> # hugepages (due to aggressive huge=always) then files consume 8GB,
> # which hits memcg 1000 MB limit.
> for i in {1..4000}; do
> echo . > $FS/$i
> done
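
(Quick sanity check on the numbers in the comment above — not part of the patch, just the arithmetic spelled out:)

```shell
files=4000
# ~8k (4k data page + a little metadata) per file with regular pages:
echo "$((files * 8)) kB"   # 32000 kB, ~32 MB -- easily fits the 1000 MB limit
# one 2MB hugepage per file with huge=always:
echo "$((files * 2)) MB"   # 8000 MB -- far beyond the 1000 MB limit
```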
>
> v5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
> aware") maintains the per-node per-memcg shrinker bitmap for THP
> shrinker. But there's no such logic in shmem. Make shmem set the
> per-memcg per-node shrinker bits when it modifies inodes to have
> shrinkable pages.
>
> Fixes: b0dedc49a2da ("mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()")
> Cc: <stable@vger.kernel.org> # 4.19+
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
> mm/shmem.c | 61 +++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 35 insertions(+), 26 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index bd8840082c94..e11090f78cb5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1002,6 +1002,33 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
> return 0;
> }
>
> +/*
> + * Expose inode and optional page to shrinker as having a possibly splittable
> + * hugepage that reaches beyond i_size.
> + */
> +static void shmem_shrinker_add(struct shmem_sb_info *sbinfo,
> + struct inode *inode, struct page *page)
> +{
> + struct shmem_inode_info *info = SHMEM_I(inode);
> +
> + spin_lock(&sbinfo->shrinklist_lock);
> + /*
> + * _careful to defend against unlocked access to ->shrink_list in
> + * shmem_unused_huge_shrink()
> + */
> + if (list_empty_careful(&info->shrinklist)) {
> + list_add_tail(&info->shrinklist, &sbinfo->shrinklist);
> + sbinfo->shrinklist_len++;
> + }
> + spin_unlock(&sbinfo->shrinklist_lock);
> +
> +#ifdef CONFIG_MEMCG
> + if (page && PageTransHuge(page))
> + memcg_set_shrinker_bit(page->mem_cgroup, page_to_nid(page),
> + inode->i_sb->s_shrink.id);
> +#endif
> +}
> +
> static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
> {
> struct inode *inode = d_inode(dentry);
> @@ -1048,17 +1075,13 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
> * to shrink under memory pressure.
> */
> if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> - spin_lock(&sbinfo->shrinklist_lock);
> - /*
> - * _careful to defend against unlocked access to
> - * ->shrink_list in shmem_unused_huge_shrink()
> - */
> - if (list_empty_careful(&info->shrinklist)) {
> - list_add_tail(&info->shrinklist,
> - &sbinfo->shrinklist);
> - sbinfo->shrinklist_len++;
> - }
> - spin_unlock(&sbinfo->shrinklist_lock);
> + struct page *page;
> +
> + page = find_get_page(inode->i_mapping,
> + (newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
> + shmem_shrinker_add(sbinfo, inode, page);
> + if (page)
> + put_page(page);
1) I'd move the PageTransHuge() check from shmem_shrinker_add() to the callers. When the page is not trans huge, it looks strange and completely useless to add the inode to the shrinklist and then skip memcg_set_shrinker_bit(); nothing should be added to the shrinklist in that case.
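
Concretely, something like this in shmem_setattr() (untested sketch, just to illustrate the control flow I mean — shmem_shrinker_add() would then drop its own PageTransHuge() check):

```c
/* Only queue the inode when the page straddling i_size is actually huge */
page = find_get_page(inode->i_mapping,
		     (newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
if (page && PageTransHuge(page))
	shmem_shrinker_add(sbinfo, inode, page);
if (page)
	put_page(page);
```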
2) In general, I think this "last inode page splitter" does not fit the SHRINKER_MEMCG_AWARE concept, and shmem_unused_huge_shrink() should be reworked as a new, separate !memcg-aware shrinker instead of the .nr_cached_objects callback of the generic fs shrinker.

CC: Kirill Shutemov

Kirill, are there any fundamental reasons to keep this shrinking logic in the generic fs shrinker? Are there any blockers to converting it into a separate !memcg-aware shrinker?
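
Roughly what I have in mind (untested pseudocode sketch; the huge_shrinker field and helper names are made up, and the shmem_unused_huge_shrink() signature is simplified):

```c
static unsigned long shmem_huge_count(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	struct shmem_sb_info *sbinfo = container_of(shrink,
				struct shmem_sb_info, huge_shrinker);

	return READ_ONCE(sbinfo->shrinklist_len);
}

static unsigned long shmem_huge_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	struct shmem_sb_info *sbinfo = container_of(shrink,
				struct shmem_sb_info, huge_shrinker);

	return shmem_unused_huge_shrink(sbinfo, sc, sc->nr_to_scan);
}

/* in shmem_fill_super(), instead of relying on sb->s_shrink: */
sbinfo->huge_shrinker.count_objects = shmem_huge_count;
sbinfo->huge_shrinker.scan_objects  = shmem_huge_scan;
sbinfo->huge_shrinker.seeks = DEFAULT_SEEKS;
/* note: no SHRINKER_MEMCG_AWARE flag */
err = register_shrinker(&sbinfo->huge_shrinker);
```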
> }
> }
> }
> @@ -1889,21 +1912,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> if (PageTransHuge(page) &&
> DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
> hindex + HPAGE_PMD_NR - 1) {
> - /*
> - * Part of the huge page is beyond i_size: subject
> - * to shrink under memory pressure.
> - */
> - spin_lock(&sbinfo->shrinklist_lock);
> - /*
> - * _careful to defend against unlocked access to
> - * ->shrink_list in shmem_unused_huge_shrink()
> - */
> - if (list_empty_careful(&info->shrinklist)) {
> - list_add_tail(&info->shrinklist,
> - &sbinfo->shrinklist);
> - sbinfo->shrinklist_len++;
> - }
> - spin_unlock(&sbinfo->shrinklist_lock);
> + shmem_shrinker_add(sbinfo, inode, page);
> }
>
> /*
Thanks,
Kirill
Thread overview: 8+ messages
2020-06-01 3:22 [PATCH] shmem, memcg: enable memcg aware shrinker Greg Thelen
2020-06-01 11:05 ` Kirill Tkhai [this message]
2020-06-01 21:48 ` Greg Thelen
2020-06-02 8:28 ` Kirill Tkhai
2020-06-01 22:59 ` Yang Shi
2020-06-04 8:17 ` Greg Thelen
2020-06-05 22:05 ` Yang Shi
2020-06-08 4:49 ` Hugh Dickins