From: Yang Shi <shy828301@gmail.com>
To: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Kirill Tkhai <ktkhai@virtuozzo.com>,
	Linux MM <linux-mm@kvack.org>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH] shmem, memcg: enable memcg aware shrinker
Date: Mon, 1 Jun 2020 15:59:56 -0700
Message-ID: <CAHbLzkq84qtOqfvP5SmPoAyL+Pyffd9K3108AOYk5yKF03jBmw@mail.gmail.com>
In-Reply-To: <20200601032204.124624-1-gthelen@google.com>

On Sun, May 31, 2020 at 8:22 PM Greg Thelen <gthelen@google.com> wrote:
>
> Since v4.19 commit b0dedc49a2da ("mm/vmscan.c: iterate only over charged
> shrinkers during memcg shrink_slab()") a memcg aware shrinker is only
> called when the per-memcg per-node shrinker_map indicates that the
> shrinker may have objects to release to the memcg and node.
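
[For reference, a condensed sketch of the shrink_slab_memcg() loop that
commit added -- illustration only, simplified from mm/vmscan.c, not the
verbatim kernel code (locking and the SHRINK_EMPTY bit-clearing are
omitted):]

  static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
                                         struct mem_cgroup *memcg,
                                         int priority)
  {
          struct memcg_shrinker_map *map;
          unsigned long freed = 0;
          int i;

          map = rcu_dereference(memcg->nodeinfo[nid]->shrinker_map);
          /* Only shrinkers whose bit is set in the per-memcg per-node
           * map are invoked at all; a shrinker that never sets its
           * bit is silently skipped for memcg reclaim. */
          for_each_set_bit(i, map->map, shrinker_nr_max) {
                  struct shrink_control sc = {
                          .gfp_mask = gfp_mask,
                          .nid = nid,
                          .memcg = memcg,
                  };
                  struct shrinker *shrinker = idr_find(&shrinker_idr, i);

                  freed += do_shrink_slab(&sc, shrinker, priority);
          }
          return freed;
  }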
>
> shmem_unused_huge_count and shmem_unused_huge_scan support the per-tmpfs
> shrinker, which advertises per-memcg and NUMA awareness.  The shmem
> shrinker releases memory by splitting hugepages that extend beyond
> i_size.
>
> Shmem does not currently set bits in shrinker_map.  So, starting with
> b0dedc49a2da, memcg reclaim avoids calling the shmem shrinker under
> pressure.  This leads to undeserved memcg OOM kills.
> Example that reliably triggers a memcg OOM kill on an unpatched kernel:
>   FS=/tmp/fs
>   CONTAINER=/cgroup/memory/tmpfs_shrinker
>   mkdir -p $FS
>   mount -t tmpfs -o huge=always nodev $FS
>   # Create 1000 MB container, which shouldn't suffer OOM.
>   mkdir $CONTAINER
>   echo 1000M > $CONTAINER/memory.limit_in_bytes
>   echo $BASHPID >> $CONTAINER/cgroup.procs
>   # Create 4000 files.  Ideally each file uses 4k data page + a little
>   # metadata.  Assume 8k total per-file, 32MB (4000*8k) should easily
>   # fit within container's 1000 MB.  But if data pages use 2MB
>   # hugepages (due to aggressive huge=always) then files consume 8GB,
>   # which hits memcg 1000 MB limit.
>   for i in {1..4000}; do
>     echo . > $FS/$i
>   done

It looks like all the inodes which have a tail THP beyond i_size sit
on one single per-superblock list, and the shrinker just splits the
first nr_to_scan inodes it finds there. But since the list is not
memcg aware, it may split THPs which are not charged to the victim
memcg, so the victim memcg may still suffer a premature OOM, right?

>
> v5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
> aware") maintains the per-node per-memcg shrinker bitmap for THP
> shrinker.  But there's no such logic in shmem.  Make shmem set the
> per-memcg per-node shrinker bits when it modifies inodes to have
> shrinkable pages.
>
> Fixes: b0dedc49a2da ("mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()")
> Cc: <stable@vger.kernel.org> # 4.19+
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
>  mm/shmem.c | 61 +++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 35 insertions(+), 26 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index bd8840082c94..e11090f78cb5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1002,6 +1002,33 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
>         return 0;
>  }
>
> +/*
> + * Expose inode and optional page to shrinker as having a possibly splittable
> + * hugepage that reaches beyond i_size.
> + */
> +static void shmem_shrinker_add(struct shmem_sb_info *sbinfo,
> +                              struct inode *inode, struct page *page)
> +{
> +       struct shmem_inode_info *info = SHMEM_I(inode);
> +
> +       spin_lock(&sbinfo->shrinklist_lock);
> +       /*
> +        * _careful to defend against unlocked access to ->shrink_list in
> +        * shmem_unused_huge_shrink()
> +        */
> +       if (list_empty_careful(&info->shrinklist)) {
> +               list_add_tail(&info->shrinklist, &sbinfo->shrinklist);
> +               sbinfo->shrinklist_len++;
> +       }
> +       spin_unlock(&sbinfo->shrinklist_lock);
> +
> +#ifdef CONFIG_MEMCG
> +       if (page && PageTransHuge(page))
> +               memcg_set_shrinker_bit(page->mem_cgroup, page_to_nid(page),
> +                                      inode->i_sb->s_shrink.id);
> +#endif
> +}
> +
>  static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
>  {
>         struct inode *inode = d_inode(dentry);
> @@ -1048,17 +1075,13 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
>                          * to shrink under memory pressure.
>                          */
>                         if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> -                               spin_lock(&sbinfo->shrinklist_lock);
> -                               /*
> -                                * _careful to defend against unlocked access to
> -                                * ->shrink_list in shmem_unused_huge_shrink()
> -                                */
> -                               if (list_empty_careful(&info->shrinklist)) {
> -                                       list_add_tail(&info->shrinklist,
> -                                                       &sbinfo->shrinklist);
> -                                       sbinfo->shrinklist_len++;
> -                               }
> -                               spin_unlock(&sbinfo->shrinklist_lock);
> +                               struct page *page;
> +
> +                               page = find_get_page(inode->i_mapping,
> +                                       (newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
> +                               shmem_shrinker_add(sbinfo, inode, page);
> +                               if (page)
> +                                       put_page(page);
>                         }
>                 }
>         }
> @@ -1889,21 +1912,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>         if (PageTransHuge(page) &&
>             DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
>                         hindex + HPAGE_PMD_NR - 1) {
> -               /*
> -                * Part of the huge page is beyond i_size: subject
> -                * to shrink under memory pressure.
> -                */
> -               spin_lock(&sbinfo->shrinklist_lock);
> -               /*
> -                * _careful to defend against unlocked access to
> -                * ->shrink_list in shmem_unused_huge_shrink()
> -                */
> -               if (list_empty_careful(&info->shrinklist)) {
> -                       list_add_tail(&info->shrinklist,
> -                                     &sbinfo->shrinklist);
> -                       sbinfo->shrinklist_len++;
> -               }
> -               spin_unlock(&sbinfo->shrinklist_lock);
> +               shmem_shrinker_add(sbinfo, inode, page);
>         }
>
>         /*
> --
> 2.27.0.rc0.183.gde8f92d652-goog
>
>


