From: Yang Shi <shy828301@gmail.com>
To: Fam Zheng <zhengfeiran@bytedance.com>
Cc: cgroups@vger.kernel.org, "Linux MM" <linux-mm@kvack.org>,
	tj@kernel.org, "Johannes Weiner" <hannes@cmpxchg.org>,
	lizefan@huawei.com, "Michal Hocko" <mhocko@kernel.org>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	duanxiongchun@bytedance.com, 张永肃 <zhangyongsu@bytedance.com>,
	liuxiaozhou@bytedance.com
Subject: Re: memory cgroup pagecache and inode problem
Date: Fri, 4 Jan 2019 11:36:27 -0800
Message-ID: <CAHbLzkpbVjtx+uxb1sq-wjBAAv_My6kq4c4bwqRKAmOTZ9dR8g@mail.gmail.com>
In-Reply-To: <ADF3C74C-BE96-495F-911F-77DDF3368912@bytedance.com>

On Thu, Jan 3, 2019 at 9:12 PM Fam Zheng <zhengfeiran@bytedance.com> wrote:
>
>
>
> On Jan 4, 2019, at 13:00, Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Jan 3, 2019 at 8:45 PM Fam Zheng <zhengfeiran@bytedance.com> wrote:
>
>
> Fixing the mm list address. Sorry for the noise.
>
> Fam
>
>
> On Jan 4, 2019, at 12:43, Fam Zheng <zhengfeiran@bytedance.com> wrote:
>
> Hi,
>
> On our servers, which frequently spawn containers, we find that if a process used pagecache in a memory cgroup, then after the process exits and the memory cgroup is offlined, the memory cgroup is not destroyed until the pagecache is dropped, because the pagecache is still charged to it. This brings huge memory stress over time. We find that over one hundred thousand such offlined memory cgroups in the system hold too much memory (~100G). This memory cannot be released immediately even after all associated pagecache is released, because those memory cgroups are destroyed asynchronously by a kworker. In some cases this can cause OOM, since a synchronous memory allocation fails.
>
>
> Does force_empty help out your usecase? You can write to
> memory.force_empty to reclaim as much memory as possible before
> rmdir'ing the memcg. This would prevent page cache from accumulating.
>
>
> Hmm, this might be an option. FWIW we have been using drop_caches as a workaround.

drop_caches would drop all page caches globally. You may not want to
drop the page caches used by other memcgs.
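
To make the contrast concrete, here is a minimal sketch (not taken from any patch in this thread) of the per-memcg approach on cgroup v1. The helper names force_empty() and retire_memcg() are hypothetical, and the code assumes a v1 memory controller, root privileges, and a memcg that no longer contains tasks. Writing to /proc/sys/vm/drop_caches, by comparison, acts on the whole system.

```python
import os

# Global knob: writing here drops clean caches for every memcg on the host.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def force_empty(memcg_dir):
    """Trigger reclaim for a single cgroup v1 memcg.

    memcg_dir is the memcg's directory, for example somewhere under the
    (assumed) v1 mount point /sys/fs/cgroup/memory.
    """
    path = os.path.join(memcg_dir, "memory.force_empty")
    with open(path, "w") as f:
        f.write("0")
    return path


def retire_memcg(memcg_dir):
    """Reclaim what can be reclaimed, then remove the memcg directory."""
    force_empty(memcg_dir)
    os.rmdir(memcg_dir)
```

A container runtime could call retire_memcg() in its teardown path instead of writing to drop_caches, so that page cache charged to other memcgs is left alone.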

>
>
> BTW, this is cgroup v1 only, I'm working on a patch to bring this back
> into v2 as discussed in https://lkml.org/lkml/2019/1/3/484.
>
> We think a fix is to create a kworker that scans all pagecache, dentry caches, etc. in the background; if a referenced memory cgroup is offline, it tries to drop the cache or move it to the parent cgroup. This kworker can wake up periodically, or upon a memory cgroup offline event (or both).
>
>
> Reparenting has been deprecated for a long time. I don't think we want
> to bring it back. Actually, css offline is handled by kworker now. I
> proposed a patch to do force_empty in kworker, please see
> https://lkml.org/lkml/2019/1/2/377.
>
>
> Could you elaborate a bit about why reparenting is not a good idea?

AFAIK, reparenting may cause some tricky race conditions. Since we can
iterate offline memcgs now, the memory charged to an offline memcg
can get reclaimed when memory pressure happens.
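
If you want a rough idea of how many offline memcgs are still pinned, one common diagnostic is to compare the num_cgroups column for the memory controller in /proc/cgroups with the number of directories visible under the memcg mount. The sketch below assumes cgroup v1 mounted at /sys/fs/cgroup/memory and assumes the gap between the two numbers approximates the offline memcgs; the function names are hypothetical.

```python
import os


def memcg_count_from_proc(proc_cgroups_text):
    """Return num_cgroups for the memory controller from /proc/cgroups text.

    Columns are: subsys_name, hierarchy, num_cgroups, enabled.
    """
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "memory":
            return int(fields[2])
    return None


def visible_memcg_count(root):
    """Count directories under the memcg mount, including the root itself."""
    return sum(1 for _ in os.walk(root))


def estimate_offline_memcgs(proc_cgroups_text, root="/sys/fs/cgroup/memory"):
    """Estimate memcgs that the kernel still tracks but that have no directory."""
    total = memcg_count_from_proc(proc_cgroups_text)
    if total is None:
        return None
    return max(total - visible_memcg_count(root), 0)
```

On a live system you would pass in the contents of /proc/cgroups, e.g. estimate_offline_memcgs(open("/proc/cgroups").read()).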

Johannes and Michal would know more about the background than me.

Yang

>
>
>
> There is a similar problem with inodes. After digging into the ext4 code, we find that the inode cache is created with SLAB_ACCOUNT. In this case, an inode is allocated from a slab that belongs to the current memory cgroup. After this memory cgroup goes offline, the inode may still be held by a dentry cache. If another process uses the same file, the inode is then held by that process, preventing the previous memory cgroup from being destroyed until this other process closes the file and the dentry cache is dropped.
>
>
> I'm not sure if you really need kmem charge. If not, you may try
> cgroup.memory=nokmem.
>
>
> A very good hint, we’ll investigate, thanks!
>
> Fam
>
>
> Regards,
> Yang
>
>
> We still don't have a reasonable way to fix this.
>
> Ideas?
>
>
