From: Michal Hocko <mhocko@kernel.org>
To: Greg Thelen <gthelen@google.com>
Cc: Shakeel Butt <shakeelb@google.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux MM <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Mon, 9 Oct 2017 20:04:09 +0200
Message-ID: <20171009180409.z3mpk3m7m75hjyfv@dhcp22.suse.cz>
In-Reply-To: <xr93a810xl77.fsf@gthelen.svl.corp.google.com>
[CC Johannes - the thread starts at
http://lkml.kernel.org/r/20171005222144.123797-1-shakeelb@google.com]
On Mon 09-10-17 10:52:44, Greg Thelen wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
>
> > On Fri 06-10-17 12:33:03, Shakeel Butt wrote:
> >> >> names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
> >> >> - SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
> >> >> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
> >> >
> >> > I might be wrong but isn't name cache only holding temporary objects
> >> > used for path resolution which are not stored anywhere?
> >> >
> >>
> >> Even though they're temporary, many containers can together use a
> >> significant amount of transient uncharged memory. We've seen machines
> >> with 100s of MiBs in names_cache.
> >
> > Yes that might be possible but are we prepared for random ENOMEM from
> > vfs calls which need to allocate a temporary name?
> >
> >>
> >> >> filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
> >> >> - SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> >> >> + SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
> >> >> percpu_counter_init(&nr_files, 0, GFP_KERNEL);
> >> >> }
> >> >
> >> > Don't we have a limit for the maximum number of open files?
> >> >
> >>
> >> Yes, there is a system-wide limit on the maximum number of open files.
> >> However, this limit is shared between different users on the system and
> >> one user can hog this resource. To cater for that, we set the maximum
> >> limit very high and let the memory limit of each user limit the number
> >> of files they can open.
> >
> > Similarly here. Are all syscalls allocating a fd prepared to return
> > ENOMEM?
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
> Even before this patch I find memcg oom handling inconsistent. Page
> cache pages trigger oom killer and may allow caller to succeed once the
> kernel retries. But kmem allocations don't call oom killer. They
> surface errors to user space. This makes memcg hard to use for memory
> overcommit because it's desirable for a high priority task to
> transparently kill a lower priority task using the memcg oom killer.
>
> A few ideas on how to make it more flexible:
>
> a) Go back to memcg oom killing within memcg charging. This runs the
> risk of oom killing while the caller holds locks which oom victim
> selection or oom victim termination may need. Google's been running
> this way for a while.
>
> b) Have every syscall return do something similar to the page fault
> handler: kmem allocations in an oom memcg mark the current task as
> needing an oom check and return NULL. If marked oom, syscall exit
> would use mem_cgroup_oom_synchronize() before retrying the syscall.
> Seems risky. I doubt every syscall is compatible with such a restart.
>
> c) Overcharge kmem to oom memcg and queue an async memcg limit checker,
> which will oom kill if needed.
>
> Comments?
>
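Idea (c) above can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: the names toy_memcg, toy_charge_kmem and toy_async_limit_check are hypothetical and are not kernel APIs, and the "oom kill" is simulated by subtracting a victim's footprint from the usage counter. It shows the ordering only: the charge never fails, and the limit is enforced later from a separate context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a memory cgroup; not kernel code. */
struct toy_memcg {
	size_t usage;        /* bytes currently charged */
	size_t limit;        /* configured hard limit in bytes */
	bool check_pending;  /* a deferred limit check has been queued */
	unsigned int kills;  /* number of simulated oom kills */
};

/*
 * Charge a kernel allocation. The charge always succeeds, even past the
 * limit, so the caller never sees an allocation failure. Going over the
 * limit only queues a deferred check.
 */
static void toy_charge_kmem(struct toy_memcg *cg, size_t bytes)
{
	cg->usage += bytes;
	if (cg->usage > cg->limit)
		cg->check_pending = true;
}

/*
 * Deferred limit check, run outside the allocation path. If the group is
 * still over its limit, simulate one oom kill that frees victim_bytes.
 */
static void toy_async_limit_check(struct toy_memcg *cg, size_t victim_bytes)
{
	if (!cg->check_pending)
		return;
	cg->check_pending = false;
	if (cg->usage <= cg->limit)
		return;
	cg->kills++;
	cg->usage -= victim_bytes < cg->usage ? victim_bytes : cg->usage;
}
```

In this model a group with a limit of 100 that is charged 60 twice ends up at 120 with a check queued; running the check with a 50-byte victim records one kill and leaves usage at 70. A real implementation would also have to bound how far the overcharge can run ahead of the checker, which the model does not attempt.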
> Demo program which eventually gets ENOSPC from mkdir.
>
> $ cat /tmp/t
> while umount /tmp/mnt; do true; done
> mkdir -p /tmp/mnt
> mount -t tmpfs nodev /tmp/mnt
> cd /dev/cgroup/memory
> rmdir t
> mkdir t
> echo 32M > t/memory.limit_in_bytes
> (echo $BASHPID > t/cgroup.procs && cd /tmp/mnt && exec /tmp/mkdirs)
>
> $ cat /tmp/mkdirs.c
> #include <err.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/types.h>
>
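> int main(void)
> {
> 	int i;
> 	char name[32];
>
> 	/* Lock all current and future mappings of this process in memory. */
> 	if (mlockall(MCL_CURRENT|MCL_FUTURE))
> 		err(1, "mlockall");
> 	/* Create directories "0", "1", ... until mkdir() fails. */
> 	for (i = 0; i < (1<<20); i++) {
> 		snprintf(name, sizeof(name), "%d", i);
> 		if (mkdir(name, 0700))
> 			err(1, "mkdir");
> 	}
> 	printf("done\n");
> 	return 0;
> }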
--
Michal Hocko
SUSE Labs