From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>,
    Shakeel Butt <shakeelb@google.com>,
    Alexander Viro <viro@zeniv.linux.org.uk>,
    Vladimir Davydov <vdavydov.dev@gmail.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Linux MM <linux-mm@kvack.org>,
    linux-fsdevel@vger.kernel.org,
    LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Tue, 10 Oct 2017 11:14:30 +0200
Message-ID: <20171010091430.giflzlayvjblx5bu@dhcp22.suse.cz>
In-Reply-To: <20171009202613.GA15027@cmpxchg.org>

On Mon 09-10-17 16:26:13, Johannes Weiner wrote:
> On Mon, Oct 09, 2017 at 10:52:44AM -0700, Greg Thelen wrote:
> > Michal Hocko <mhocko@kernel.org> wrote:
> >
> > > On Fri 06-10-17 12:33:03, Shakeel Butt wrote:
> > >> >>       names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
> > >> >> -                     SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
> > >> >> +                     SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
> > >> >
> > >> > I might be wrong but isn't name cache only holding temporary objects
> > >> > used for path resolution which are not stored anywhere?
> > >> >
> > >>
> > >> Even though they're temporary, many containers can together use a
> > >> significant amount of transient uncharged memory. We've seen machines
> > >> with 100s of MiBs in names_cache.
> > >
> > > Yes that might be possible but are we prepared for random ENOMEM from
> > > vfs calls which need to allocate a temporary name?
> > >
> > >>
> > >> >>       filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
> > >> >> -                     SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> > >> >> +                     SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
> > >> >>       percpu_counter_init(&nr_files, 0, GFP_KERNEL);
> > >> >>  }
> > >> >
> > >> > Don't we have a limit for the maximum number of open files?
> > >> >
> > >>
> > >> Yes, there is a system limit of maximum number of open files. However
> > >> this limit is shared between different users on the system and one
> > >> user can hog this resource. To cater that, we set the maximum limit
> > >> very high and let the memory limit of each user limit the number of
> > >> files they can open.
> > >
> > > Similarly here. Are all syscalls allocating a fd prepared to return
> > > ENOMEM?
> > >
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > Even before this patch I find memcg oom handling inconsistent. Page
> > cache pages trigger oom killer and may allow caller to succeed once the
> > kernel retries. But kmem allocations don't call oom killer.
>
> It's consistent in the sense that only page faults enable the memcg
> OOM killer. It's not the type of memory that decides, it's whether the
> allocation context has a channel to communicate an error to userspace.
>
> Whether userspace is able to handle -ENOMEM from syscalls was a voiced
> concern at the time this patch was merged, although there haven't been
> any reports so far,

Well, I remember reports about MAP_POPULATE breaking or at least
behaving unexpectedly.

> and it seemed like the lesser evil between that
> and deadlocking the kernel.

Agreed on this part, though.

> If we could find a way to invoke the OOM killer safely, I would
> welcome such patches.

Well, we should be able to do that with the oom_reaper. At least for v2,
which doesn't have synchronous userspace oom killing.

[...]

> > c) Overcharge kmem to oom memcg and queue an async memcg limit checker,
> >    which will oom kill if needed.
>
> This makes the most sense to me. Architecturally, I imagine this would
> look like b), with an OOM handler at the point of return to userspace,
> except that we'd overcharge instead of retrying the syscall.

I do not think we should break the hard limit semantics if possible. We
can currently allow that for allocations which are very short term (oom
victims) or too important to fail, but allowing it for kmem charges in
general sounds like it would make it too easy for usage to run away.
-- 
Michal Hocko
SUSE Labs
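
To make the two charging behaviours discussed above concrete, here is a
minimal userspace sketch. It is not kernel code and does not mirror the
memcg implementation: struct toy_counter, toy_try_charge() and
toy_charge_nofail() are hypothetical names invented for illustration.
The first helper models the hard limit (the charge fails and the caller
would see -ENOMEM); the second models option c), where the charge always
succeeds and only reports that the limit has been exceeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a per-cgroup page counter. Hypothetical, not kernel code. */
struct toy_counter {
    size_t usage;  /* pages currently charged */
    size_t limit;  /* hard limit, in pages */
};

/*
 * Hard-limit semantics: a charge that would push usage past the limit
 * fails and leaves usage untouched. A syscall in this position would
 * return -ENOMEM to userspace.
 */
static bool toy_try_charge(struct toy_counter *c, size_t nr_pages)
{
    assert(c != NULL);
    /* Written this way to avoid overflow and underflow in the comparison. */
    if (nr_pages > c->limit || c->usage > c->limit - nr_pages)
        return false;
    c->usage += nr_pages;
    return true;
}

/*
 * Overcharge semantics (option c above): the charge always succeeds.
 * The return value says whether usage is now over the limit, which is
 * the point where an asynchronous limit checker would have to step in.
 */
static bool toy_charge_nofail(struct toy_counter *c, size_t nr_pages)
{
    assert(c != NULL);
    c->usage += nr_pages;
    return c->usage > c->limit;
}
```

With a limit of 10 pages and 8 already charged, toy_try_charge() refuses
a further 4 pages and usage stays at 8, whereas toy_charge_nofail()
accepts them and usage becomes 12. Nothing in the second helper bounds
how far usage can grow before the checker runs, which is the runaway
concern raised above.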