linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Thelen <gthelen@google.com>
To: Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Mon, 09 Oct 2017 10:52:44 -0700	[thread overview]
Message-ID: <xr93a810xl77.fsf@gthelen.svl.corp.google.com> (raw)
In-Reply-To: <20171009062426.hmqedtqz5hkmhnff@dhcp22.suse.cz>

Michal Hocko <mhocko@kernel.org> wrote:

> On Fri 06-10-17 12:33:03, Shakeel Butt wrote:
>> >>       names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
>> >> -                     SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
>> >> +                     SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>> >
>> > I might be wrong but isn't name cache only holding temporary objects
>> > used for path resolution which are not stored anywhere?
>> >
>> 
>> Even though they're temporary, many containers can together use a
>> significant amount of transient uncharged memory. We've seen machines
>> with 100s of MiBs in names_cache.
>
> Yes that might be possible but are we prepared for random ENOMEM from
> vfs calls which need to allocate a temporary name?
>
>> 
>> >>       filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
>> >> -                     SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
>> >> +                     SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
>> >>       percpu_counter_init(&nr_files, 0, GFP_KERNEL);
>> >>  }
>> >
>> > Don't we have a limit for the maximum number of open files?
>> >
>> 
>> Yes, there is a system limit of maximum number of open files. However
>> this limit is shared between different users on the system and one
>> user can hog this resource. To cater that, we set the maximum limit
>> very high and let the memory limit of each user limit the number of
>> files they can open.
>
> Similarly here. Are all syscalls allocating a fd prepared to return
> ENOMEM?
>
> -- 
> Michal Hocko
> SUSE Labs

Even before this patch I find memcg oom handling inconsistent.  Page
cache pages trigger oom killer and may allow caller to succeed once the
kernel retries.  But kmem allocations don't call oom killer.  They
surface errors to user space.  This makes memcg hard to use for memory
overcommit because it's desirable for a high priority task to
transparently kill a lower priority task using the memcg oom killer.

A few ideas on how to make it more flexible:

a) Go back to memcg oom killing within memcg charging.  This runs risk
   of oom killing while caller holds locks which oom victim selection or
   oom victim termination may need.  Google's been running this way for
   a while.

b) Have every syscall return do something similar to page fault handler:
   kmem allocations in oom memcg mark the current task as needing an oom
   check return NULL.  If marked oom, syscall exit would use
   mem_cgroup_oom_synchronize() before retrying the syscall.  Seems
   risky.  I doubt every syscall is compatible with such a restart.

c) Overcharge kmem to oom memcg and queue an async memcg limit checker,
   which will oom kill if needed.

Comments?

Demo program which eventually gets ENOSPC from mkdir.

$ cat /tmp/t
while umount /tmp/mnt; do true; done
mkdir -p /tmp/mnt
mount -t tmpfs nodev /tmp/mnt
cd /dev/cgroup/memory
rmdir t
mkdir t
echo 32M > t/memory.limit_in_bytes
(echo $BASHPID > t/cgroup.procs && cd /tmp/mnt && exec /tmp/mkdirs)

$ cat /tmp/mkdirs.c
#include <err.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

int main()
{
        int i;
        char name[32];

        if (mlockall(MCL_CURRENT|MCL_FUTURE))
                err(1, "mlockall");
        for (i = 0; i < (1<<20); i++) {
                sprintf(name, "%d", i);
                if (mkdir(name, 0700))
                        err(1, "mkdir");
        }
        printf("done\n");
        return 0;
}

  reply	other threads:[~2017-10-09 17:52 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-05 22:21 [PATCH] fs, mm: account filp and names caches to kmemcg Shakeel Butt
2017-10-06  7:59 ` Michal Hocko
2017-10-06 19:33   ` Shakeel Butt
2017-10-09  6:24     ` Michal Hocko
2017-10-09 17:52       ` Greg Thelen [this message]
2017-10-09 18:04         ` Michal Hocko
2017-10-09 18:17           ` Michal Hocko
2017-10-10  9:10             ` Michal Hocko
2017-10-10 22:21               ` Shakeel Butt
2017-10-11  9:09                 ` Michal Hocko
2017-10-09 20:26         ` Johannes Weiner
2017-10-10  9:14           ` Michal Hocko
2017-10-10 14:17             ` Johannes Weiner
2017-10-10 14:24               ` Michal Hocko
2017-10-12 19:03                 ` Johannes Weiner
2017-10-12 23:57                   ` Greg Thelen
2017-10-13  6:51                     ` Michal Hocko
2017-10-13  6:35                   ` Michal Hocko
2017-10-13  7:00                     ` Michal Hocko
2017-10-13 15:24                       ` Michal Hocko
2017-10-24 12:18                         ` Michal Hocko
2017-10-24 17:54                           ` Johannes Weiner
2017-10-24 16:06                         ` Johannes Weiner
2017-10-24 16:22                           ` Michal Hocko
2017-10-24 17:23                             ` Johannes Weiner
2017-10-24 17:55                               ` Michal Hocko
2017-10-24 18:58                                 ` Johannes Weiner
2017-10-24 20:15                                   ` Michal Hocko
2017-10-25  6:51                                     ` Greg Thelen
2017-10-25  7:15                                       ` Michal Hocko
2017-10-25 13:11                                         ` Johannes Weiner
2017-10-25 14:12                                           ` Michal Hocko
2017-10-25 16:44                                             ` Johannes Weiner
2017-10-25 17:29                                               ` Michal Hocko
2017-10-25 18:11                                                 ` Johannes Weiner
2017-10-25 19:00                                                   ` Michal Hocko
2017-10-25 21:13                                                     ` Johannes Weiner
2017-10-25 22:49                                                       ` Greg Thelen
2017-10-26  7:49                                                         ` Michal Hocko
2017-10-26 12:45                                                           ` Tetsuo Handa
2017-10-26 14:31                                                         ` Johannes Weiner
2017-10-26 19:56                                                           ` Greg Thelen
2017-10-27  8:20                                                             ` Michal Hocko
2017-10-27 20:50                                               ` Shakeel Butt
2017-10-30  8:29                                                 ` Michal Hocko
2017-10-30 19:28                                                   ` Shakeel Butt
2017-10-31  8:00                                                     ` Michal Hocko
2017-10-31 16:49                                                       ` Johannes Weiner
2017-10-31 18:50                                                         ` Michal Hocko
2017-10-24 15:45                     ` Johannes Weiner
2017-10-24 16:30                       ` Michal Hocko
2017-10-10 23:32 ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xr93a810xl77.fsf@gthelen.svl.corp.google.com \
    --to=gthelen@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).