linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeelb@google.com>
To: Vasily Averin <vvs@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cgroups <cgroups@vger.kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Roman Gushchin <guro@fb.com>, Jens Axboe <axboe@kernel.dk>,
	Oleg Nesterov <oleg@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 13/16] memcg: enable accounting for signals
Date: Tue, 20 Jul 2021 07:37:23 -0700	[thread overview]
Message-ID: <CALvZod7durMR7vYQ-5Uw5S=4UFTE5ojb=Vqk5oTEmRageH8zOg@mail.gmail.com> (raw)
In-Reply-To: <56816a9d-c2e5-127d-4d90-5d7d17782c8a@virtuozzo.com>

On Tue, Jul 20, 2021 at 1:35 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> On 7/19/21 8:32 PM, Eric W. Biederman wrote:
> > Vasily Averin <vvs@virtuozzo.com> writes:
> >
> >> When a user send a signal to any another processes it forces the kernel
> >> to allocate memory for 'struct sigqueue' objects. The number of signals
> >> is limited by RLIMIT_SIGPENDING resource limit, but even the default
> >> settings allow each user to consume up to several megabytes of memory.
> >> Moreover, an untrusted admin inside container can increase the limit or
> >> create new fake users and force them to sent signals.
> >
> > Not any more.  Currently the number of sigqueue objects is limited
> > by the rlimit of the creator of the user namespace of the container.
> >
> >> It makes sense to account for these allocations to restrict the host's
> >> memory consumption from inside the memcg-limited container.
> >
> > Does it?  Why?  The given justification appears to have bit-rotted
> > since -rc1.
>
> Could you please explain what was changed in rc1?
> From my POV accounting is required to help OOM-killer to select proper target.
>
> > I know a lot of these things only really need a limit just to catch a
> > program that starts malfunctioning.  If that is indeed the case
> > reasonable per-resource limits are probably better than some great big
> > group limit that can be exhausted with any single resource in the group.
> >
> > Is there a reason I am not aware of that where it makes sense to group
> > all of the resources together and only count the number of bytes
> > consumed?
>
> Any new limits:
> a) should be set properly depending on huge number of incoming parameters.
> b) should properly notify about hits
> c) should be updated properly after b)
> d) do a)-c) automatically if possible
>
> In past OpenVz had own accounting subsystem, user beancounters (UBC).
> It accounted and limited 20+ resources  per-container: numfiles, file locks,
> signals, netfilter rules, socket buffers and so on.
> I assume you want to do something similar, so let me share our experience.
>
> We had a lot of problems with UBC:
> - it's quite hard to set up the limit.
>   Why it's good to consume N entities of some resource but it's bad to consume N+1 ones?
>   per-process? per-user? per-thread? per-task? per-namespace? if nested? per-container? per-host?
>   To answer the questions host admin should have additional knowledge and skills.
>
> - Ok, we have set all limits. Some application hits it and fails.
>   It's quite hard to understand that application hits the limit, and failed due to this reason.
>   From users point of view, if some application does not work (stable enough)
>   inside container => containers are guilty.
>
> - It's quite hard to understand that failed application just want to increase limit X up to N entities.
>
> As result both host admins and container users was unhappy.
> So after years of such fights we decided just to limit accounted memory instead.
>
> Anyway, OOM-killer must know who consumed memory to select proper target.
>

Just to support Vasily's point further, for systems running multiple
workloads, it is much more preferred to be able to set one limit for
each workload than to set many different limits.

One concrete example is described in commit ac7b79fd190b ("inotify,
memcg: account inotify instances to kmemcg"). The inotify instances
which can be limited through fs sysctl inotify/max_user_instances and
be further partitioned to users through per-user namespace specific
sysctl but there is no sensible way to set a limit and partition it on
a system that runs different workloads.

  reply	other threads:[~2021-07-20 14:53 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <8664122a-99d3-7199-869a-781b21b7e712@virtuozzo.com>
2021-04-28  6:51 ` [PATCH v4 00/16] memcg accounting from OpenVZ Vasily Averin
2021-07-15 17:11   ` Shakeel Butt
2021-07-16  4:11     ` Vasily Averin
2021-07-16 12:55       ` Shakeel Butt
2021-07-19 10:44         ` [PATCH v5 " Vasily Averin
2021-07-26 18:59           ` [PATCH v6 00/16] memcg accounting from Vasily Averin
2021-07-26 21:59             ` David Miller
2021-07-27  4:44               ` [PATCH v6 00/16] memcg accounting from OpenVZ Vasily Averin
2021-07-27  5:33                 ` [PATCH v7 00/10] " Vasily Averin
     [not found]                 ` <cover.1627362057.git.vvs@virtuozzo.com>
2021-07-27  5:33                   ` [PATCH v7 01/10] memcg: enable accounting for mnt_cache entries Vasily Averin
2021-07-27  6:44                     ` Shakeel Butt
2021-07-27  7:21                     ` Christian Brauner
2021-07-27  5:33                   ` [PATCH v7 02/10] memcg: enable accounting for pollfd and select bits arrays Vasily Averin
2021-07-27 21:39                     ` Shakeel Butt
2021-07-27  5:33                   ` [PATCH v7 03/10] memcg: enable accounting for file lock caches Vasily Averin
2021-07-27 21:41                     ` Shakeel Butt
2021-07-27  5:33                   ` [PATCH v7 04/10] memcg: enable accounting for fasync_cache Vasily Averin
2021-07-27 21:50                     ` Shakeel Butt
2021-07-27  5:33                   ` [PATCH v7 05/10] memcg: enable accounting for new namesapces and struct nsproxy Vasily Averin
2021-07-27 21:51                     ` Shakeel Butt
2021-07-27  5:33                   ` [PATCH v7 06/10] memcg: enable accounting of ipc resources Vasily Averin
2021-07-27 22:33                     ` Shakeel Butt
2021-07-27  5:34                   ` [PATCH v7 07/10] memcg: enable accounting for signals Vasily Averin
2021-07-27  5:34                   ` [PATCH v7 08/10] memcg: enable accounting for posix_timers_cache slab Vasily Averin
2021-07-27 22:33                     ` Shakeel Butt
2021-07-27  5:34                   ` [PATCH v7 09/10] memcg: enable accounting for tty-related objects Vasily Averin
2021-07-27  6:09                     ` Greg Kroah-Hartman
2021-07-27  6:54                     ` Jiri Slaby
2021-07-27  8:02                       ` Vasily Averin
2021-07-27  9:26                         ` [PATCH TTY] memcg: drop GFP_KERNEL_ACCOUNT use in tty_save_termios() Vasily Averin
2021-07-27  9:32                           ` Greg Kroah-Hartman
2022-02-28  9:13                           ` [PATCH v2] memcg: enable accounting for tty-related objects Vasily Averin
2021-07-27  9:30                         ` [PATCH v7 09/10] " Greg Kroah-Hartman
2021-07-27  5:34                   ` [PATCH v7 10/10] memcg: enable accounting for ldt_struct objects Vasily Averin
2021-07-27 22:36                     ` Shakeel Butt
     [not found]           ` <cover.1627321321.git.vvs@virtuozzo.com>
2021-07-26 18:59             ` [PATCH v6 01/16] memcg: enable accounting for net_device and Tx/Rx queues Vasily Averin
2021-07-26 19:00             ` [PATCH v6 02/16] memcg: enable accounting for IP address and routing-related objects Vasily Averin
2021-07-26 19:00             ` [PATCH v6 03/16] memcg: enable accounting for inet_bin_bucket cache Vasily Averin
2021-07-26 19:00             ` [PATCH v6 04/16] memcg: enable accounting for VLAN group array Vasily Averin
2021-07-26 19:00             ` [PATCH v6 05/16] memcg: ipv6/sit: account and don't WARN on ip_tunnel_prl structs allocation Vasily Averin
2021-07-26 19:00             ` [PATCH v6 06/16] memcg: enable accounting for scm_fp_list objects Vasily Averin
2021-07-26 19:00             ` [PATCH v6 07/16] memcg: enable accounting for mnt_cache entries Vasily Averin
2021-07-26 19:00             ` [PATCH v6 08/16] memcg: enable accounting for pollfd and select bits arrays Vasily Averin
2021-07-26 19:01             ` [PATCH v6 09/16] memcg: enable accounting for file lock caches Vasily Averin
2021-07-26 19:01             ` [PATCH v6 10/16] memcg: enable accounting for fasync_cache Vasily Averin
2021-07-26 19:01             ` [PATCH v6 11/16] memcg: enable accounting for new namesapces and struct nsproxy Vasily Averin
2021-07-26 19:58               ` Kirill Tkhai
2021-07-26 19:01             ` [PATCH v6 12/16] memcg: enable accounting of ipc resources Vasily Averin
2021-07-26 19:01             ` [PATCH v6 13/16] memcg: enable accounting for signals Vasily Averin
2021-07-26 19:01             ` [PATCH v6 14/16] memcg: enable accounting for posix_timers_cache slab Vasily Averin
2021-07-26 19:01             ` [PATCH v6 15/16] memcg: enable accounting for tty-related objects Vasily Averin
2021-07-26 19:01             ` [PATCH v6 16/16] memcg: enable accounting for ldt_struct objects Vasily Averin
     [not found]         ` <cover.1626688654.git.vvs@virtuozzo.com>
2021-07-19 10:44           ` [PATCH v5 01/16] memcg: enable accounting for net_device and Tx/Rx queues Vasily Averin
2021-07-19 10:44           ` [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects Vasily Averin
2021-07-19 14:00             ` Dmitry Safonov
2021-07-19 14:22               ` Shakeel Butt
2021-07-19 14:24                 ` Dmitry Safonov
2021-07-20 19:26             ` Shakeel Butt
2021-07-26 10:23               ` Vasily Averin
2021-07-26 13:48                 ` Shakeel Butt
2021-07-26 16:53                   ` [PATCH] memcg: replace in_interrupt() by !in_task() in active_memcg() Vasily Averin
2021-07-26 16:57                     ` Shakeel Butt
2021-07-19 10:44           ` [PATCH v5 03/16] memcg: enable accounting for inet_bin_bucket cache Vasily Averin
2021-07-19 10:44           ` [PATCH v5 04/16] memcg: enable accounting for VLAN group array Vasily Averin
2021-07-19 10:44           ` [PATCH v5 05/16] memcg: ipv6/sit: account and don't WARN on ip_tunnel_prl structs allocation Vasily Averin
2021-07-19 10:44           ` [PATCH v5 06/16] memcg: enable accounting for scm_fp_list objects Vasily Averin
2021-07-19 10:45           ` [PATCH v5 07/16] memcg: enable accounting for mnt_cache entries Vasily Averin
2021-07-19 10:45           ` [PATCH v5 08/16] memcg: enable accounting for pollfd and select bits arrays Vasily Averin
2021-07-19 10:45           ` [PATCH v5 09/16] memcg: enable accounting for file lock caches Vasily Averin
2021-07-19 10:45           ` [PATCH v5 10/16] memcg: enable accounting for fasync_cache Vasily Averin
2021-07-19 10:45           ` [PATCH v5 11/16] memcg: enable accounting for new namesapces and struct nsproxy Vasily Averin
2021-07-19 10:45           ` [PATCH v5 12/16] memcg: enable accounting of ipc resources Vasily Averin
2021-07-19 10:45           ` [PATCH v5 13/16] memcg: enable accounting for signals Vasily Averin
2021-07-19 17:32             ` Eric W. Biederman
2021-07-20  8:35               ` Vasily Averin
2021-07-20 14:37                 ` Shakeel Butt [this message]
2021-07-20 16:42                 ` Eric W. Biederman
2021-07-20 19:15             ` Shakeel Butt
2021-07-19 10:45           ` [PATCH v5 14/16] memcg: enable accounting for posix_timers_cache slab Vasily Averin
2021-07-19 10:45           ` [PATCH v5 15/16] memcg: enable accounting for tty-related objects Vasily Averin
2021-07-19 10:46           ` [PATCH v5 16/16] memcg: enable accounting for ldt_struct objects Vasily Averin
2021-04-28  6:51 ` [PATCH v4 01/16] memcg: enable accounting for net_device and Tx/Rx queues Vasily Averin
2021-04-28  6:51 ` [PATCH v4 02/16] memcg: enable accounting for IP address and routing-related objects Vasily Averin
2021-04-28  6:51 ` [PATCH v4 03/16] memcg: enable accounting for inet_bin_bucket cache Vasily Averin
2021-04-28  6:52 ` [PATCH v4 04/16] memcg: enable accounting for VLAN group array Vasily Averin
2021-04-28  6:52 ` [PATCH v4 05/16] memcg: ipv6/sit: account and don't WARN on ip_tunnel_prl structs allocation Vasily Averin
2021-04-28  6:52 ` [PATCH v4 06/16] memcg: enable accounting for scm_fp_list objects Vasily Averin
2021-04-28  6:52 ` [PATCH v4 07/16] memcg: enable accounting for new namesapces and struct nsproxy Vasily Averin
2021-05-07 13:45   ` Serge E. Hallyn
2021-05-07 15:03   ` Christian Brauner
2021-04-28  6:52 ` [PATCH v4 08/16] memcg: enable accounting of ipc resources Vasily Averin
2021-04-28  6:53 ` [PATCH v4 09/16] memcg: enable accounting for mnt_cache entries Vasily Averin
2021-04-28  6:53 ` [PATCH v4 10/16] memcg: enable accounting for pollfd and select bits arrays Vasily Averin
2021-04-28  6:53 ` [PATCH v4 11/16] memcg: enable accounting for signals Vasily Averin
2021-04-28  6:53 ` [PATCH v4 12/16] memcg: enable accounting for posix_timers_cache slab Vasily Averin
2021-05-07 15:48   ` Thomas Gleixner
2021-04-28  6:53 ` [PATCH v4 13/16] memcg: enable accounting for file lock caches Vasily Averin
2021-04-28  6:54 ` [PATCH v4 14/16] memcg: enable accounting for fasync_cache Vasily Averin
2021-04-28  6:54 ` [PATCH v4 15/16] memcg: enable accounting for tty-related objects Vasily Averin
2021-04-28  7:38   ` Greg Kroah-Hartman
2021-04-28  6:54 ` [PATCH v4 16/16] memcg: enable accounting for ldt_struct objects Vasily Averin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALvZod7durMR7vYQ-5Uw5S=4UFTE5ojb=Vqk5oTEmRageH8zOg@mail.gmail.com' \
    --to=shakeelb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=ebiederm@xmission.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=vvs@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).