containers.lists.linux.dev archive mirror
From: ebiederm@xmission.com (Eric W. Biederman)
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linux Containers <containers@lists.linux-foundation.org>
Subject: Re: Per user rlimits
Date: Mon, 31 Aug 2020 08:35:05 -0500
Message-ID: <87k0xfey5y.fsf@x220.int.ebiederm.org>
In-Reply-To: <20200831081207.b6kajp5jhcyelwnt@yavin.dot.cyphar.com> (Aleksa Sarai's message of "Mon, 31 Aug 2020 18:12:07 +1000")

Aleksa Sarai <cyphar@cyphar.com> writes:

> On 2020-08-31, Aleksa Sarai <asarai@suse.de> wrote:
>> On 2020-08-28, Sargun Dhillon <sargun@sargun.me> wrote:
>> > On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
>> > <ebiederm@xmission.com> wrote:
>> > > Just to scope how much work it would be to fix rlimits
>> > > so they are not a problem for user namespaces I took a quick
>> > > survey.
>> > >
>> > > The rlimits can be found in
>> > > include/uapi/asm-generic/resource.h
>> > >
>> > > There are a total of 16 rlimits.
>> > > There are only 4 rlimits that are enforced at anything other
>> > > than process granularity.
>> > >
>> > > RLIMIT_NPROC
>> > > RLIMIT_MEMLOCK
>> > > RLIMIT_SIGPENDING
>> > > RLIMIT_MSGQUEUE
>> > >
>> > > So it should not be difficult to fix those rlimits.
>> > 
>> > What are your proposed semantics for what the "fix" would look like? Or
>> > are you saying that once we take on Christian's proposal of 64-bit kuid
>> > they would be trivial to fix? I think the reason we didn't move forward with
>> > fixing it is the only real thing we could agree upon is an rlimit namespace,
>> 
>> From memory, we did briefly discuss how this would work in the call. I
>> believe the basic idea was that the host rlimit would act as a maximum
>> setting but there would be an optional lower limit that a user namespace
>> could set and would be accounted separately. That way containers
>> wouldn't interfere with each other's rlimit settings. I imagine this
>> would be nested with user namespaces and presumably means that rlimit
>> would now be attached to userns directly.
>> 
>> (But I might be misremembering the details of the proposal. I do
>> remember Eric mentioning that the "maximum namespaces" sysctl semantics
>> were a useful model to look at.)
>> 
>> > and then you get into a question of why do these even exist, and should
>> > they just be cgroup(v2) controllers, and should calling setrlimit just
>> > be a wrapper around a cgroup(v2) controller that has a map of
>> > uid -> limit?
>> 
>> To mirror what I said when this came up in the actual discussion, the
>> reason why we don't have cgroups for all of these things is that some of
>> those limits aren't "real resources" and arguably should all be managed
>> through kmemcg policies.
>> 
>> Right after getting the pids cgroup controller merged, I did mention
>> adding controllers for the other rlimits and Tejun said that they didn't
>> make sense to add ([1] is one of the responses I found through a quick
>> search). The only reason the pids controller was merged is that you
>> could still fork-bomb a system even with modest kmemcg limits.
>> 
>> [1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/
>
> [2] is a more explicit NACK from Tejun in that thread.
>
> [2]: https://lore.kernel.org/lkml/20150227170640.GK3964@htj.duckdns.org/

Right now in /proc/sys/user we have a number of other recursive limits.
That is the infrastructure I am proposing that we reuse, assuming doing
so does not cause real-world performance issues.
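
To make "recursive limits" concrete, here is a minimal sketch of the
accounting model in Python. It is an illustration only, not kernel code:
the Namespace class, its limit and count fields, and the container names
are all hypothetical. The idea it shows is that a charge must fit under
the limit at every level from the current namespace up to the host, and
that a failed charge leaves no partial accounting behind.

```python
class Namespace:
    """Hypothetical stand-in for a user namespace with one counted resource."""

    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit    # maximum usage allowed at this level
        self.count = 0        # usage currently charged at this level
        self.parent = parent  # enclosing namespace, None for the host

    def charge(self):
        """Charge one unit here and in every ancestor; undo on failure."""
        charged = []
        ns = self
        while ns is not None:
            if ns.count + 1 > ns.limit:
                # Some level is full: roll back what was charged so far.
                for done in charged:
                    done.count -= 1
                return False
            ns.count += 1
            charged.append(ns)
            ns = ns.parent
        return True

    def uncharge(self):
        """Release one unit at this level and in every ancestor."""
        ns = self
        while ns is not None:
            ns.count -= 1
            ns = ns.parent


# The host limit acts as the overall maximum; each container may set a
# lower limit of its own and is accounted separately from its siblings.
host = Namespace("host", limit=5)
a = Namespace("container-a", limit=2, parent=host)
b = Namespace("container-b", limit=4, parent=host)

results_a = [a.charge() for _ in range(3)]  # third charge hits a's own limit
results_b = [b.charge() for _ in range(4)]  # fourth charge hits the host limit
```

In this sketch container-a is stopped by its own lower limit, while
container-b is stopped by the host maximum before reaching its own.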


Fundamentally we need to change rlimits if we are going to have multiple
uids without root privileges, as having multiple uids allows you to
escape rlimits.
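
As a toy illustration of that escape, assuming a limit that is counted
per uid in the style of RLIMIT_NPROC: the limit value, the uids, and the
helper names below are made up for the example, and nothing here touches
real kernel accounting.

```python
from collections import Counter

PER_UID_LIMIT = 3  # hypothetical per-user limit, counted per uid


def spawn(usage, uid):
    """Count one more process for uid, refusing once that uid is at its limit."""
    if usage[uid] >= PER_UID_LIMIT:
        return False
    usage[uid] += 1
    return True


def total_started(uids, attempts_per_uid):
    """Try attempts_per_uid spawns for each uid and return how many succeeded."""
    usage = Counter()
    return sum(spawn(usage, uid) for uid in uids for _ in range(attempts_per_uid))


# One uid is capped at the limit no matter how often it tries.
single_uid = total_started([1000], attempts_per_uid=10)

# A user who controls four uids gets four times the limit.
many_uids = total_started([100000, 100001, 100002, 100003], attempts_per_uid=10)
```

With a single uid the total stays at the per-uid limit; with four uids
the same user ends up with four times as much.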


I am very confused about what cgroups have become after they were
proposed.  Some of what cgroups are is good, some of what cgroups are is
very awkward.  What cgroups are not is a nice file system based API
on top of OpenVZ beancounters, that plays well with containers.


So where it makes sense for rlimits and the like, I am happy to
include these kinds of limits in user namespaces.  Mostly I am
expecting we aim this functionality at the case where one application
goes crazy and that craziness is detected by hitting the limit.  By
doing that we can know things are bad without bringing down the entire
system.


It is not the nuanced control that cgroups can provide when set up
properly, but it is enough to keep things working, and it is effectively
what rlimits provide today.

Eric

Thread overview: 5+ messages
2020-08-28 19:25 Per user rlimits Eric W. Biederman
2020-08-28 20:33 ` Sargun Dhillon
2020-08-31  8:09   ` Aleksa Sarai
2020-08-31  8:12     ` Aleksa Sarai
2020-08-31 13:35       ` Eric W. Biederman [this message]