Re: Per user rlimits

From: ebiederm@xmission.com (Eric W. Biederman)
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linux Containers <containers@lists.linux-foundation.org>
Subject: Re: Per user rlimits
Date: Mon, 31 Aug 2020 08:35:05 -0500
Message-ID: <87k0xfey5y.fsf@x220.int.ebiederm.org>
In-Reply-To: <20200831081207.b6kajp5jhcyelwnt@yavin.dot.cyphar.com> (Aleksa Sarai's message of "Mon, 31 Aug 2020 18:12:07 +1000")

Aleksa Sarai <cyphar@cyphar.com> writes:

> On 2020-08-31, Aleksa Sarai <asarai@suse.de> wrote:
>> On 2020-08-28, Sargun Dhillon <sargun@sargun.me> wrote:
>> > On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
>> > <ebiederm@xmission.com> wrote:
>> > > Just to scope how much work it would be to fix rlimits
>> > > so they are not a problem for user namespaces I took a quick
>> > > survey.
>> > >
>> > > The rlimits can be found in
>> > > include/uapi/asm-generic/resource.h
>> > >
>> > > There are a total of 16 rlimits.
>> > > There are only 4 rlimits that are enforced at anything other
>> > > than process granularity.
>> > >
>> > > RLIMIT_NPROC
>> > > RLIMIT_MEMLOCK
>> > > RLIMIT_SIGPENDING
>> > > RLIMIT_MSGQUEUE
>> > >
>> > > So it should not be difficult to fix those rlimits.
>> > 
>> > What are your proposed semantics for what the "fix" would look like? Or
>> > are you saying that once we take on Christian's proposal of 64-bit kuid
>> > they would be trivial to fix? I think the reason we didn't move forward with
>> > fixing it is the only real thing we could agree upon is an rlimit namespace,
>> 
>> From memory, we did briefly discuss how this would work in the call. I
>> believe the basic idea was that the host rlimit would act as a maximum
>> setting but there would be an optional lower limit that a user namespace
>> could set and would be accounted separately. That way containers
>> wouldn't interfere with each others' rlimit settings. I imagine this
>> would be nested with user namespaces and presumably means that rlimit
>> would now be attached to userns directly.
>> 
>> (But I might be misremembering the details of the proposal. I do
>> remember Eric mentioning that the "maximum namespaces" sysctl semantics
>> were a useful model to look at.)
>> 
>> > and then you get into a question of why do these even exist, and should
>> > they just be cgroup(v2) controllers, and should calling setrlimit just
>> > be a wrapper around a cgroup(v2) controller that has a map of
>> > uid -> limit?
>> 
>> To mirror what I said when this came up in the actual discussion, the
>> reason why we don't have cgroups for all of these things is that some of
>> those limits aren't "real resources" and arguably should all be managed
>> through kmemcg policies.
>> 
>> Right after getting the pids cgroup controller merged, I did mention
>> adding controllers for the other rlimits and Tejun said that they didn't
>> make sense to add ([1] is one of the responses I found through a quick
>> search). The only reason the pids controller was merged is that you
>> could still fork-bomb a system even with modest kmemcg limits.
>> 
>> [1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/
>
> [2] is a more explicit NACK from Tejun in that thread.
>
> [2]: https://lore.kernel.org/lkml/20150227170640.GK3964@htj.duckdns.org/

Right now in /proc/sys/user we have a number of other recursive limits.
That is the infrastructure I am proposing that we reuse, assuming doing
so does not cause real-world performance issues.
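
For readers who want to see which limits /proc/sys/user currently
exposes, here is a minimal userspace sketch. It assumes a Linux kernel
that provides that directory and that each max_* file holds a single
integer; the helper name read_user_limits is purely illustrative.

```python
import os


def read_user_limits(base="/proc/sys/user"):
    """Return {name: int} for every max_* file under `base`.

    Returns an empty dict if the directory does not exist
    (for example on a kernel without these sysctls).
    """
    limits = {}
    try:
        names = sorted(os.listdir(base))
    except FileNotFoundError:
        return limits
    for name in names:
        # Only the max_* entries are limits; skip anything else.
        if not name.startswith("max_"):
            continue
        with open(os.path.join(base, name)) as f:
            limits[name] = int(f.read().strip())
    return limits


if __name__ == "__main__":
    for name, value in read_user_limits().items():
        print(f"{name} = {value}")
```

On a typical system this prints entries such as max_user_namespaces and
max_pid_namespaces; the exact set depends on the kernel version.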

Fundamentally we need to change rlimits if we are going to have multiple
uids without root privileges, as having multiple uids allows you to
escape rlimits.
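
To illustrate the escape, here is a toy model, not kernel code. It
compares per-uid counting with a hypothetical scheme in which every
charge is also applied to each ancestor namespace. The names
PER_UID_LIMIT, UserNS, spawn_per_uid and spawn_in_namespace are invented
for this sketch, and the kernel's real accounting is more detailed than
one counter per namespace.

```python
from collections import Counter

PER_UID_LIMIT = 10  # hypothetical RLIMIT_NPROC-style limit


def spawn_per_uid(uids, attempts_per_uid):
    """Per-uid model: each uid is counted separately against the limit."""
    counts = Counter()
    for uid in uids:
        for _ in range(attempts_per_uid):
            if counts[uid] < PER_UID_LIMIT:
                counts[uid] += 1
    return sum(counts.values())


class UserNS:
    """Toy user namespace: one counter, one limit, optional parent."""

    def __init__(self, limit, parent=None):
        self.limit = limit
        self.parent = parent
        self.count = 0

    def charge(self):
        """Charge one unit here and in every ancestor.

        If any level is already at its limit, undo the partial
        charges and return False.
        """
        charged = []
        ns = self
        while ns is not None:
            if ns.count >= ns.limit:
                for done in charged:
                    done.count -= 1
                return False
            ns.count += 1
            charged.append(ns)
            ns = ns.parent
        return True


def spawn_in_namespace(ns, uids, attempts_per_uid):
    """Recursive model: every uid inside `ns` charges the same counters."""
    ok = 0
    for _uid in uids:
        for _ in range(attempts_per_uid):
            if ns.charge():
                ok += 1
    return ok


if __name__ == "__main__":
    uids = range(1000, 1005)  # five uids mapped into one container
    print(spawn_per_uid(uids, 20))  # 50: five uids, ten each

    host = UserNS(limit=100)
    container = UserNS(limit=10, parent=host)
    print(spawn_in_namespace(container, uids, 20))  # 10
    print(host.count)  # 10
```

With five uids the per-uid model admits five times the intended limit,
while the recursive model holds the whole container to its own limit and
still counts it against the host.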

I am very confused about what cgroups have become after they were
proposed.  Some of what cgroups are is good, some of what cgroups are is
very awkward.  What cgroups are not is a nice file system based API
on top of OpenVZ beancounters, that plays well with containers.

So where it makes sense for rlimits and the like, I am happy to
include these kinds of limits in user namespaces.  Mostly I am
expecting we aim this functionality at the case where one application
goes crazy and that craziness is detected by hitting the limit.  By
doing that we can know things are bad without bringing down the entire
system.

It is not the nuanced control that cgroups can provide when set up
properly, but it is enough to keep things working, and it is effectively
what rlimits provide today.

Eric