On 2020-08-28, Sargun Dhillon wrote:
> On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman wrote:
> > Just to scope how much work it would be to fix rlimits
> > so they are not a problem for user namespaces I took a quick
> > survey.
> >
> > The rlimits can be found in
> > include/uapi/asm-generic/resource.h
> >
> > There are a total of 16 rlimits.
> > There are only 4 rlimits that are enforced at anything other
> > than process granularity.
> >
> > RLIMIT_NPROC
> > RLIMIT_MEMLOCK
> > RLIMIT_SIGPENDING
> > RLIMIT_MSGQUEUE
> >
> > So it should not be difficult to fix those rlimits.
>
> What are your proposed semantics for what the "fix" would look like? Or
> are you saying that once we take on Christian's proposal of 64-bit kuid
> they would be trivial to fix? I think the reason we didn't move forward with
> fixing it is the only real thing we could agree upon is an rlimit namespace,

From memory, we did briefly discuss how this would work in the call. I
believe the basic idea was that the host rlimit would act as a maximum
setting, but there would be an optional lower limit that a user namespace
could set and that would be accounted separately. That way containers
wouldn't interfere with each other's rlimit settings. I imagine this
would be nested with user namespaces, and presumably means that rlimits
would now be attached to the userns directly.

(But I might be misremembering the details of the proposal. I do
remember Eric mentioning that the "maximum namespaces" sysctl semantics
were a useful model to look at.)

> and then you get into a question of why do these even exist, and should
> they just be cgroup(v2) controllers, and should calling setrlimit just
> be a wrapper around a cgroup(v2) controller that has a map of
> uid -> limit?

To mirror what I said when this came up in the actual discussion, the
reason why we don't have cgroups for all of these things is that some of
those limits aren't "real resources" and arguably should all be managed
through kmemcg policies.
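
As a rough illustration of the nested rlimit semantics recalled earlier in
this message (host limit as a ceiling, an optional lower limit per user
namespace, usage accounted per namespace), here is a minimal sketch in
Python rather than kernel C. It is not the proposal itself: the names
`LimitDomain`, `try_charge` and `uncharge` are hypothetical, and the
"check every level from the namespace up to the host" rule is an
assumption about what the intended semantics might look like.

```python
from typing import Optional


class LimitDomain:
    """Hypothetical stand-in for a user namespace carrying one rlimit.

    limit=None means this level sets no limit of its own, so only its
    ancestors constrain it.
    """

    def __init__(self, limit: Optional[int],
                 parent: Optional["LimitDomain"] = None):
        self.limit = limit
        self.parent = parent
        self.usage = 0  # accounted separately at every level

    def try_charge(self, n: int = 1) -> bool:
        """Charge n units here and in every ancestor, or charge nothing."""
        chain = []
        node = self
        while node is not None:
            # Refuse if any level, up to and including the host, would
            # exceed its own limit.
            if node.limit is not None and node.usage + n > node.limit:
                return False
            chain.append(node)
            node = node.parent
        # Every level has room, so commit the charge along the whole chain.
        for node in chain:
            node.usage += n
        return True

    def uncharge(self, n: int = 1) -> None:
        """Release n units here and in every ancestor."""
        node = self
        while node is not None:
            node.usage -= n
            node = node.parent
```

With a host domain of limit 100 and two child domains, one with limit 10
and one with no limit of its own, the first child is stopped at 10 units
without affecting the second, while the two together can never exceed the
host's 100.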

Right after getting the pids cgroup controller merged, I did mention
adding controllers for the other rlimits and Tejun said that they didn't
make sense to add ([1] is one of the responses I found through a quick
search). The only reason the pids controller was merged is that you
could still fork-bomb a system even with modest kmemcg limits.

[1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH