On 2020-08-31, Aleksa Sarai wrote:
> On 2020-08-28, Sargun Dhillon wrote:
> > On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman wrote:
> > > Just to scope how much work it would be to fix rlimits
> > > so they are not a problem for user namespaces I took a quick
> > > survey.
> > >
> > > The rlimits can be found in
> > > include/uapi/asm-generic/resource.h
> > >
> > > There are a total of 16 rlimits.
> > > There are only 4 rlimits that are enforced at anything other
> > > than process granularity.
> > >
> > > RLIMIT_NPROC
> > > RLIMIT_MEMLOCK
> > > RLIMIT_SIGPENDING
> > > RLIMIT_MSGQUEUE
> > >
> > > So it should not be difficult to fix those rlimits.
> >
> > What are your proposed semantics for what the "fix" would look like? Or
> > are you saying that once we take on Christian's proposal of 64-bit kuid
> > they would be trivial to fix? I think the reason we didn't move forward with
> > fixing it is the only real thing we could agree upon is an rlimit namespace,
>
> From memory, we did briefly discuss how this would work in the call. I
> believe the basic idea was that the host rlimit would act as a maximum
> setting but there would be an optional lower limit that a user namespace
> could set and would be accounted separately. That way containers
> wouldn't interfere with each other's rlimit settings. I imagine this
> would be nested with user namespaces and presumably means that rlimit
> would now be attached to userns directly.
>
> (But I might be misremembering the details of the proposal. I do
> remember Eric mentioning that the "maximum namespaces" sysctl semantics
> were a useful model to look at.)
>
> > and then you get into a question of why do these even exist, and should
> > they just be cgroup(v2) controllers, and should calling setrlimit just
> > be a wrapper around a cgroup(v2) controller that has a map of
> > uid -> limit?
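The nested semantics recalled from the call (host rlimit as a ceiling, an optional lower limit per user namespace, usage accounted separately per namespace) can be sketched in user space roughly as follows. This is only an illustrative model under assumed names: `struct ns_limit`, `ns_limit_charge()` and `ns_limit_uncharge()` are hypothetical and do not exist in the kernel, and the actual proposal may well differ in its details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a per-user-namespace limit.
 * None of these names are kernel APIs; they are for illustration only.
 */
struct ns_limit {
    struct ns_limit *parent; /* NULL for the initial (host) namespace */
    unsigned long max;       /* limit configured at this level */
    unsigned long count;     /* usage charged at this level and below */
};

/*
 * Charge one unit to ns and to every ancestor. If any level is already
 * at its limit, undo the charges made so far and refuse.
 */
static bool ns_limit_charge(struct ns_limit *ns)
{
    struct ns_limit *pos, *undo;

    assert(ns != NULL);
    for (pos = ns; pos != NULL; pos = pos->parent) {
        if (pos->count >= pos->max)
            goto fail;
        pos->count++;
    }
    return true;

fail:
    /* Roll back the levels below the one that refused. */
    for (undo = ns; undo != pos; undo = undo->parent)
        undo->count--;
    return false;
}

/* Release one unit from ns and from every ancestor. */
static void ns_limit_uncharge(struct ns_limit *ns)
{
    assert(ns != NULL);
    for (; ns != NULL; ns = ns->parent) {
        assert(ns->count > 0);
        ns->count--;
    }
}
```

With a host limit of 3 and two child namespaces limited to 2 and 5, the first child is refused at its own limit of 2, while the second is refused once the host total reaches 3, even though its own limit is higher. A child can therefore lower its limit without affecting its siblings, but can never exceed what the host allows.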
>
> To mirror what I said when this came up in the actual discussion, the
> reason why we don't have cgroups for all of these things is that some of
> those limits aren't "real resources" and arguably should all be managed
> through kmemcg policies.
>
> Right after getting the pids cgroup controller merged, I did mention
> adding controllers for the other rlimits and Tejun said that they didn't
> make sense to add ([1] is one of the responses I found through a quick
> search). The only reason the pids controller was merged is that you
> could still fork-bomb a system even with modest kmemcg limits.
>
> [1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/

[2] is a more explicit NACK from Tejun in that thread.

[2]: https://lore.kernel.org/lkml/20150227170640.GK3964@htj.duckdns.org/

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH