containers.lists.linux.dev archive mirror
* Per user rlimits
@ 2020-08-28 19:25 Eric W. Biederman
  2020-08-28 20:33 ` Sargun Dhillon
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2020-08-28 19:25 UTC (permalink / raw)
  To: Linux Containers


Just to scope how much work it would be to fix rlimits
so they are not a problem for user namespaces I took a quick
survey.

The rlimits can be found in
include/uapi/asm-generic/resource.h

There are a total of 16 rlimits.
There are only 4 rlimits that are enforced at anything other
than process granularity.

RLIMIT_NPROC
RLIMIT_MEMLOCK 
RLIMIT_SIGPENDING
RLIMIT_MSGQUEUE

So it should not be difficult to fix those rlimits.

I think the implementation of RLIMIT_MEMLOCK is highly suspect, and
might be worth reexamining, as RLIMIT_MEMLOCK is interpreted differently
in different contexts.  For the limit there is mm->locked_vm,
user->locked_vm, and user->locked_shm.
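
Illustration: a minimal sketch for inspecting the four limits listed
above from userspace, using Python's standard resource module. The
helper name per_user_rlimits is made up for this example, and the
sketch assumes a Linux build of Python; any constant the platform does
not expose is skipped.

```python
import resource

# The four limits named above as being enforced at something other
# than process granularity.
PER_USER_RLIMITS = (
    "RLIMIT_NPROC",
    "RLIMIT_MEMLOCK",
    "RLIMIT_SIGPENDING",
    "RLIMIT_MSGQUEUE",
)


def per_user_rlimits():
    """Return {name: (soft, hard)} for the limits this platform exposes."""
    limits = {}
    for name in PER_USER_RLIMITS:
        const = getattr(resource, name, None)
        if const is None:
            continue  # constant not available on this platform
        limits[name] = resource.getrlimit(const)
    return limits


def show(value):
    """Render RLIM_INFINITY as the word 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in per_user_rlimits().items():
        print(f"{name}: soft={show(soft)} hard={show(hard)}")
```

The values printed are the calling process's own soft and hard limits;
the per-user accounting they are checked against happens in the kernel
and is not visible through this interface.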

Eric



_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers


* Re: Per user rlimits
  2020-08-28 19:25 Per user rlimits Eric W. Biederman
@ 2020-08-28 20:33 ` Sargun Dhillon
  2020-08-31  8:09   ` Aleksa Sarai
  0 siblings, 1 reply; 5+ messages in thread
From: Sargun Dhillon @ 2020-08-28 20:33 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux Containers

On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
>
> Just to scope how much work it would be to fix rlimits
> so they are not a problem for user namespaces I took a quick
> survey.
>
> The rlimits can be found in
> include/uapi/asm-generic/resource.h
>
> There are a total of 16 rlimits.
> There are only 4 rlimits that are enforced at anything other
> than process granularity.
>
> RLIMIT_NPROC
> RLIMIT_MEMLOCK
> RLIMIT_SIGPENDING
> RLIMIT_MSGQUEUE
>
> So it should not be difficult to fix those rlimits.
What are your proposed semantics for what the "fix" would look like? Or
are you saying that once we take on Christian's proposal of 64-bit kuid
they would be trivial to fix? I think the reason we didn't move forward with
fixing it is the only real thing we could agree upon is an rlimit namespace,
and then you get into a question of why do these even exist, and should
they just be cgroup(v2) controllers, and should calling setrlimit just
be a wrapper around a cgroup(v2) controller that has a map of
uid -> limit?

>
> I think the implementation of RLIMIT_MEMLOCK is highly suspect, and
> might be worth reexamining, as RLIMIT_MEMLOCK is interpreted differently
> in different contexts.  For the limit there is mm->locked_vm,
> user->locked_vm, and user->locked_shm.
>
> Eric
>
>
>

* Re: Per user rlimits
  2020-08-28 20:33 ` Sargun Dhillon
@ 2020-08-31  8:09   ` Aleksa Sarai
  2020-08-31  8:12     ` Aleksa Sarai
  0 siblings, 1 reply; 5+ messages in thread
From: Aleksa Sarai @ 2020-08-31  8:09 UTC (permalink / raw)
  To: Sargun Dhillon; +Cc: Linux Containers, Eric W. Biederman



On 2020-08-28, Sargun Dhillon <sargun@sargun.me> wrote:
> On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> > Just to scope how much work it would be to fix rlimits
> > so they are not a problem for user namespaces I took a quick
> > survey.
> >
> > The rlimits can be found in
> > include/uapi/asm-generic/resource.h
> >
> > There are a total of 16 rlimits.
> > There are only 4 rlimits that are enforced at anything other
> > than process granularity.
> >
> > RLIMIT_NPROC
> > RLIMIT_MEMLOCK
> > RLIMIT_SIGPENDING
> > RLIMIT_MSGQUEUE
> >
> > So it should not be difficult to fix those rlimits.
> 
> What are your proposed semantics for what the "fix" would look like? Or
> are you saying that once we take on Christian's proposal of 64-bit kuid
> they would be trivial to fix? I think the reason we didn't move forward with
> fixing it is the only real thing we could agree upon is an rlimit namespace,

From memory, we did briefly discuss how this would work in the call. I
believe the basic idea was that the host rlimit would act as a maximum
setting but there would be an optional lower limit that a user namespace
could set and would be accounted separately. That way containers
wouldn't interfere with each other's rlimit settings. I imagine this
would be nested with user namespaces and presumably means that rlimit
would now be attached to userns directly.

(But I might be misremembering the details of the proposal. I do
remember Eric mentioning that the "maximum namespaces" sysctl semantics
were a useful model to look at.)
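
Illustration: a toy Python model of one possible reading of the
semantics recalled above, where the host limit is a ceiling and each
user namespace may set its own lower limit that is counted separately.
This is not kernel code. The names NsCounter, charge, and uncharge are
hypothetical, and charging every ancestor level is an assumption about
how the nesting would work.

```python
class LimitExceeded(Exception):
    """Raised when a charge would push some level past its limit."""


class NsCounter:
    """Toy per-namespace counter; 'parent' is the enclosing namespace."""

    def __init__(self, limit, parent=None):
        self.limit = limit
        self.parent = parent
        self.count = 0

    def charge(self, n=1):
        # Walk from this namespace up to the host, charging every level.
        charged = []
        node = self
        while node is not None:
            if node.count + n > node.limit:
                # Roll back the levels already charged, then fail.
                for done in charged:
                    done.count -= n
                raise LimitExceeded
            node.count += n
            charged.append(node)
            node = node.parent

    def uncharge(self, n=1):
        node = self
        while node is not None:
            node.count -= n
            node = node.parent


host = NsCounter(limit=10)            # host limit acts as the maximum
a = NsCounter(limit=3, parent=host)   # container A lowers its own limit
b = NsCounter(limit=8, parent=host)   # container B picks a different one

for _ in range(3):
    a.charge()
try:
    a.charge()                        # A is stopped by its own limit of 3
except LimitExceeded:
    pass
for _ in range(7):
    b.charge()                        # B is unaffected by A's setting
try:
    b.charge()                        # host already holds 3 + 7 = 10
except LimitExceeded:
    pass
```

In this model container A's lower limit never constrains container B,
while the host limit still bounds the sum of both.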

> and then you get into a question of why do these even exist, and should
> they just be cgroup(v2) controllers, and should calling setrlimit just
> be a wrapper around a cgroup(v2) controller that has a map of
> uid -> limit?

To mirror what I said when this came up in the actual discussion, the
reason why we don't have cgroups for all of these things is that some of
those limits aren't "real resources" and arguably should all be managed
through kmemcg policies.

Right after getting the pids cgroup controller merged, I did mention
adding controllers for the other rlimits and Tejun said that they didn't
make sense to add ([1] is one of the responses I found through a quick
search). The only reason the pids controller was merged is that you
could still fork-bomb a system even with modest kmemcg limits.

[1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


* Re: Per user rlimits
  2020-08-31  8:09   ` Aleksa Sarai
@ 2020-08-31  8:12     ` Aleksa Sarai
  2020-08-31 13:35       ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Aleksa Sarai @ 2020-08-31  8:12 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Linux Containers, Eric W. Biederman



On 2020-08-31, Aleksa Sarai <asarai@suse.de> wrote:
> On 2020-08-28, Sargun Dhillon <sargun@sargun.me> wrote:
> > On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> > > Just to scope how much work it would be to fix rlimits
> > > so they are not a problem for user namespaces I took a quick
> > > survey.
> > >
> > > The rlimits can be found in
> > > include/uapi/asm-generic/resource.h
> > >
> > > There are a total of 16 rlimits.
> > > There are only 4 rlimits that are enforced at anything other
> > > than process granularity.
> > >
> > > RLIMIT_NPROC
> > > RLIMIT_MEMLOCK
> > > RLIMIT_SIGPENDING
> > > RLIMIT_MSGQUEUE
> > >
> > > So it should not be difficult to fix those rlimits.
> > 
> > What are your proposed semantics for what the "fix" would look like? Or
> > are you saying that once we take on Christian's proposal of 64-bit kuid
> > they would be trivial to fix? I think the reason we didn't move forward with
> > fixing it is the only real thing we could agree upon is an rlimit namespace,
> 
> From memory, we did briefly discuss how this would work in the call. I
> believe the basic idea was that the host rlimit would act as a maximum
> setting but there would be an optional lower limit that a user namespace
> could set and would be accounted separately. That way containers
> wouldn't interfere with each other's rlimit settings. I imagine this
> would be nested with user namespaces and presumably means that rlimit
> would now be attached to userns directly.
> 
> (But I might be misremembering the details of the proposal. I do
> remember Eric mentioning that the "maximum namespaces" sysctl semantics
> were a useful model to look at.)
> 
> > and then you get into a question of why do these even exist, and should
> > they just be cgroup(v2) controllers, and should calling setrlimit just
> > be a wrapper around a cgroup(v2) controller that has a map of
> > uid -> limit?
> 
> To mirror what I said when this came up in the actual discussion, the
> reason why we don't have cgroups for all of these things is that some of
> those limits aren't "real resources" and arguably should all be managed
> through kmemcg policies.
> 
> Right after getting the pids cgroup controller merged, I did mention
> adding controllers for the other rlimits and Tejun said that they didn't
> make sense to add ([1] is one of the responses I found through a quick
> search). The only reason the pids controller was merged is that you
> could still fork-bomb a system even with modest kmemcg limits.
> 
> [1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/

[2] is a more explicit NACK from Tejun in that thread.

[2]: https://lore.kernel.org/lkml/20150227170640.GK3964@htj.duckdns.org/

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


* Re: Per user rlimits
  2020-08-31  8:12     ` Aleksa Sarai
@ 2020-08-31 13:35       ` Eric W. Biederman
  0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2020-08-31 13:35 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Linux Containers

Aleksa Sarai <cyphar@cyphar.com> writes:

> On 2020-08-31, Aleksa Sarai <asarai@suse.de> wrote:
>> On 2020-08-28, Sargun Dhillon <sargun@sargun.me> wrote:
>> > On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
>> > <ebiederm@xmission.com> wrote:
>> > > Just to scope how much work it would be to fix rlimits
>> > > so they are not a problem for user namespaces I took a quick
>> > > survey.
>> > >
>> > > The rlimits can be found in
>> > > include/uapi/asm-generic/resource.h
>> > >
>> > > There are a total of 16 rlimits.
>> > > There are only 4 rlimits that are enforced at anything other
>> > > than process granularity.
>> > >
>> > > RLIMIT_NPROC
>> > > RLIMIT_MEMLOCK
>> > > RLIMIT_SIGPENDING
>> > > RLIMIT_MSGQUEUE
>> > >
>> > > So it should not be difficult to fix those rlimits.
>> > 
>> > What are your proposed semantics for what the "fix" would look like? Or
>> > are you saying that once we take on Christian's proposal of 64-bit kuid
>> > they would be trivial to fix? I think the reason we didn't move forward with
>> > fixing it is the only real thing we could agree upon is an rlimit namespace,
>> 
>> From memory, we did briefly discuss how this would work in the call. I
>> believe the basic idea was that the host rlimit would act as a maximum
>> setting but there would be an optional lower limit that a user namespace
>> could set and would be accounted separately. That way containers
>> wouldn't interfere with each other's rlimit settings. I imagine this
>> would be nested with user namespaces and presumably means that rlimit
>> would now be attached to userns directly.
>> 
>> (But I might be misremembering the details of the proposal. I do
>> remember Eric mentioning that the "maximum namespaces" sysctl semantics
>> were a useful model to look at.)
>> 
>> > and then you get into a question of why do these even exist, and should
>> > they just be cgroup(v2) controllers, and should calling setrlimit just
>> > be a wrapper around a cgroup(v2) controller that has a map of
>> > uid -> limit?
>> 
>> To mirror what I said when this came up in the actual discussion, the
>> reason why we don't have cgroups for all of these things is that some of
>> those limits aren't "real resources" and arguably should all be managed
>> through kmemcg policies.
>> 
>> Right after getting the pids cgroup controller merged, I did mention
>> adding controllers for the other rlimits and Tejun said that they didn't
>> make sense to add ([1] is one of the responses I found through a quick
>> search). The only reason the pids controller was merged is that you
>> could still fork-bomb a system even with modest kmemcg limits.
>> 
>> [1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/
>
> [2] is a more explicit NACK from Tejun in that thread.
>
> [2]: https://lore.kernel.org/lkml/20150227170640.GK3964@htj.duckdns.org/

Right now in /proc/sys/user we have a number of other recursive limits.
So that is the infrastructure I am proposing that we reuse.

Assuming doing so does not cause a real-world performance issue.
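
Illustration: a minimal sketch that reads whatever limits the running
kernel exposes under /proc/sys/user (for example max_user_namespaces
and max_pid_namespaces). The helper name read_user_limits is made up
for this example. It only reads the existing files and says nothing
about how rlimits would be wired into that infrastructure.

```python
import os


def read_user_limits(path="/proc/sys/user"):
    """Return {filename: int} for each readable numeric file under path."""
    limits = {}
    try:
        names = sorted(os.listdir(path))
    except OSError:
        return limits  # not Linux, or /proc is not mounted
    for name in names:
        try:
            with open(os.path.join(path, name)) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry
    return limits


if __name__ == "__main__":
    for name, value in read_user_limits().items():
        print(f"{name} = {value}")
```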


Fundamentally we need to change rlimits if we are going to have multiple uids
without root privileges, as having multiple uids allows you to escape
rlimits.
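
Illustration: the escape is plain arithmetic, since a limit counted per
uid multiplies by the number of uids a user controls. The toy model
below is not kernel code; the limit of 100 and the uid values are
invented for the example.

```python
from collections import Counter

NPROC_LIMIT = 100  # illustrative per-uid limit, not a real default


def spawn(counts, uid, limit=NPROC_LIMIT):
    """Account one process to uid; refuse once that uid is at its limit."""
    if counts[uid] >= limit:
        return False
    counts[uid] += 1
    return True


def max_processes(uids):
    """Count how many processes can start when spread across uids."""
    counts = Counter()
    total = 0
    for uid in uids:
        while spawn(counts, uid):
            total += 1
    return total


one_uid = max_processes([1000])                   # a single uid
many_uids = max_processes(range(100000, 100010))  # ten hypothetical uids
print(one_uid, many_uids)
```

With one uid the total stops at the limit; with ten uids the same
per-uid limit allows ten times as many processes.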


I am very confused about what cgroups have become after they were
proposed.  Some of what cgroups are is good, some of what cgroups are is
very awkward.  What cgroups are not is a nice file system based API
on top of OpenVZ beancounters, that plays well with containers.


So where it makes sense for rlimits and the like, I am happy to
include these kinds of limits in user namespaces.  Mostly I expect
we aim this functionality at the case where one application goes
crazy and that craziness is detected by hitting the limit.  By doing
that we can know things are bad without bringing down the entire system.


It is not the nuanced control that cgroups can provide when set up
properly, but it is enough to keep things working, and it is effectively
what rlimits provide today.

Eric