* Killing cgroups
@ 2021-04-19 15:56 Christian Brauner
  2021-04-19 16:15 ` Roman Gushchin
  2021-04-19 17:08 ` Shakeel Butt
  0 siblings, 2 replies; 7+ messages in thread
From: Christian Brauner @ 2021-04-19 15:56 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

Hey,

It's not as dramatic as it sounds but I've been mulling a cgroup feature
for some time now which I would like to get some input on. :)

So in container-land, assuming a conservative layout where we treat a
container as a separate machine, we tend to give each container a
delegated cgroup. That has already been the case with cgroup v1 and is
now even more so with cgroup v2.

So usually you will have a 1:1 mapping between container and cgroup. If
the container in addition uses a separate pid namespace then killing a
container becomes a simple kill -9 <container-init-pid> from an ancestor
pid namespace.

However, there are quite a few scenarios where one or both of those
assumptions don't hold. Some containers deliberately share their cgroup
with other processes that are supposed to be bound to the lifetime of
the container but are not in the container's pidns. Other containers
are in a delegated cgroup but share the pid namespace with the host or
other containers.

This is just the container use-case. There are additional use-cases from
systemd services for example.

For such scenarios it would be helpful to have a way to kill/signal all
processes in a given cgroup.

It feels to me that conceptually this is somewhat similar to the freezer
feature. Freezer is now nicely implemented in cgroup.freeze. I would
think we could do something similar for the signal feature I'm thinking
about. So we add a file cgroup.signal which can be opened with O_RDWR
and can be used to send a signal to all processes in a given cgroup:

int fd = open("/sys/fs/cgroup/my/delegated/cgroup/cgroup.signal", O_RDWR);
write(fd, "SIGKILL", sizeof("SIGKILL") - 1);

with SIGKILL being the only signal supported for a start and we can in
the future extend this to more signals.
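
A slightly fuller sketch of how a caller might drive such a file, with
error handling. cgroup.signal exists only as a proposal in this thread,
and the helper name cgroup_signal() is hypothetical, so treat this as an
illustration of the intended usage rather than a working interface. It
opens the file write-only since it never reads from it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a signal name such as "SIGKILL" to <cgroup_dir>/cgroup.signal.
 * ASSUMPTION: cgroup.signal is the file proposed in this thread; it is
 * not present on a kernel without such a patch, so open() fails there.
 * Returns 0 on success and -1 with errno set on failure.
 */
int cgroup_signal(const char *cgroup_dir, const char *signame)
{
	char path[PATH_MAX];
	size_t len;
	ssize_t ret;
	int fd, n, saved;

	assert(cgroup_dir && signame);

	n = snprintf(path, sizeof(path), "%s/cgroup.signal", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = open(path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	len = strlen(signame);
	ret = write(fd, signame, len);
	saved = errno;		/* close() may clobber errno */
	close(fd);

	if (ret < 0) {
		errno = saved;
		return -1;
	}
	if ((size_t)ret != len) {
		errno = EIO;	/* short write: treat as failure */
		return -1;
	}
	return 0;
}
```

A caller would use it as cgroup_signal("/sys/fs/cgroup/my/delegated/cgroup",
"SIGKILL") and check the return value.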

I'd like to hear your general thoughts about a feature like this or
similar to this before prototyping it.

Thanks!
Christian


* Re: Killing cgroups
  2021-04-19 15:56 Killing cgroups Christian Brauner
@ 2021-04-19 16:15 ` Roman Gushchin
       [not found]   ` <YH2slGErZ7s4t6DC-cx5fftMpWqeCjSd+JxjunQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
  2021-04-19 17:08 ` Shakeel Butt
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Gushchin @ 2021-04-19 16:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, Apr 19, 2021 at 05:56:07PM +0200, Christian Brauner wrote:
> Hey,
> 
> It's not as dramatic as it sounds but I've been mulling a cgroup feature
> for some time now which I would like to get some input on. :)
> 
> So in container-land assuming a conservative layout where we treat a
> container as a separate machine we tend to give each container a
> delegated cgroup. That has already been the case with cgroup v1 and now
> even more so with cgroup v2.
> 
> So usually you will have a 1:1 mapping between container and cgroup. If
> the container in addition uses a separate pid namespace then killing a
> container becomes a simple kill -9 <container-init-pid> from an ancestor
> pid namespace.
> 
> However, there are quite a few scenarios where one or two of those
> assumptions aren't true, i.e. there are containers that share the cgroup
> with other processes on purpose that are supposed to be bound to the
> lifetime of the container but are not in the same pidns of the
> container. Containers that are in a delegated cgroup but share the pid
> namespace with the host or other containers.
> 
> This is just the container use-case. There are additional use-cases from
> systemd services for example.
> 
> For such scenarios it would be helpful to have a way to kill/signal all
> processes in a given cgroup.
> 
> It feels to me that conceptually this is somewhat similar to the freezer
> feature. Freezer is now nicely implemented in cgroup.freeze. I would
> think we could do something similar for the signal feature I'm thinking
> about. So we add a file cgroup.signal which can be opened with O_RDWR
> and can be used to send a signal to all processes in a given cgroup:
> 
> int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> write(fd, "SIGKILL", sizeof("SIGKILL") - 1);
> 
> with SIGKILL being the only signal supported for a start and we can in
> the future extend this to more signals.
> 
> I'd like to hear your general thoughts about a feature like this or
> similar to this before prototyping it.

Hello Christian!

Tejun and I discussed a feature like this during my work on the freezer
controller, and we both thought it might be useful. But because there is
a relatively simple userspace way to do it (which has been implemented many
times), and systemd and other similar control daemons will need to keep it
in a working state for quite some time anyway (to work on older kernels),
it was considered a low-prio feature, and it has been somewhere on my
to-do list ever since.
I'm not sure we need anything beyond SIGKILL and _maybe_ SIGTERM.
Indeed it can be implemented re-using a lot from the freezer code.
Please, let me know if I can help.
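
For reference, the userspace approach mentioned above usually looks
something like the following sketch: read cgroup.procs, signal every PID
listed, and repeat until a pass finds nothing left. The function names
and the max_passes bound are hypothetical and chosen for illustration;
this is not taken from systemd or any particular runtime. Note that a PID
read from the file can in principle be recycled before kill() runs, which
is one weakness of doing this from userspace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * One pass over <cgroup_dir>/cgroup.procs: send `sig` to every PID listed.
 * Returns the number of processes signalled, or -1 if the file cannot be
 * opened. PIDs for which kill() fails (e.g. already gone) are not counted.
 */
int cgroup_signal_pass(const char *cgroup_dir, int sig)
{
	char path[PATH_MAX];
	long pid;
	int n, count = 0;
	FILE *f;

	assert(cgroup_dir);

	n = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fscanf(f, "%ld", &pid) == 1) {
		if (pid > 0 && kill((pid_t)pid, sig) == 0)
			count++;
	}
	fclose(f);
	return count;
}

/*
 * Repeat SIGKILL passes until one finds nothing to signal, because
 * processes may fork between reading cgroup.procs and signal delivery.
 * max_passes is an arbitrary safety bound for this sketch.
 * Returns 0 once the cgroup looks empty, -1 on error or if the bound
 * is exhausted (errno = EBUSY).
 */
int cgroup_kill_all(const char *cgroup_dir, int max_passes)
{
	int i, n;

	for (i = 0; i < max_passes; i++) {
		n = cgroup_signal_pass(cgroup_dir, SIGKILL);
		if (n <= 0)
			return n;
	}
	errno = EBUSY;
	return -1;
}
```

This only covers a single cgroup; descendants need a separate walk.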

Thanks!


* Re: Killing cgroups
  2021-04-19 15:56 Killing cgroups Christian Brauner
  2021-04-19 16:15 ` Roman Gushchin
@ 2021-04-19 17:08 ` Shakeel Butt
       [not found]   ` <CALvZod6haoRmgp++9sqvZaYCo+gaK6t5MSfSZ7XFpm4p6wACwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2021-04-19 17:08 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Tejun Heo, Zefan Li, Johannes Weiner, Cgroups

On Mon, Apr 19, 2021 at 8:56 AM Christian Brauner
<christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
>
> Hey,
>
> It's not as dramatic as it sounds but I've been mulling a cgroup feature
> for some time now which I would like to get some input on. :)
>
> So in container-land assuming a conservative layout where we treat a
> container as a separate machine we tend to give each container a
> delegated cgroup. That has already been the case with cgroup v1 and now
> even more so with cgroup v2.
>
> So usually you will have a 1:1 mapping between container and cgroup. If
> the container in addition uses a separate pid namespace then killing a
> container becomes a simple kill -9 <container-init-pid> from an ancestor
> pid namespace.
>
> However, there are quite a few scenarios where one or two of those
> assumptions aren't true, i.e. there are containers that share the cgroup
> with other processes on purpose that are supposed to be bound to the
> lifetime of the container but are not in the same pidns of the
> container. Containers that are in a delegated cgroup but share the pid
> namespace with the host or other containers.
>
> This is just the container use-case. There are additional use-cases from
> systemd services for example.
>
> For such scenarios it would be helpful to have a way to kill/signal all
> processes in a given cgroup.
>
> It feels to me that conceptually this is somewhat similar to the freezer
> feature. Freezer is now nicely implemented in cgroup.freeze. I would
> think we could do something similar for the signal feature I'm thinking
> about. So we add a file cgroup.signal which can be opened with O_RDWR
> and can be used to send a signal to all processes in a given cgroup:

and the descendant cgroups as well.

>
> int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> write(fd, "SIGKILL", sizeof("SIGKILL") - 1);

The userspace oom-killers can also take advantage of this feature.


* Re: Killing cgroups
       [not found]   ` <CALvZod6haoRmgp++9sqvZaYCo+gaK6t5MSfSZ7XFpm4p6wACwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2021-04-19 17:17     ` Sargun Dhillon
       [not found]       ` <CAMp4zn9_hgKOmamdzzBy5nzLr5pAXQBbuR1sjso-Wck0_3rEfA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2021-04-20 12:11     ` Christian Brauner
  1 sibling, 1 reply; 7+ messages in thread
From: Sargun Dhillon @ 2021-04-19 17:17 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Christian Brauner, Tejun Heo, Zefan Li, Johannes Weiner, Cgroups

On Mon, Apr 19, 2021 at 10:08 AM Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> On Mon, Apr 19, 2021 at 8:56 AM Christian Brauner
> <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > Hey,
> >
> > It's not as dramatic as it sounds but I've been mulling a cgroup feature
> > for some time now which I would like to get some input on. :)
> >
> > So in container-land assuming a conservative layout where we treat a
> > container as a separate machine we tend to give each container a
> > delegated cgroup. That has already been the case with cgroup v1 and now
> > even more so with cgroup v2.
> >
> > So usually you will have a 1:1 mapping between container and cgroup. If
> > the container in addition uses a separate pid namespace then killing a
> > container becomes a simple kill -9 <container-init-pid> from an ancestor
> > pid namespace.
> >
> > However, there are quite a few scenarios where one or two of those
> > assumptions aren't true, i.e. there are containers that share the cgroup
> > with other processes on purpose that are supposed to be bound to the
> > lifetime of the container but are not in the same pidns of the
> > container. Containers that are in a delegated cgroup but share the pid
> > namespace with the host or other containers.
> >
> > This is just the container use-case. There are additional use-cases from
> > systemd services for example.
> >
> > For such scenarios it would be helpful to have a way to kill/signal all
> > processes in a given cgroup.
> >
> > It feels to me that conceptually this is somewhat similar to the freezer
> > feature. Freezer is now nicely implemented in cgroup.freeze. I would
> > think we could do something similar for the signal feature I'm thinking
> > about. So we add a file cgroup.signal which can be opened with O_RDWR
> > and can be used to send a signal to all processes in a given cgroup:
>
> and the descendant cgroups as well.
>
> >
> > int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> > write(fd, "SIGKILL", sizeof("SIGKILL") - 1);
>
> The userspace oom-killers can also take advantage of this feature.

This would be nice for the container runtimes that (currently) freeze,
then kill all the pids, and unfreeze. Do you think that this could also
be generalized to SIGSTOP?
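
A rough sketch of the freeze, kill, unfreeze sequence described above,
written against the cgroup v2 files cgroup.freeze, cgroup.events and
cgroup.procs. The helper names, the 10 ms poll interval and the roughly
one second bound are assumptions made for this illustration and are not
taken from any particular runtime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Write `value` to <dir>/<name>; returns 0 on success, -1 on failure. */
int cg_write(const char *dir, const char *name, const char *value)
{
	char path[PATH_MAX];
	int n, ok;
	FILE *f;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Return 1 if cgroup.events reports "frozen 1", 0 if not, -1 on error. */
int cg_is_frozen(const char *dir)
{
	char path[PATH_MAX], key[64];
	long val;
	int n, frozen = 0;
	FILE *f;

	n = snprintf(path, sizeof(path), "%s/cgroup.events", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fscanf(f, "%63s %ld", key, &val) == 2) {
		if (strcmp(key, "frozen") == 0)
			frozen = (val == 1);
	}
	fclose(f);
	return frozen;
}

/*
 * Freeze the cgroup so nothing can fork, SIGKILL every listed process,
 * then thaw. Returns the number of processes signalled or -1 on error.
 */
int cg_freeze_kill_thaw(const char *dir)
{
	struct timespec delay = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	char path[PATH_MAX];
	long pid;
	int n, tries, killed = -1;
	FILE *f;

	assert(dir);

	if (cg_write(dir, "cgroup.freeze", "1") != 0)
		return -1;

	/* Freezing is asynchronous: poll for up to ~1 s, then go on anyway. */
	for (tries = 0; tries < 100 && cg_is_frozen(dir) != 1; tries++)
		nanosleep(&delay, NULL);

	n = snprintf(path, sizeof(path), "%s/cgroup.procs", dir);
	f = (n > 0 && (size_t)n < sizeof(path)) ? fopen(path, "r") : NULL;
	if (f) {
		killed = 0;
		while (fscanf(f, "%ld", &pid) == 1) {
			if (pid > 0 && kill((pid_t)pid, SIGKILL) == 0)
				killed++;
		}
		fclose(f);
	}

	/* Always thaw, even if reading cgroup.procs failed. */
	if (cg_write(dir, "cgroup.freeze", "0") != 0)
		return -1;
	return killed;
}
```

The freeze step is what lets a single pass be enough here: nothing in the
cgroup can fork while the PID list is being read and signalled.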


* Re: Killing cgroups
       [not found]   ` <CALvZod6haoRmgp++9sqvZaYCo+gaK6t5MSfSZ7XFpm4p6wACwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2021-04-19 17:17     ` Sargun Dhillon
@ 2021-04-20 12:11     ` Christian Brauner
  1 sibling, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2021-04-20 12:11 UTC (permalink / raw)
  To: Shakeel Butt; +Cc: Tejun Heo, Zefan Li, Johannes Weiner, Cgroups

On Mon, Apr 19, 2021 at 10:08:19AM -0700, Shakeel Butt wrote:
> On Mon, Apr 19, 2021 at 8:56 AM Christian Brauner
> <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > Hey,
> >
> > It's not as dramatic as it sounds but I've been mulling a cgroup feature
> > for some time now which I would like to get some input on. :)
> >
> > So in container-land assuming a conservative layout where we treat a
> > container as a separate machine we tend to give each container a
> > delegated cgroup. That has already been the case with cgroup v1 and now
> > even more so with cgroup v2.
> >
> > So usually you will have a 1:1 mapping between container and cgroup. If
> > the container in addition uses a separate pid namespace then killing a
> > container becomes a simple kill -9 <container-init-pid> from an ancestor
> > pid namespace.
> >
> > However, there are quite a few scenarios where one or two of those
> > assumptions aren't true, i.e. there are containers that share the cgroup
> > with other processes on purpose that are supposed to be bound to the
> > lifetime of the container but are not in the same pidns of the
> > container. Containers that are in a delegated cgroup but share the pid
> > namespace with the host or other containers.
> >
> > This is just the container use-case. There are additional use-cases from
> > systemd services for example.
> >
> > For such scenarios it would be helpful to have a way to kill/signal all
> > processes in a given cgroup.
> >
> > It feels to me that conceptually this is somewhat similar to the freezer
> > feature. Freezer is now nicely implemented in cgroup.freeze. I would
> > think we could do something similar for the signal feature I'm thinking
> > about. So we add a file cgroup.signal which can be opened with O_RDWR
> > and can be used to send a signal to all processes in a given cgroup:
> 
> and the descendant cgroups as well.

Yes, I think in line with the current design it would need to be
recursive by default. Which I think is fine. The case where we only want
to wipe all processes in a single cgroup might be ok to do manually.
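
To make "recursive" concrete, here is a sketch of what userspace has to
do today to reach the descendant cgroups as well: walk the subtree and
signal the processes listed in every cgroup.procs found. It uses POSIX
nftw(); the function names are hypothetical, and the global state is
there only because nftw() callbacks take no user pointer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int walk_sig;	/* signal to send during the current walk */
static int walk_count;	/* processes signalled so far */

/* nftw() callback: signal everything in this directory's cgroup.procs. */
static int visit_cgroup(const char *fpath, const struct stat *sb,
			int typeflag, struct FTW *ftwbuf)
{
	char path[PATH_MAX];
	long pid;
	int n;
	FILE *f;

	(void)sb;
	(void)ftwbuf;

	if (typeflag != FTW_D)
		return 0;	/* only directories are cgroups */

	n = snprintf(path, sizeof(path), "%s/cgroup.procs", fpath);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;

	f = fopen(path, "r");
	if (!f)
		return 0;	/* cgroup may have vanished; keep walking */

	while (fscanf(f, "%ld", &pid) == 1) {
		if (pid > 0 && kill((pid_t)pid, walk_sig) == 0)
			walk_count++;
	}
	fclose(f);
	return 0;		/* 0 tells nftw() to continue */
}

/*
 * Send `sig` to every process in cgroup_dir and all descendant cgroups.
 * Returns the number of processes signalled, or -1 if the walk fails.
 * Not thread-safe because of the static walk state.
 */
int cgroup_signal_recursive(const char *cgroup_dir, int sig)
{
	assert(cgroup_dir);

	walk_sig = sig;
	walk_count = 0;
	/* FTW_PHYS: do not follow symlinks; 16 is an arbitrary fd budget. */
	if (nftw(cgroup_dir, visit_cgroup, 16, FTW_PHYS) != 0)
		return -1;
	return walk_count;
}
```

New child cgroups or processes can still appear while the walk is in
progress, so a real caller would have to repeat it or freeze the subtree
first.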

> 
> >
> > int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> > write(fd, "SIGKILL", sizeof("SIGKILL") - 1);
> 
> The userspace oom-killers can also take advantage of this feature.

Good to hear that there are more use-cases.


* Re: Killing cgroups
       [not found]       ` <CAMp4zn9_hgKOmamdzzBy5nzLr5pAXQBbuR1sjso-Wck0_3rEfA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2021-04-20 12:24         ` Christian Brauner
  0 siblings, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2021-04-20 12:24 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: Shakeel Butt, Tejun Heo, Zefan Li, Johannes Weiner, Cgroups

On Mon, Apr 19, 2021 at 10:17:29AM -0700, Sargun Dhillon wrote:
> On Mon, Apr 19, 2021 at 10:08 AM Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > On Mon, Apr 19, 2021 at 8:56 AM Christian Brauner
> > <christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> > >
> > > Hey,
> > >
> > > It's not as dramatic as it sounds but I've been mulling a cgroup feature
> > > for some time now which I would like to get some input on. :)
> > >
> > > So in container-land assuming a conservative layout where we treat a
> > > container as a separate machine we tend to give each container a
> > > delegated cgroup. That has already been the case with cgroup v1 and now
> > > even more so with cgroup v2.
> > >
> > > So usually you will have a 1:1 mapping between container and cgroup. If
> > > the container in addition uses a separate pid namespace then killing a
> > > container becomes a simple kill -9 <container-init-pid> from an ancestor
> > > pid namespace.
> > >
> > > However, there are quite a few scenarios where one or two of those
> > > assumptions aren't true, i.e. there are containers that share the cgroup
> > > with other processes on purpose that are supposed to be bound to the
> > > lifetime of the container but are not in the same pidns of the
> > > container. Containers that are in a delegated cgroup but share the pid
> > > namespace with the host or other containers.
> > >
> > > This is just the container use-case. There are additional use-cases from
> > > systemd services for example.
> > >
> > > For such scenarios it would be helpful to have a way to kill/signal all
> > > processes in a given cgroup.
> > >
> > > It feels to me that conceptually this is somewhat similar to the freezer
> > > feature. Freezer is now nicely implemented in cgroup.freeze. I would
> > > think we could do something similar for the signal feature I'm thinking
> > > about. So we add a file cgroup.signal which can be opened with O_RDWR
> > > and can be used to send a signal to all processes in a given cgroup:
> >
> > and the descendant cgroups as well.
> >
> > >
> > > int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> > > write(fd, "SIGKILL", sizeof("SIGKILL") - 1);
> >
> > The userspace oom-killers can also take advantage of this feature.
> 
> This would be nice for the container runtimes that (currently) freeze,
> then kill all the pids, and unfreeze. Do you think that this could also
> be generalized to sigstop?

As long as we name it cgroup.signal we can technically expand to signals
other than SIGKILL and SIGTERM in the future. The SIG{TERM,KILL} signals
are the most relevant candidates for now.

Though I'm not clear yet what use-case would require us to support
SIGSTOP in this interface, given that we have cgroup.freeze, which seems
to be an improvement over SIGSTOP in many ways, a few of which are
mentioned in the (legacy) freezer controller documentation.


* Re: Killing cgroups
       [not found]   ` <YH2slGErZ7s4t6DC-cx5fftMpWqeCjSd+JxjunQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
@ 2021-04-20 12:28     ` Christian Brauner
  0 siblings, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2021-04-20 12:28 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, Apr 19, 2021 at 09:15:16AM -0700, Roman Gushchin wrote:
> On Mon, Apr 19, 2021 at 05:56:07PM +0200, Christian Brauner wrote:
> > Hey,
> > 
> > It's not as dramatic as it sounds but I've been mulling a cgroup feature
> > for some time now which I would like to get some input on. :)
> > 
> > So in container-land assuming a conservative layout where we treat a
> > container as a separate machine we tend to give each container a
> > delegated cgroup. That has already been the case with cgroup v1 and now
> > even more so with cgroup v2.
> > 
> > So usually you will have a 1:1 mapping between container and cgroup. If
> > the container in addition uses a separate pid namespace then killing a
> > container becomes a simple kill -9 <container-init-pid> from an ancestor
> > pid namespace.
> > 
> > However, there are quite a few scenarios where one or two of those
> > assumptions aren't true, i.e. there are containers that share the cgroup
> > with other processes on purpose that are supposed to be bound to the
> > lifetime of the container but are not in the same pidns of the
> > container. Containers that are in a delegated cgroup but share the pid
> > namespace with the host or other containers.
> > 
> > This is just the container use-case. There are additional use-cases from
> > systemd services for example.
> > 
> > For such scenarios it would be helpful to have a way to kill/signal all
> > processes in a given cgroup.
> > 
> > It feels to me that conceptually this is somewhat similar to the freezer
> > feature. Freezer is now nicely implemented in cgroup.freeze. I would
> > think we could do something similar for the signal feature I'm thinking
> > about. So we add a file cgroup.signal which can be opened with O_RDWR
> > and can be used to send a signal to all processes in a given cgroup:
> > 
> > int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> > write(fd, "SIGKILL", sizeof("SIGKILL") - 1);
> > 
> > with SIGKILL being the only signal supported for a start and we can in
> > the future extend this to more signals.
> > 
> > I'd like to hear your general thoughts about a feature like this or
> > similar to this before prototyping it.
> 
> Hello Christian!

Hey Roman,

Thanks for your quick reply!

> 
> Tejun and me discussed a feature like this during my work on the freezer
> controller, and we both thought it might be useful. But because there is
> a relatively simple userspace way to do it (which is implemented many times),
> and systemd and other similar control daemons will need to keep it in a
> working state for a quite some time anyway (to work on older kernels),
> it was considered a low-prio feature, and it was somewhere on my to-do list
> since then.

Totally understandable. I take it though we agree that this interface
should exist as it seems really useful (especially for the recursive
case) and we had a few others point out that they could make use of it.

> I'm not sure we need anything beyond SIGKILL and _maybe_ SIGTERM.

Yeah, my feeling is SIGKILL and SIGTERM might be sufficient with SIGKILL
being the first target. I would think that having a more generic name for
the file like cgroup.signal is better than cgroup.kill as I wouldn't be
so sure that we don't end up with a few more signals due to unforeseen
use-cases in the future.

> Indeed it can be implemented re-using a lot from the freezer code.

Yeah, that was my feeling too.

> Please, let me know if I can help.

Yes, will do. I'll take a look at the implementation soon and start
working on a patch. I'm sure I'll have questions sooner or later. :)

Christian

