From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sargun Dhillon
Subject: Re: Killing cgroups
Date: Mon, 19 Apr 2021 10:17:29 -0700
References: <20210419155607.gmwu376cj4nyagyj@wittgenstein>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Shakeel Butt
Cc: Christian Brauner, Tejun Heo, Zefan Li, Johannes Weiner, Cgroups

On Mon, Apr 19, 2021 at 10:08 AM Shakeel Butt wrote:
>
> On Mon, Apr 19, 2021 at 8:56 AM Christian Brauner wrote:
> >
> > Hey,
> >
> > It's not as dramatic as it sounds but I've been mulling a cgroup feature
> > for some time now which I would like to get some input on. :)
> >
> > So in container-land assuming a conservative layout where we treat a
> > container as a separate machine we tend to give each container a
> > delegated cgroup. That has already been the case with cgroup v1 and now
> > even more so with cgroup v2.
> >
> > So usually you will have a 1:1 mapping between container and cgroup. If
> > the container in addition uses a separate pid namespace then killing a
> > container becomes a simple kill -9 from an ancestor
> > pid namespace.
> >
> > However, there are quite a few scenarios where one or two of those
> > assumptions aren't true, i.e. there are containers that share the cgroup
> > with other processes on purpose that are supposed to be bound to the
> > lifetime of the container but are not in the same pidns of the
> > container. Containers that are in a delegated cgroup but share the pid
> > namespace with the host or other containers.
> >
> > This is just the container use-case. There are additional use-cases from
> > systemd services for example.
> >
> > For such scenarios it would be helpful to have a way to kill/signal all
> > processes in a given cgroup.
> >
> > It feels to me that conceptually this is somewhat similar to the freezer
> > feature. Freezer is now nicely implemented in cgroup.freeze. I would
> > think we could do something similar for the signal feature I'm thinking
> > about. So we add a file cgroup.signal which can be opened with O_RDWR
> > and can be used to send a signal to all processes in a given cgroup:
>
> and the descendant cgroups as well.
>
> >
> > int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> > write(fd, "SIGKILL", sizeof("SIGKILL") - 1);
>
> The userspace oom-killers can also take advantage of this feature.

This would be nice for the container runtimes that (currently) freeze, then
kill all the pids, and unfreeze. Do you think that this could also be
generalized to sigstop?
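
For context, here is a rough sketch of what that freeze, kill, unfreeze
sequence can look like from userspace today. It assumes a cgroup v2
hierarchy with the cgroup.freeze and cgroup.procs files; the helper names
cg_write() and cg_freeze_kill_thaw() are made up for illustration and are
not taken from any particular runtime. It handles a single cgroup only (no
descendant walk) and does not wait for the freeze to complete, which a real
runtime would likely do by polling cgroup.events.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a short string to <cgroup_dir>/<file>.
 * Returns 0 on success, -1 on error. */
int cg_write(const char *cgroup_dir, const char *file, const char *val)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    size_t len = strlen(val);
    ssize_t w = write(fd, val, len);
    close(fd);
    return (w == (ssize_t)len) ? 0 : -1;
}

/* Hypothetical helper: freeze the cgroup, send `sig` to every pid listed
 * in cgroup.procs, then unfreeze. Returns the number of processes
 * signalled, or -1 on error. Descendant cgroups are not walked. */
int cg_freeze_kill_thaw(const char *cgroup_dir, int sig)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* Freezing first keeps members from forking while we iterate. */
    if (cg_write(cgroup_dir, "cgroup.freeze", "1") < 0)
        return -1;

    int count = -1;
    FILE *procs = fopen(path, "r");
    if (procs) {
        long pid;
        count = 0;
        while (fscanf(procs, "%ld", &pid) == 1) {
            if (pid > 0 && kill((pid_t)pid, sig) == 0)
                count++;
        }
        fclose(procs);
    }

    /* Always try to thaw, even if reading cgroup.procs failed. */
    if (cg_write(cgroup_dir, "cgroup.freeze", "0") < 0)
        return -1;
    return count;
}
```

A caller would use something like cg_freeze_kill_thaw("/sys/fs/cgroup/my/delegated/cgroup", SIGKILL).
A single write to a cgroup.signal file, as proposed above, would collapse
these three steps into one.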