From: Tejun Heo <tj@kernel.org>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "taoyi.ty" <escape@linux.alibaba.com>, Greg KH <gregkh@linuxfoundation.org>,
	lizefan.x@bytedance.com, hannes@cmpxchg.org, mcgrof@kernel.org,
	keescook@chromium.org, yzaikin@google.com, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	shanpeic@linux.alibaba.com
Subject: Re: [RFC PATCH 0/2] support cgroup pool in v1
Date: Mon, 13 Sep 2021 06:24:28 -1000	[thread overview]
Message-ID: <YT97PAm6kaecvXLX@slm.duckdns.org> (raw)
In-Reply-To: <20210913142059.qbypd4vfq6wdzqfw@wittgenstein>

Hello,

On Mon, Sep 13, 2021 at 04:20:59PM +0200, Christian Brauner wrote:
> Afaict, there is currently no way to prevent the deletion of empty
> cgroups, especially newly created ones. So for example, if I have a
> cgroup manager that prunes the cgroup tree whenever it detects empty
> cgroups, it can delete cgroups that were pre-allocated. This is
> something we have run into before.

systemd doesn't mess with cgroups behind a delegation point.

> A related problem is a crashed or killed container manager
> (segfault, sigkill, etc.). It might not have had the chance to clean up
> cgroups it allocated for the container. If the container manager is
> restarted, it can't reuse the existing cgroup it found because it has no
> way of guaranteeing whether, in between the time it crashed and got
> restarted, another program has created a cgroup with the same name.
> We usually solve this by just creating another cgroup with an index
> appended until we find an unallocated one, setting an arbitrary cutoff
> point after which we require manual intervention by the user (e.g. 1000).
>
> Right now, iirc, one can rmdir() an empty cgroup while someone still
> holds a file descriptor open for it. This can lead to a situation where
> a cgroup got created but, before anything moved into the cgroup (via
> clone3() or write()), someone else has deleted it. What would already be
> helpful is if one had a way to prevent the deletion of cgroups while
> someone still holds an open reference to them. This would allow a pool
> of cgroups to be created that can't simply be deleted.

The above are problems common to any entity managing a cgroup hierarchy.
Beyond permission- and delegation-based access control, cgroup doesn't
have a mechanism to grant exclusive managerial operations to a specific
application. It's the userspace's responsibility to coordinate these
operations, as with most other kernel interfaces.

Thanks.

-- 
tejun