From: Tejun Heo <tj@kernel.org>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "taoyi.ty" <escape@linux.alibaba.com>, Greg KH <gregkh@linuxfoundation.org>,
	lizefan.x@bytedance.com, hannes@cmpxchg.org, mcgrof@kernel.org,
	keescook@chromium.org, yzaikin@google.com, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	shanpeic@linux.alibaba.com
Subject: Re: [RFC PATCH 0/2] support cgroup pool in v1
Date: Mon, 13 Sep 2021 06:24:28 -1000	[thread overview]
Message-ID: <YT97PAm6kaecvXLX@slm.duckdns.org> (raw)
In-Reply-To: <20210913142059.qbypd4vfq6wdzqfw@wittgenstein>

Hello,

On Mon, Sep 13, 2021 at 04:20:59PM +0200, Christian Brauner wrote:
> Afaict, there is currently no way to prevent the deletion of empty
> cgroups, especially newly created ones. So for example, if I have a
> cgroup manager that prunes the cgroup tree whenever it detects empty
> cgroups, it can delete cgroups that were pre-allocated. This is
> something we have run into before.

systemd doesn't mess with cgroups behind a delegation point.

> A related problem is a crashed or killed container manager
> (segfault, sigkill, etc.). It might not have had the chance to clean up
> cgroups it allocated for the container. If the container manager is
> restarted, it can't reuse the existing cgroup it found because it has no
> way of guaranteeing whether, in between the time it crashed and got
> restarted, another program has created a cgroup with the same name.
> We usually solve this by just creating another cgroup with an index
> appended until we find an unallocated one, setting an arbitrary cutoff
> point after which we require manual intervention by the user (e.g. 1000).
>
> Right now, iirc, one can rmdir() an empty cgroup while someone still
> holds a file descriptor open for it. This can lead to a situation where
> a cgroup got created but, before anything moved into the cgroup (via
> clone3() or write()), someone else has deleted it. What would already be
> helpful is if one had a way to prevent the deletion of cgroups while
> someone still holds an open reference to them. This would allow a pool
> of cgroups to be created that can't simply be deleted.

The above are problems common to any entity managing a cgroup hierarchy.
Beyond permission- and delegation-based access control, cgroup doesn't
have a mechanism to grant exclusive managerial operations to a specific
application. It's the userspace's responsibility to coordinate these
operations, as with most other kernel interfaces.

Thanks.

-- 
tejun