All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "taoyi.ty" <escape@linux.alibaba.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	lizefan.x@bytedance.com, hannes@cmpxchg.org, mcgrof@kernel.org,
	keescook@chromium.org, yzaikin@google.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, shanpeic@linux.alibaba.com
Subject: Re: [RFC PATCH 0/2] support cgroup pool in v1
Date: Mon, 13 Sep 2021 06:24:28 -1000	[thread overview]
Message-ID: <YT97PAm6kaecvXLX@slm.duckdns.org> (raw)
In-Reply-To: <20210913142059.qbypd4vfq6wdzqfw@wittgenstein>

Hello,

On Mon, Sep 13, 2021 at 04:20:59PM +0200, Christian Brauner wrote:
> Afaict, there is currently now way to prevent the deletion of empty
> cgroups, especially newly created ones. So for example, if I have a
> cgroup manager that prunes the cgroup tree whenever they detect empty
> cgroups they can delete cgroups that were pre-allocated. This is
> something we have run into before.

systemd doesn't mess with cgroups behind a delegation point.

> A related problem is a crashed or killed container manager 
> (segfault, sigkill, etc.). It might not have had the chance to cleanup
> cgroups it allocated for the container. If the container manager is
> restarted it can't reuse the existing cgroup it found because it has no
> way of guaranteeing whether in between the time it crashed and got
> restarted another program has just created a cgroup with the same name.
> We usually solve this by just creating another cgroup with an index
> appended until we we find an unallocated one setting an arbitrary cut
> off point until we require manual intervention by the user (e.g. 1000).
> 
> Right now iirc, one can rmdir() an empty cgroup while someone still
> holds a file descriptor open for it. This can lead to situation where a
> cgroup got created but before moving into the cgroup (via clone3() or
> write()) someone else has deleted it. What would already be helpful is
> if one had a way to prevent the deletion of cgroups when someone still
> has an open reference to it. This would allow a pool of cgroups to be
> created that can't simply be deleted.

The above are problems common for any entity managing cgroup hierarchy.
Beyond the permission and delegation based access control, cgroup doesn't
have a mechanism to grant exclusive managerial operations to a specific
application. It's the userspace's responsibility to coordinate these
operations like in most other kernel interfaces.

Thanks.

-- 
tejun

WARNING: multiple messages have this Message-ID (diff)
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Christian Brauner
	<christian.brauner-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
Cc: "taoyi.ty"
	<escape-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>,
	Greg KH
	<gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>,
	lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org,
	hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
	mcgrof-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org,
	yzaikin-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	shanpeic-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org
Subject: Re: [RFC PATCH 0/2] support cgroup pool in v1
Date: Mon, 13 Sep 2021 06:24:28 -1000	[thread overview]
Message-ID: <YT97PAm6kaecvXLX@slm.duckdns.org> (raw)
In-Reply-To: <20210913142059.qbypd4vfq6wdzqfw@wittgenstein>

Hello,

On Mon, Sep 13, 2021 at 04:20:59PM +0200, Christian Brauner wrote:
> Afaict, there is currently now way to prevent the deletion of empty
> cgroups, especially newly created ones. So for example, if I have a
> cgroup manager that prunes the cgroup tree whenever they detect empty
> cgroups they can delete cgroups that were pre-allocated. This is
> something we have run into before.

systemd doesn't mess with cgroups behind a delegation point.

> A related problem is a crashed or killed container manager 
> (segfault, sigkill, etc.). It might not have had the chance to cleanup
> cgroups it allocated for the container. If the container manager is
> restarted it can't reuse the existing cgroup it found because it has no
> way of guaranteeing whether in between the time it crashed and got
> restarted another program has just created a cgroup with the same name.
> We usually solve this by just creating another cgroup with an index
> appended until we we find an unallocated one setting an arbitrary cut
> off point until we require manual intervention by the user (e.g. 1000).
> 
> Right now iirc, one can rmdir() an empty cgroup while someone still
> holds a file descriptor open for it. This can lead to situation where a
> cgroup got created but before moving into the cgroup (via clone3() or
> write()) someone else has deleted it. What would already be helpful is
> if one had a way to prevent the deletion of cgroups when someone still
> has an open reference to it. This would allow a pool of cgroups to be
> created that can't simply be deleted.

The above are problems common for any entity managing cgroup hierarchy.
Beyond the permission and delegation based access control, cgroup doesn't
have a mechanism to grant exclusive managerial operations to a specific
application. It's the userspace's responsibility to coordinate these
operations like in most other kernel interfaces.

Thanks.

-- 
tejun

  reply	other threads:[~2021-09-13 16:24 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-08 12:15 [RFC PATCH 0/2] support cgroup pool in v1 Yi Tao
2021-09-08 12:15 ` Yi Tao
2021-09-08 12:15 ` [RFC PATCH 1/2] add pinned flags for kernfs node Yi Tao
2021-09-08 12:15   ` [RFC PATCH 2/2] support cgroup pool in v1 Yi Tao
2021-09-08 12:15     ` Yi Tao
2021-09-08 12:35     ` Greg KH
2021-09-08 12:35       ` Greg KH
     [not found]       ` <084930d2-057a-04a7-76d1-b2a7bd37deb0@linux.alibaba.com>
2021-09-09 13:27         ` Greg KH
2021-09-10  2:20           ` taoyi.ty
2021-09-10  2:15       ` taoyi.ty
2021-09-10  2:15         ` taoyi.ty
2021-09-10  6:01         ` Greg KH
2021-09-10  6:01           ` Greg KH
2021-09-08 15:30     ` kernel test robot
2021-09-08 16:52     ` kernel test robot
2021-09-08 17:39     ` kernel test robot
2021-09-08 17:39     ` [RFC PATCH] cgroup_pool_mutex can be static kernel test robot
2021-09-08 12:35   ` [RFC PATCH 1/2] add pinned flags for kernfs node Greg KH
2021-09-08 12:35     ` Greg KH
2021-09-10  2:14     ` taoyi.ty
2021-09-10  6:00       ` Greg KH
2021-09-10  6:00         ` Greg KH
2021-09-08 16:26   ` kernel test robot
2021-09-08 12:37 ` [RFC PATCH 0/2] support cgroup pool in v1 Greg KH
2021-09-10  2:11   ` taoyi.ty
2021-09-10  6:01     ` Greg KH
2021-09-10  6:01       ` Greg KH
2021-09-10 16:49     ` Tejun Heo
2021-09-10 16:49       ` Tejun Heo
2021-09-13 14:20       ` Christian Brauner
2021-09-13 14:20         ` Christian Brauner
2021-09-13 16:24         ` Tejun Heo [this message]
2021-09-13 16:24           ` Tejun Heo
2021-09-08 16:35 ` Tejun Heo
2021-09-10  2:12   ` taoyi.ty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YT97PAm6kaecvXLX@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=escape@linux.alibaba.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mcgrof@kernel.org \
    --cc=shanpeic@linux.alibaba.com \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.