All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michal Koutný" <mkoutny@suse.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Waiman Long <longman@redhat.com>
Subject: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
Date: Thu, 23 Jun 2022 14:49:44 +0200	[thread overview]
Message-ID: <20220623124944.2753-1-mkoutny@suse.com> (raw)

When we migrate a task between two cgroups, one of the checks is a
verification that we can modify task's scheduler settings
(cap_task_setscheduler()).

An implicit migration occurs also when enabling a controller on the
unified hierarchy (think of parent to child migration). The
aforementioned check may be problematic if the caller of the migration
(enabling a controller) has no permissions over migrated tasks.
For instance, user's cgroup that ends up running a process of a
different user. Although cgroup permissions are configured favorably,
the enablement fails due to the foreign process [1].

Change the behavior by relaxing the permissions check on the unified
hierarchy (or in v2 mode). This is in accordance with unified hierarchy
attachment behavior when permissions of the source to target cgroups are
decisive whereas the migrated task is opaque (for contrast, see more
restrictive check in __cgroup1_procs_write()).

[1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649

Reasons for RFC:

1) The unified hierarchy attachment behavior -- is that the
   right/consented model that migrated objects don't matter?

2) If 1) is true, have I missed any danger in allowing cpuset'ing a
   possibly privileged processes?

2.2) cpuset may be in v2 mode even on v1 hierarchy with different
   migration control rules (but checking migratee's creds in v1
   eliminates effect of this patch).

3) Alternative approach would be to allow cpuset migrations only when
   nothing effectively changes (which is the case for parent->child
   migration upon controller enablement).

4) This is just idea draft, not tested in the real case.

Thanks.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/cgroup/cpuset.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..dbe78577de5b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2232,7 +2232,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 
 	percpu_down_write(&cpuset_rwsem);
 
-	/* allow moving tasks into an empty cpuset if on default hierarchy */
+	/* allow moving tasks into an empty cpuset if on default hierarchy,
+	 * also bypass permission check (access is controlled via cgroup(s)
+	 * owner in cgroup_procs_write_permission()) */
 	ret = -ENOSPC;
 	if (!is_in_v2_mode() &&
 	    (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
@@ -2242,6 +2244,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		ret = task_can_attach(task, cs->cpus_allowed);
 		if (ret)
 			goto out_unlock;
+
+		if (is_in_v2_mode())
+			continue;
 		ret = security_task_setscheduler(task);
 		if (ret)
 			goto out_unlock;
-- 
2.35.3


WARNING: multiple messages have this Message-ID (diff)
From: "Michal Koutný" <mkoutny-IBi9RG/b67k@public.gmane.org>
To: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
Date: Thu, 23 Jun 2022 14:49:44 +0200	[thread overview]
Message-ID: <20220623124944.2753-1-mkoutny@suse.com> (raw)

When we migrate a task between two cgroups, one of the checks is a
verification that we can modify task's scheduler settings
(cap_task_setscheduler()).

An implicit migration occurs also when enabling a controller on the
unified hierarchy (think of parent to child migration). The
aforementioned check may be problematic if the caller of the migration
(enabling a controller) has no permissions over migrated tasks.
For instance, user's cgroup that ends up running a process of a
different user. Although cgroup permissions are configured favorably,
the enablement fails due to the foreign process [1].

Change the behavior by relaxing the permissions check on the unified
hierarchy (or in v2 mode). This is in accordance with unified hierarchy
attachment behavior when permissions of the source to target cgroups are
decisive whereas the migrated task is opaque (for contrast, see more
restrictive check in __cgroup1_procs_write()).

[1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649

Reasons for RFC:

1) The unified hierarchy attachment behavior -- is that the
   right/consented model that migrated objects don't matter?

2) If 1) is true, have I missed any danger in allowing cpuset'ing a
   possibly privileged processes?

2.2) cpuset may be in v2 mode even on v1 hierarchy with different
   migration control rules (but checking migratee's creds in v1
   eliminates effect of this patch).

3) Alternative approach would be to allow cpuset migrations only when
   nothing effectively changes (which is the case for parent->child
   migration upon controller enablement).

4) This is just idea draft, not tested in the real case.

Thanks.

Signed-off-by: Michal Koutn√Ω <mkoutny-IBi9RG/b67k@public.gmane.org>
---
 kernel/cgroup/cpuset.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..dbe78577de5b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2232,7 +2232,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 
 	percpu_down_write(&cpuset_rwsem);
 
-	/* allow moving tasks into an empty cpuset if on default hierarchy */
+	/* allow moving tasks into an empty cpuset if on default hierarchy,
+	 * also bypass permission check (access is controlled via cgroup(s)
+	 * owner in cgroup_procs_write_permission()) */
 	ret = -ENOSPC;
 	if (!is_in_v2_mode() &&
 	    (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
@@ -2242,6 +2244,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		ret = task_can_attach(task, cs->cpus_allowed);
 		if (ret)
 			goto out_unlock;
+
+		if (is_in_v2_mode())
+			continue;
 		ret = security_task_setscheduler(task);
 		if (ret)
 			goto out_unlock;
-- 
2.35.3


             reply	other threads:[~2022-06-23 12:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-23 12:49 Michal Koutný [this message]
2022-06-23 12:49 ` [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task Michal Koutný
2022-06-23 18:44 ` Waiman Long
2022-06-23 18:44   ` Waiman Long
2022-06-24 12:43   ` Michal Koutný
2022-06-24 12:43     ` Michal Koutný
2022-06-25  4:20 ` Tejun Heo
2022-06-25  4:20   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220623124944.2753-1-mkoutny@suse.com \
    --to=mkoutny@suse.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=longman@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.