[RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-23 12:49 ` Michal Koutný
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2022-06-23 12:49 UTC (permalink / raw)
  To: cgroups, linux-kernel; +Cc: Tejun Heo, Zefan Li, Johannes Weiner, Waiman Long

When we migrate a task between two cgroups, one of the checks is a
verification that we can modify task's scheduler settings
(cap_task_setscheduler()).

An implicit migration occurs also when enabling a controller on the
unified hierarchy (think of parent to child migration). The
aforementioned check may be problematic if the caller of the migration
(enabling a controller) has no permissions over migrated tasks.
For instance, user's cgroup that ends up running a process of a
different user. Although cgroup permissions are configured favorably,
the enablement fails due to the foreign process [1].

Change the behavior by relaxing the permissions check on the unified
hierarchy (or in v2 mode). This is in accordance with unified hierarchy
attachment behavior when permissions of the source to target cgroups are
decisive whereas the migrated task is opaque (for contrast, see more
restrictive check in __cgroup1_procs_write()).

[1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649

Reasons for RFC:

1) The unified hierarchy attachment behavior -- is that the
   right/consented model that migrated objects don't matter?

2) If 1) is true, have I missed any danger in allowing cpuset'ing a
   possibly privileged processes?

2.2) cpuset may be in v2 mode even on v1 hierarchy with different
   migration control rules (but checking migratee's creds in v1
   eliminates effect of this patch).

3) Alternative approach would be to allow cpuset migrations only when
   nothing effectively changes (which is the case for parent->child
   migration upon controller enablement).

4) This is just idea draft, not tested in the real case.

Thanks.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/cgroup/cpuset.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..dbe78577de5b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2232,7 +2232,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)

 	percpu_down_write(&cpuset_rwsem);

-	/* allow moving tasks into an empty cpuset if on default hierarchy */
+	/* allow moving tasks into an empty cpuset if on default hierarchy,
+	 * also bypass permission check (access is controlled via cgroup(s)
+	 * owner in cgroup_procs_write_permission()) */
 	ret = -ENOSPC;
 	if (!is_in_v2_mode() &&
 	    (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
@@ -2242,6 +2244,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		ret = task_can_attach(task, cs->cpus_allowed);
 		if (ret)
 			goto out_unlock;
+
+		if (is_in_v2_mode())
+			continue;
 		ret = security_task_setscheduler(task);
 		if (ret)
 			goto out_unlock;
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-23 12:49 ` Michal Koutný
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2022-06-23 12:49 UTC (permalink / raw)
  To: cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Waiman Long

When we migrate a task between two cgroups, one of the checks is a
verification that we can modify task's scheduler settings
(cap_task_setscheduler()).

An implicit migration occurs also when enabling a controller on the
unified hierarchy (think of parent to child migration). The
aforementioned check may be problematic if the caller of the migration
(enabling a controller) has no permissions over migrated tasks.
For instance, user's cgroup that ends up running a process of a
different user. Although cgroup permissions are configured favorably,
the enablement fails due to the foreign process [1].

Change the behavior by relaxing the permissions check on the unified
hierarchy (or in v2 mode). This is in accordance with unified hierarchy
attachment behavior when permissions of the source to target cgroups are
decisive whereas the migrated task is opaque (for contrast, see more
restrictive check in __cgroup1_procs_write()).

[1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649

Reasons for RFC:

1) The unified hierarchy attachment behavior -- is that the
   right/consented model that migrated objects don't matter?

2) If 1) is true, have I missed any danger in allowing cpuset'ing a
   possibly privileged processes?

2.2) cpuset may be in v2 mode even on v1 hierarchy with different
   migration control rules (but checking migratee's creds in v1
   eliminates effect of this patch).

3) Alternative approach would be to allow cpuset migrations only when
   nothing effectively changes (which is the case for parent->child
   migration upon controller enablement).

4) This is just idea draft, not tested in the real case.

Thanks.

Signed-off-by: Michal Koutn√Ω <mkoutny-IBi9RG/b67k@public.gmane.org>
---
 kernel/cgroup/cpuset.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..dbe78577de5b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2232,7 +2232,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)

 	percpu_down_write(&cpuset_rwsem);

-	/* allow moving tasks into an empty cpuset if on default hierarchy */
+	/* allow moving tasks into an empty cpuset if on default hierarchy,
+	 * also bypass permission check (access is controlled via cgroup(s)
+	 * owner in cgroup_procs_write_permission()) */
 	ret = -ENOSPC;
 	if (!is_in_v2_mode() &&
 	    (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
@@ -2242,6 +2244,9 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		ret = task_can_attach(task, cs->cpus_allowed);
 		if (ret)
 			goto out_unlock;
+
+		if (is_in_v2_mode())
+			continue;
 		ret = security_task_setscheduler(task);
 		if (ret)
 			goto out_unlock;
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-23 18:44   ` Waiman Long
  0 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2022-06-23 18:44 UTC (permalink / raw)
  To: Michal Koutný, cgroups, linux-kernel
  Cc: Tejun Heo, Zefan Li, Johannes Weiner

On 6/23/22 08:49, Michal Koutný wrote:
> When we migrate a task between two cgroups, one of the checks is a
> verification that we can modify task's scheduler settings
> (cap_task_setscheduler()).
>
> An implicit migration occurs also when enabling a controller on the
> unified hierarchy (think of parent to child migration). The
> aforementioned check may be problematic if the caller of the migration
> (enabling a controller) has no permissions over migrated tasks.
> For instance, user's cgroup that ends up running a process of a
> different user. Although cgroup permissions are configured favorably,
> the enablement fails due to the foreign process [1].
>
> Change the behavior by relaxing the permissions check on the unified
> hierarchy (or in v2 mode). This is in accordance with unified hierarchy
> attachment behavior when permissions of the source to target cgroups are
> decisive whereas the migrated task is opaque (for contrast, see more
> restrictive check in __cgroup1_procs_write()).
>
> [1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649
>
> Reasons for RFC:
>
> 1) The unified hierarchy attachment behavior -- is that the
>     right/consented model that migrated objects don't matter?
>
> 2) If 1) is true, have I missed any danger in allowing cpuset'ing a
>     possibly privileged processes?
That could be an issue.

> 2.2) cpuset may be in v2 mode even on v1 hierarchy with different
>     migration control rules (but checking migratee's creds in v1
>     eliminates effect of this patch).
>
> 3) Alternative approach would be to allow cpuset migrations only when
>     nothing effectively changes (which is the case for parent->child
>     migration upon controller enablement).
What do you mean by nothing effectively changes?
>
> 4) This is just idea draft, not tested in the real case.

Since the check is done on a taskset level, if only one of the tasks in 
the taskset fails, the whole taskset fails. Maybe we should consider an 
option for task based migration. So all the tasks that can be migrated 
will be migrated and the rests will be left behind in the original 
cpuset. Just a thought.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-23 18:44   ` Waiman Long
  0 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2022-06-23 18:44 UTC (permalink / raw)
  To: Michal Koutný,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Tejun Heo, Zefan Li, Johannes Weiner

On 6/23/22 08:49, Michal KoutnÃ½ wrote:
> When we migrate a task between two cgroups, one of the checks is a
> verification that we can modify task's scheduler settings
> (cap_task_setscheduler()).
>
> An implicit migration occurs also when enabling a controller on the
> unified hierarchy (think of parent to child migration). The
> aforementioned check may be problematic if the caller of the migration
> (enabling a controller) has no permissions over migrated tasks.
> For instance, user's cgroup that ends up running a process of a
> different user. Although cgroup permissions are configured favorably,
> the enablement fails due to the foreign process [1].
>
> Change the behavior by relaxing the permissions check on the unified
> hierarchy (or in v2 mode). This is in accordance with unified hierarchy
> attachment behavior when permissions of the source to target cgroups are
> decisive whereas the migrated task is opaque (for contrast, see more
> restrictive check in __cgroup1_procs_write()).
>
> [1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649
>
> Reasons for RFC:
>
> 1) The unified hierarchy attachment behavior -- is that the
>     right/consented model that migrated objects don't matter?
>
> 2) If 1) is true, have I missed any danger in allowing cpuset'ing a
>     possibly privileged processes?
That could be an issue.

> 2.2) cpuset may be in v2 mode even on v1 hierarchy with different
>     migration control rules (but checking migratee's creds in v1
>     eliminates effect of this patch).
>
> 3) Alternative approach would be to allow cpuset migrations only when
>     nothing effectively changes (which is the case for parent->child
>     migration upon controller enablement).
What do you mean by nothing effectively changes?
>
> 4) This is just idea draft, not tested in the real case.

Since the check is done on a taskset level, if only one of the tasks in 
the taskset fails, the whole taskset fails. Maybe we should consider an 
option for task based migration. So all the tasks that can be migrated 
will be migrated and the rests will be left behind in the original 
cpuset. Just a thought.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-24 12:43     ` Michal Koutný
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2022-06-24 12:43 UTC (permalink / raw)
  To: Waiman Long; +Cc: cgroups, linux-kernel, Tejun Heo, Zefan Li, Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]

On Thu, Jun 23, 2022 at 02:44:33PM -0400, Waiman Long <longman@redhat.com> wrote:
> That could be an issue.

The way how I understand here is that the privileged process isn't part
of contrained workload (if it were then, it can modify cpuset itself)
like debugging or tracing a code within a container entered by the
admin. The bypass could not be used to setscheduler (via migration) of
an arbitrary process.

> What do you mean by nothing effectively changes?

It's a freshly created child (after cpuset_css_online()), so it inherits
parent's attributes, so the migration from the parent to this child
doesn't affect CPU affinity etc.

> Since the check is done on a taskset level, if only one of the tasks in the
> taskset fails, the whole taskset fails. Maybe we should consider an option
> for task based migration. So all the tasks that can be migrated will be
> migrated and the rests will be left behind in the original cpuset.

Hm, I haven't thought about that. That might be in theory possible for
threaded controllers (like cpuset) but I imagine it'd a bit messy, in
particular for these implicit migrations upon controller enablement.

Thanks,
Michal

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-24 12:43     ` Michal Koutný
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2022-06-24 12:43 UTC (permalink / raw)
  To: Waiman Long
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Zefan Li,
	Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On Thu, Jun 23, 2022 at 02:44:33PM -0400, Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> That could be an issue.

The way how I understand here is that the privileged process isn't part
of contrained workload (if it were then, it can modify cpuset itself)
like debugging or tracing a code within a container entered by the
admin. The bypass could not be used to setscheduler (via migration) of
an arbitrary process.

> What do you mean by nothing effectively changes?

It's a freshly created child (after cpuset_css_online()), so it inherits
parent's attributes, so the migration from the parent to this child
doesn't affect CPU affinity etc.

> Since the check is done on a taskset level, if only one of the tasks in the
> taskset fails, the whole taskset fails. Maybe we should consider an option
> for task based migration. So all the tasks that can be migrated will be
> migrated and the rests will be left behind in the original cpuset.

Hm, I haven't thought about that. That might be in theory possible for
threaded controllers (like cpuset) but I imagine it'd a bit messy, in
particular for these implicit migrations upon controller enablement.

Thanks,
Michal

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-25  4:20   ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2022-06-25  4:20 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, linux-kernel, Zefan Li, Johannes Weiner, Waiman Long

Hello,

On Thu, Jun 23, 2022 at 02:49:44PM +0200, Michal Koutný wrote:
> 1) The unified hierarchy attachment behavior -- is that the
>    right/consented model that migrated objects don't matter?

Yes.

> 2) If 1) is true, have I missed any danger in allowing cpuset'ing a
>    possibly privileged processes?

Given that the someone who has write perm on the cgroup or its
ancestors are allowed to change cpuset config itself, I have a hard
time imagining that check being all that useful as a security
mechanism.

> 2.2) cpuset may be in v2 mode even on v1 hierarchy with different
>    migration control rules (but checking migratee's creds in v1
>    eliminates effect of this patch).

Yeah, it should be fine to apply the change only to v2.

> 3) Alternative approach would be to allow cpuset migrations only when
>    nothing effectively changes (which is the case for parent->child
>    migration upon controller enablement).
> 
> 4) This is just idea draft, not tested in the real case.

Unless I'm missing something obvious, I don't see a reason to keep the
check, so please feel free to test and send a SOB'd patch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task
@ 2022-06-25  4:20   ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2022-06-25  4:20 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Zefan Li, Johannes Weiner,
	Waiman Long

Hello,

On Thu, Jun 23, 2022 at 02:49:44PM +0200, Michal Koutný wrote:
> 1) The unified hierarchy attachment behavior -- is that the
>    right/consented model that migrated objects don't matter?

Yes.

> 2) If 1) is true, have I missed any danger in allowing cpuset'ing a
>    possibly privileged processes?

Given that the someone who has write perm on the cgroup or its
ancestors are allowed to change cpuset config itself, I have a hard
time imagining that check being all that useful as a security
mechanism.

> 2.2) cpuset may be in v2 mode even on v1 hierarchy with different
>    migration control rules (but checking migratee's creds in v1
>    eliminates effect of this patch).

Yeah, it should be fine to apply the change only to v2.

> 3) Alternative approach would be to allow cpuset migrations only when
>    nothing effectively changes (which is the case for parent->child
>    migration upon controller enablement).
> 
> 4) This is just idea draft, not tested in the real case.

Unless I'm missing something obvious, I don't see a reason to keep the
check, so please feel free to test and send a SOB'd patch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-06-25  4:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-23 12:49 [RFC PATCH] cpuset: Allow setscheduler regardless of manipulated task Michal Koutný
2022-06-23 12:49 ` Michal Koutný
2022-06-23 18:44 ` Waiman Long
2022-06-23 18:44   ` Waiman Long
2022-06-24 12:43   ` Michal Koutný
2022-06-24 12:43     ` Michal Koutný
2022-06-25  4:20 ` Tejun Heo
2022-06-25  4:20   ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.