* [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
@ 2023-08-04  2:32 Waiman Long
  2023-08-18 18:47 ` Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Waiman Long @ 2023-08-04  2:32 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: linux-kernel, Phil Auld, Brent Rowsell, Peter Hunt, Waiman Long

Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), user provided CPU affinity via sched_setaffinity(2) is
preserved even if the task is being moved to a different cpuset. However,
that affinity is also being inherited by any subsequently created child
processes which may not want or be aware of that affinity.

One way to solve this problem is to provide a way to back off from
that user provided CPU affinity.  This patch implements such a scheme
by using an empty cpumask to signal a reset of the cpumasks to the
default as allowed by the current cpuset.

Before this patch, passing in an empty cpumask to sched_setaffinity(2)
will always return an -EINVAL error. With this patch, an alternative
error of -ENODEV will be returned if sched_setaffinity(2)
has been called before to set up user_cpus_ptr. In this case, the
user_cpus_ptr that stores the user provided affinity will be cleared and
the task's CPU affinity will be reset to that of the current cpuset. This
alternative error code of -ENODEV signals that no CPU is specified
and, at the same time, a side effect of resetting cpu affinity to the
cpuset default.

If sched_setaffinity(2) has not been called previously, an EINVAL error
will be returned with an empty cpumask just like before.  Tests or
tools that rely on the behavior that an empty cpumask will return an
error code will not be affected.

We will have to update the sched_setaffinity(2) manpage to document
this possible side effect of passing in an empty cpumask.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/core.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..3ef7397f2a61 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8317,7 +8317,12 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
 	}
 
 	cpuset_cpus_allowed(p, cpus_allowed);
-	cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+
+	/* Default to cpus_allowed with NULL new_mask */
+	if (ctx->new_mask)
+		cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+	else
+		cpumask_copy(new_mask, cpus_allowed);
 
 	ctx->new_mask = new_mask;
 	ctx->flags |= SCA_CHECK;
@@ -8366,6 +8371,7 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
 
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
+	bool reset_cpumasks = cpumask_empty(in_mask);
 	struct affinity_context ac;
 	struct cpumask *user_mask;
 	struct task_struct *p;
@@ -8403,15 +8409,26 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 		goto out_put_task;
 
 	/*
-	 * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
-	 * alloc_user_cpus_ptr() returns NULL.
+	 * If an empty cpumask is passed in and user_cpus_ptr is set,
+	 * clear user_cpus_ptr and reset the current cpu affinity to the
+	 * default for the current cpuset. If user_cpus_ptr isn't set,
+	 * -EINVAL will be returned as before.
 	 */
-	user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
-	if (user_mask) {
-		cpumask_copy(user_mask, in_mask);
-	} else if (IS_ENABLED(CONFIG_SMP)) {
-		retval = -ENOMEM;
-		goto out_put_task;
+	if (reset_cpumasks && p->user_cpus_ptr) {
+		in_mask = NULL;	/* To be updated in __sched_setaffinity */
+		user_mask = NULL;
+	} else {
+		/*
+		 * With non-SMP configs, user_cpus_ptr/user_mask isn't used
+		 * and alloc_user_cpus_ptr() returns NULL.
+		 */
+		user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+		if (user_mask) {
+			cpumask_copy(user_mask, in_mask);
+		} else if (IS_ENABLED(CONFIG_SMP)) {
+			retval = -ENOMEM;
+			goto out_put_task;
+		}
 	}
 
 	ac = (struct affinity_context){
@@ -8423,6 +8440,13 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	retval = __sched_setaffinity(p, &ac);
 	kfree(ac.user_mask);
 
+	/*
+	 * Force an error return (-ENODEV), if no error yet, for the empty
+	 * cpumask case to avoid breaking existing tests.
+	 */
+	if (reset_cpumasks && !retval)
+		retval = -ENODEV;
+
 out_put_task:
 	put_task_struct(p);
 	return retval;
-- 
2.31.1



* Re: [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
  2023-08-04  2:32 [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity() Waiman Long
@ 2023-08-18 18:47 ` Waiman Long
  2023-10-03  9:17 ` Ingo Molnar
  2023-10-03 10:06 ` Peter Zijlstra
  2 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2023-08-18 18:47 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: linux-kernel, Phil Auld, Brent Rowsell, Peter Hunt

On 8/3/23 22:32, Waiman Long wrote:
> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
> preserved even if the task is being moved to a different cpuset. However,
> that affinity is also being inherited by any subsequently created child
> processes which may not want or be aware of that affinity.
>
> One way to solve this problem is to provide a way to back off from
> that user provided CPU affinity.  This patch implements such a scheme
> by using an empty cpumask to signal a reset of the cpumasks to the
> default as allowed by the current cpuset.
>
> Before this patch, passing in an empty cpumask to sched_setaffinity(2)
> will always return an -EINVAL error. With this patch, an alternative
> error of -ENODEV will be returned if sched_setaffinity(2)
> has been called before to set up user_cpus_ptr. In this case, the
> user_cpus_ptr that stores the user provided affinity will be cleared and
> the task's CPU affinity will be reset to that of the current cpuset. This
> alternative error code of -ENODEV signals that no CPU is specified
> and, at the same time, a side effect of resetting cpu affinity to the
> cpuset default.
>
> If sched_setaffinity(2) has not been called previously, an EINVAL error
> will be returned with an empty cpumask just like before.  Tests or
> tools that rely on the behavior that an empty cpumask will return an
> error code will not be affected.
>
> We will have to update the sched_setaffinity(2) manpage to document
> this possible side effect of passing in an empty cpumask.
>
> Signed-off-by: Waiman Long <longman@redhat.com>

Ping.

Are there other concerns about this patch? I haven't seen any error
report from the kernel test robot so far.

Cheers,
Longman

> ---
>   kernel/sched/core.c | 42 +++++++++++++++++++++++++++++++++---------
>   1 file changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c52c2eba7c73..3ef7397f2a61 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8317,7 +8317,12 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
>   	}
>   
>   	cpuset_cpus_allowed(p, cpus_allowed);
> -	cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
> +
> +	/* Default to cpus_allowed with NULL new_mask */
> +	if (ctx->new_mask)
> +		cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
> +	else
> +		cpumask_copy(new_mask, cpus_allowed);
>   
>   	ctx->new_mask = new_mask;
>   	ctx->flags |= SCA_CHECK;
> @@ -8366,6 +8371,7 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
>   
>   long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>   {
> +	bool reset_cpumasks = cpumask_empty(in_mask);
>   	struct affinity_context ac;
>   	struct cpumask *user_mask;
>   	struct task_struct *p;
> @@ -8403,15 +8409,26 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>   		goto out_put_task;
>   
>   	/*
> -	 * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
> -	 * alloc_user_cpus_ptr() returns NULL.
> +	 * If an empty cpumask is passed in and user_cpus_ptr is set,
> +	 * clear user_cpus_ptr and reset the current cpu affinity to the
> +	 * default for the current cpuset. If user_cpus_ptr isn't set,
> +	 * -EINVAL will be returned as before.
>   	 */
> -	user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
> -	if (user_mask) {
> -		cpumask_copy(user_mask, in_mask);
> -	} else if (IS_ENABLED(CONFIG_SMP)) {
> -		retval = -ENOMEM;
> -		goto out_put_task;
> +	if (reset_cpumasks && p->user_cpus_ptr) {
> +		in_mask = NULL;	/* To be updated in __sched_setaffinity */
> +		user_mask = NULL;
> +	} else {
> +		/*
> +		 * With non-SMP configs, user_cpus_ptr/user_mask isn't used
> +		 * and alloc_user_cpus_ptr() returns NULL.
> +		 */
> +		user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
> +		if (user_mask) {
> +			cpumask_copy(user_mask, in_mask);
> +		} else if (IS_ENABLED(CONFIG_SMP)) {
> +			retval = -ENOMEM;
> +			goto out_put_task;
> +		}
>   	}
>   
>   	ac = (struct affinity_context){
> @@ -8423,6 +8440,13 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>   	retval = __sched_setaffinity(p, &ac);
>   	kfree(ac.user_mask);
>   
> +	/*
> +	 * Force an error return (-ENODEV), if no error yet, for the empty
> +	 * cpumask case to avoid breaking existing tests.
> +	 */
> +	if (reset_cpumasks && !retval)
> +		retval = -ENODEV;
> +
>   out_put_task:
>   	put_task_struct(p);
>   	return retval;



* Re: [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
  2023-08-04  2:32 [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity() Waiman Long
  2023-08-18 18:47 ` Waiman Long
@ 2023-10-03  9:17 ` Ingo Molnar
  2023-10-03 18:32   ` Waiman Long
  2023-10-03 10:06 ` Peter Zijlstra
  2 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2023-10-03  9:17 UTC
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, Brent Rowsell, Peter Hunt


* Waiman Long <longman@redhat.com> wrote:

> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
> preserved even if the task is being moved to a different cpuset. However,
> that affinity is also being inherited by any subsequently created child
> processes which may not want or be aware of that affinity.
> 
> One way to solve this problem is to provide a way to back off from
> that user provided CPU affinity.  This patch implements such a scheme
> by using an empty cpumask to signal a reset of the cpumasks to the
> default as allowed by the current cpuset.
> 
> Before this patch, passing in an empty cpumask to sched_setaffinity(2)
> will always return an -EINVAL error. With this patch, an alternative
> error of -ENODEV will be returned if sched_setaffinity(2)
> has been called before to set up user_cpus_ptr. In this case, the
> user_cpus_ptr that stores the user provided affinity will be cleared and
> the task's CPU affinity will be reset to that of the current cpuset. This
> alternative error code of -ENODEV signals that no CPU is specified
> and, at the same time, a side effect of resetting cpu affinity to the
> cpuset default.

I agree that this problem needs a solution, but I don't really agree
with the -ENODEV ABI hack.

Why not just return success in that case? The 'reset' of the mask was
successful after all.

Thanks,

	Ingo
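From userspace, the difference between the v3 -ENODEV convention and simply returning success comes down to how a caller reads the result of an empty-mask reset request. A hypothetical helper that accepts either convention might look like the sketch below; the function name is illustrative, and it assumes the glibc sched_setaffinity() wrapper, which returns -1 and sets errno on failure.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: did an empty-mask "reset" request take effect?
 * rc is the return value of sched_setaffinity() and err is the errno
 * value captured immediately after the call. Only meaningful for calls
 * made with an empty mask.
 */
static bool affinity_reset_took_effect(int rc, int err)
{
	if (rc == 0)
		return true;	/* success convention, as suggested here */
	return err == ENODEV;	/* v3: -1 with errno == ENODEV means "reset done" */
}
```

With the v3 convention, a caller that predates the patch still sees -1 either way, which is the compatibility argument; with a success convention, that same caller would see 0 where it previously expected an error.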


* Re: [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
  2023-08-04  2:32 [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity() Waiman Long
  2023-08-18 18:47 ` Waiman Long
  2023-10-03  9:17 ` Ingo Molnar
@ 2023-10-03 10:06 ` Peter Zijlstra
  2023-10-03 18:58   ` Waiman Long
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2023-10-03 10:06 UTC
  To: Waiman Long
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, Brent Rowsell, Peter Hunt

On Thu, Aug 03, 2023 at 10:32:18PM -0400, Waiman Long wrote:
> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
> preserved even if the task is being moved to a different cpuset. However,
> that affinity is also being inherited by any subsequently created child
> processes which may not want or be aware of that affinity.
> 
> One way to solve this problem is to provide a way to back off from
> that user provided CPU affinity.  This patch implements such a scheme
> by using an empty cpumask to signal a reset of the cpumasks to the
> default as allowed by the current cpuset.

So I still don't like this much, the normal state is all bits set:

  $ grep allowed /proc/self/status
  Cpus_allowed:   ff,ffffffff

The all clear bitmask just feels weird for this.


* Re: [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
  2023-10-03  9:17 ` Ingo Molnar
@ 2023-10-03 18:32   ` Waiman Long
  0 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2023-10-03 18:32 UTC
  To: Ingo Molnar
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, Brent Rowsell, Peter Hunt


On 10/3/23 05:17, Ingo Molnar wrote:
> * Waiman Long <longman@redhat.com> wrote:
>
>> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
>> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
>> preserved even if the task is being moved to a different cpuset. However,
>> that affinity is also being inherited by any subsequently created child
>> processes which may not want or be aware of that affinity.
>>
>> One way to solve this problem is to provide a way to back off from
>> that user provided CPU affinity.  This patch implements such a scheme
>> by using an empty cpumask to signal a reset of the cpumasks to the
>> default as allowed by the current cpuset.
>>
>> Before this patch, passing in an empty cpumask to sched_setaffinity(2)
>> will always return an -EINVAL error. With this patch, an alternative
>> error of -ENODEV will be returned if sched_setaffinity(2)
>> has been called before to set up user_cpus_ptr. In this case, the
>> user_cpus_ptr that stores the user provided affinity will be cleared and
>> the task's CPU affinity will be reset to that of the current cpuset. This
>> alternative error code of -ENODEV signals that no CPU is specified
>> and, at the same time, a side effect of resetting cpu affinity to the
>> cpuset default.
> I agree that this problem needs a solution, but I don't really agree
> with the -ENODEV ABI hack.
>
> Why not just return success in that case? The 'reset' of the mask was
> successful after all.

I believe the v1 patch just returned success, as you suggest. However,
there are existing tests that assume a sched_setaffinity() call with an
empty cpumask in the valid CPU range will return an error. This behavior
is also sometimes used to check whether a CPU number is out of the valid
range. That is why I changed the patch to return an error as well, to
avoid breaking existing use cases. I purposely return a different error
to indicate that a reset has happened. Let me know if you have other
suggestions on the best way forward.

Thanks,
Longman
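The compatibility concern can be illustrated with the kind of check such a test performs. This is a sketch that assumes the glibc sched_setaffinity() wrapper and the v3 semantics described in this thread. empty_mask_rejected() is a hypothetical name, and CPU_ZERO() builds the empty mask. The old expectation is only that the call fails, so it still holds whether errno is EINVAL or ENODEV.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical compatibility check: an empty mask must be rejected.
 * Unpatched kernels fail with EINVAL. With the v3 patch applied, the
 * call may instead fail with ENODEV after resetting the calling
 * thread's affinity to the cpuset default. Either way it returns -1.
 */
static bool empty_mask_rejected(int *err_out)
{
	cpu_set_t empty;
	int err;

	CPU_ZERO(&empty);		/* the empty mask discussed in this thread */
	if (sched_setaffinity(0, sizeof(empty), &empty) == 0)
		return false;		/* success would break the old expectation */

	err = errno;
	if (err_out != NULL)
		*err_out = err;
	return err == EINVAL || err == ENODEV;
}
```

ENODEV would only appear on a kernel carrying this patch, and only if an earlier sched_setaffinity(2) call had set up user_cpus_ptr.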



* Re: [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
  2023-10-03 10:06 ` Peter Zijlstra
@ 2023-10-03 18:58   ` Waiman Long
  2023-10-03 21:48     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Waiman Long @ 2023-10-03 18:58 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, Brent Rowsell, Peter Hunt


On 10/3/23 06:06, Peter Zijlstra wrote:
> On Thu, Aug 03, 2023 at 10:32:18PM -0400, Waiman Long wrote:
>> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
>> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
>> preserved even if the task is being moved to a different cpuset. However,
>> that affinity is also being inherited by any subsequently created child
>> processes which may not want or be aware of that affinity.
>>
>> One way to solve this problem is to provide a way to back off from
>> that user provided CPU affinity.  This patch implements such a scheme
>> by using an empty cpumask to signal a reset of the cpumasks to the
>> default as allowed by the current cpuset.
> So I still don't like this much, the normal state is all bits set:
>
>    $ grep allowed /proc/self/status
>    Cpus_allowed:   ff,ffffffff
>
> The all clear bitmask just feels weird for this.

The main reason for using an empty bitmask is the presence of the
CPU_ZERO() macro that can produce this empty cpumask. It is certainly
possible to use an all set bitmask for reset purposes. The only problem
is that it is more complicated to generate such a bitmask, as there are
no existing CPU* macros that can be used.

Another possible alternative is to use a cpusetsize of 0 to indicate a
reset, as long as it doesn't cause problems with existing code. Would that
be acceptable?

Cheers,
Longman



* Re: [PATCH v3] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
  2023-10-03 18:58   ` Waiman Long
@ 2023-10-03 21:48     ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2023-10-03 21:48 UTC
  To: Waiman Long
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Phil Auld, Brent Rowsell, Peter Hunt

On Tue, Oct 03, 2023 at 02:58:58PM -0400, Waiman Long wrote:
> 
> On 10/3/23 06:06, Peter Zijlstra wrote:
> > On Thu, Aug 03, 2023 at 10:32:18PM -0400, Waiman Long wrote:
> > > Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> > > cpumask"), user provided CPU affinity via sched_setaffinity(2) is
> > > preserved even if the task is being moved to a different cpuset. However,
> > > that affinity is also being inherited by any subsequently created child
> > > processes which may not want or be aware of that affinity.
> > > 
> > > One way to solve this problem is to provide a way to back off from
> > > that user provided CPU affinity.  This patch implements such a scheme
> > > by using an empty cpumask to signal a reset of the cpumasks to the
> > > default as allowed by the current cpuset.
> > So I still don't like this much, the normal state is all bits set:
> > 
> >    $ grep allowed /proc/self/status
> >    Cpus_allowed:   ff,ffffffff
> > 
> > The all clear bitmask just feels weird for this.
> 
> The main reason for using an empty bitmask is the presence of the CPU_ZERO()
> macro that can produce this empty cpumask. It is certainly possible to use
> an all set bitmask for reset purposes. The only problem is that it is more
> complicated to generate such a bitmask, as there are no existing CPU* macros
> that can be used.

Blergh, FreeBSD has CPU_FILL(), but it appears we don't have this.

Still, nothing a memset can't fix. CPU_ZERO() ends up in
__builtin_memset() too. I'm sure our glibc boys can add CPU_FILL()
eventually.

Anyway, I see you sent a v4, I'll go look at that in the am, sleep now.
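Peter's memset suggestion can be sketched as a local helper. fill_cpu_set() is hypothetical, since glibc has no CPU_FILL(), as noted above. It sets every bit of the fixed-size cpu_set_t, which with glibc covers CPU_SETSIZE (1024) CPUs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical CPU_FILL() equivalent. CPU_ZERO() clears the fixed-size
 * cpu_set_t bitmap; this sets every bit of it instead.
 */
static void fill_cpu_set(cpu_set_t *set)
{
	assert(set != NULL);
	memset(set, 0xff, sizeof(*set));
}
```

On current kernels, passing such a set to sched_setaffinity() simply requests every CPU the cpuset allows. Giving it a special reset meaning, as discussed in this thread, would be a new ABI decision rather than existing behavior.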

