linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, Phil Auld <pauld@redhat.com>,
	Brent Rowsell <browsell@redhat.com>,
	Peter Hunt <pehunt@redhat.com>, Waiman Long <longman@redhat.com>
Subject: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()
Date: Tue,  3 Oct 2023 16:57:35 -0400	[thread overview]
Message-ID: <20231003205735.2921964-1-longman@redhat.com> (raw)

Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), user provided CPU affinity via sched_setaffinity(2) is
perserved even if the task is being moved to a different cpuset. However,
that affinity is also being inherited by any subsequently created child
processes which may not want or be aware of that affinity.

One way to solve this problem is to provide a way to back off from that
user provided CPU affinity.  This patch implements such a scheme by
using an input cpumask length of 0 to signal a reset of the cpumasks
to the default as allowed by the current cpuset.  A non-NULL cpumask
should still be provided to avoid problem with older kernel.

If sched_setaffinity(2) has been called previously to set a user
supplied cpumask, a value of 0 will be returned to indicate success.
Otherwise, an error value of -EINVAL will be returned.

We may have to update the sched_setaffinity(2) manpage to document
this new side effect of passing in an input length of 0.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/core.c | 43 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..a10d507a05df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8315,7 +8315,12 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
 	}
 
 	cpuset_cpus_allowed(p, cpus_allowed);
-	cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+
+	/* Default to cpus_allowed with NULL new_mask */
+	if (ctx->new_mask)
+		cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+	else
+		cpumask_copy(new_mask, cpus_allowed);
 
 	ctx->new_mask = new_mask;
 	ctx->flags |= SCA_CHECK;
@@ -8401,15 +8406,29 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 		goto out_put_task;
 
 	/*
-	 * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
-	 * alloc_user_cpus_ptr() returns NULL.
+	 * If a NULL cpumask is passed in and user_cpus_ptr is set,
+	 * clear user_cpus_ptr and reset the current cpu affinity to the
+	 * default for the current cpuset. If user_cpus_ptr isn't set,
+	 * -EINVAL will be returned.
 	 */
-	user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
-	if (user_mask) {
-		cpumask_copy(user_mask, in_mask);
-	} else if (IS_ENABLED(CONFIG_SMP)) {
-		retval = -ENOMEM;
-		goto out_put_task;
+	if (!in_mask) {
+		if (!p->user_cpus_ptr) {
+			retval = -EINVAL;
+			goto out_put_task;
+		}
+		user_mask = NULL;
+	} else {
+		/*
+		 * With non-SMP configs, user_cpus_ptr/user_mask isn't used
+		 * and alloc_user_cpus_ptr() returns NULL.
+		 */
+		user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+		if (user_mask) {
+			cpumask_copy(user_mask, in_mask);
+		} else if (IS_ENABLED(CONFIG_SMP)) {
+			retval = -ENOMEM;
+			goto out_put_task;
+		}
 	}
 
 	ac = (struct affinity_context){
@@ -8451,6 +8470,12 @@ SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
 	cpumask_var_t new_mask;
 	int retval;
 
+	/*
+	 * A len of 0 will reset a previously set user cpumask.
+	 */
+	if (!len)
+		return sched_setaffinity(pid, NULL);
+
 	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
 		return -ENOMEM;
 
-- 
2.39.3


             reply	other threads:[~2023-10-03 20:58 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-03 20:57 Waiman Long [this message]
2023-10-04  8:36 ` [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity() Peter Zijlstra
2023-10-04  9:23   ` Ingo Molnar
2023-10-04  9:43     ` Peter Zijlstra
2023-10-04 10:06       ` Ingo Molnar
2023-10-04 12:19         ` Waiman Long
2023-10-04 12:34   ` Florian Weimer
2023-10-04 12:41     ` Waiman Long
2023-10-04 12:55       ` Florian Weimer
2023-10-04 16:23         ` Waiman Long
2023-10-04 13:52     ` Peter Zijlstra
2023-10-04 13:54       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231003205735.2921964-1-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=bristot@redhat.com \
    --cc=browsell@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=pehunt@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).