From: Will Deacon <will@kernel.org>
To: Waiman Long <longman@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
qperret@google.com
Subject: Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity()
Date: Tue, 17 Jan 2023 16:08:26 +0000 [thread overview]
Message-ID: <20230117160825.GA17756@willie-the-truck> (raw)
In-Reply-To: <20220922180041.1768141-3-longman@redhat.com>
Hi Waiman,
On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
> The user_cpus_ptr field is added by commit b90ca8badbd1 ("sched:
> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
> is currently used only by arm64 arch due to possible asymmetric CPU
> setup. This patch extends its usage to save user provided cpumask
> when sched_setaffinity() is called for all arches. With this patch
> applied, user_cpus_ptr, once allocated after a successful call to
> sched_setaffinity(), will only be freed when the task exits.
>
> Since user_cpus_ptr is supposed to be used for "requested
> affinity", there is actually no point to save current cpu affinity in
> restrict_cpus_allowed_ptr() if sched_setaffinity() has never been called.
> Modify the logic to set user_cpus_ptr only in sched_setaffinity() and use
> it in restrict_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
> if defined but not changing it.
>
> This will be some changes in behavior for arm64 systems with asymmetric
> CPUs in some corner cases. For instance, if sched_setaffinity()
> has never been called and there is a cpuset change before
> relax_compatible_cpus_allowed_ptr() is called, its subsequent call will
> follow what the cpuset allows but not what the previous cpu affinity
> setting allows.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/sched/core.c | 82 ++++++++++++++++++++------------------------
> kernel/sched/sched.h | 7 ++++
> 2 files changed, 44 insertions(+), 45 deletions(-)
We've tracked this down as the cause of an arm64 regression in Android and I've
reproduced the issue with mainline.
Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
the command-line, then the arch code will (amongst other things) call
force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
when exec()'ing a 32-bit or a 64-bit task respectively.
If you consider a system where everything is 64-bit but the cmdline option
above is present, then the call to relax_compatible_cpus_allowed_ptr() isn't
expected to do anything in this case, and the old code made sure of that:
> @@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
>
> /*
> * Restore the affinity of a task @p which was previously restricted by a
> - * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
> - * @p->user_cpus_ptr.
> + * call to force_compatible_cpus_allowed_ptr().
> *
> * It is the caller's responsibility to serialise this with any calls to
> * force_compatible_cpus_allowed_ptr(@p).
> */
> void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
> {
> - struct cpumask *user_mask = p->user_cpus_ptr;
> - unsigned long flags;
> + int ret;
>
> /*
> - * Try to restore the old affinity mask. If this fails, then
> - * we free the mask explicitly to avoid it being inherited across
> - * a subsequent fork().
> + * Try to restore the old affinity mask with __sched_setaffinity().
> + * Cpuset masking will be done there too.
> */
> - if (!user_mask || !__sched_setaffinity(p, user_mask))
> - return;
... since it returned early here if '!user_mask' ...
> -
> - raw_spin_lock_irqsave(&p->pi_lock, flags);
> - user_mask = clear_user_cpus_ptr(p);
> - raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> -
> - kfree(user_mask);
> + ret = __sched_setaffinity(p, task_user_cpus(p));
> + WARN_ON_ONCE(ret);
... however, now we end up going down into __sched_setaffinity() with
task_user_cpus(p) giving us the 'cpu_possible_mask'! This can lead to a mixture
of WARN_ON()s and incorrect affinity masks (for example, a newly exec'd task
ends up with the affinity mask of the online CPUs at the point of exec() and is
unable to run on anything onlined later).
I've had a crack at fixing the code above to restore the old behaviour, and it
seems to work for my basic tests (still pending confirmation from others):
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb1ee6d7bdde..0d4a11384648 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
{
struct affinity_context ac = {
- .new_mask = task_user_cpus(p),
+ .new_mask = p->user_cpus_ptr,
.flags = 0,
};
- int ret;
/*
* Try to restore the old affinity mask with __sched_setaffinity().
* Cpuset masking will be done there too.
*/
- ret = __sched_setaffinity(p, &ac);
- WARN_ON_ONCE(ret);
+ if (ac.new_mask)
+ WARN_ON_ONCE(__sched_setaffinity(p, &ac));
}
void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr()
so I'd be inclined to remove it altogether tbh.
What do you think?
Will
next prev parent reply other threads:[~2023-01-17 16:08 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-22 18:00 [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long
2022-09-22 18:00 ` [PATCH v10 1/5] sched: Add __releases annotations to affine_move_task() Waiman Long
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2022-10-28 6:42 ` [tip: sched/core] sched: Always preserve the user requested cpumask tip-bot2 for Waiman Long
2023-01-17 16:08 ` Will Deacon [this message]
2023-01-17 18:13 ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2023-01-20 17:59 ` Will Deacon
2023-01-20 18:10 ` Waiman Long
2023-01-26 12:52 ` Linux kernel regression tracking (#adding)
2023-02-10 17:15 ` Linux kernel regression tracking (#update)
2023-01-27 18:36 ` Peter Zijlstra
2023-01-27 19:09 ` Waiman Long
2022-09-22 18:00 ` [PATCH v10 3/5] sched: Enforce user requested affinity Waiman Long
2022-10-07 10:01 ` Peter Zijlstra
2022-10-07 14:57 ` Waiman Long
2022-10-07 15:23 ` Waiman Long
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 4/5] sched: Handle set_cpus_allowed_ptr(), sched_setaffinity() & other races Waiman Long
2022-10-07 12:47 ` Peter Zijlstra
2022-10-07 18:59 ` Waiman Long
2022-10-28 6:42 ` [tip: sched/core] sched: Introduce affinity_context tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 5/5] sched: Always clear user_cpus_ptr in do_set_cpus_allowed() Waiman Long
2022-10-28 6:42 ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-10-06 21:31 ` [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230117160825.GA17756@willie-the-truck \
--to=will@kernel.org \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=hannes@cmpxchg.org \
--cc=jiangshanlai@gmail.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan.x@bytedance.com \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=qperret@google.com \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).