linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: Waiman Long <longman@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	qperret@google.com
Subject: Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity()
Date: Tue, 17 Jan 2023 16:08:26 +0000	[thread overview]
Message-ID: <20230117160825.GA17756@willie-the-truck> (raw)
In-Reply-To: <20220922180041.1768141-3-longman@redhat.com>

Hi Waiman,

On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
> The user_cpus_ptr field is added by commit b90ca8badbd1 ("sched:
> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
> is currently used only by arm64 arch due to possible asymmetric CPU
> setup. This patch extends its usage to save user provided cpumask
> when sched_setaffinity() is called for all arches. With this patch
> applied, user_cpus_ptr, once allocated after a successful call to
> sched_setaffinity(), will only be freed when the task exits.
> 
> Since user_cpus_ptr is supposed to be used for "requested
> affinity", there is actually no point to save current cpu affinity in
> restrict_cpus_allowed_ptr() if sched_setaffinity() has never been called.
> Modify the logic to set user_cpus_ptr only in sched_setaffinity() and use
> it in restrict_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
> if defined but not changing it.
> 
> This will be some changes in behavior for arm64 systems with asymmetric
> CPUs in some corner cases. For instance, if sched_setaffinity()
> has never been called and there is a cpuset change before
> relax_compatible_cpus_allowed_ptr() is called, its subsequent call will
> follow what the cpuset allows but not what the previous cpu affinity
> setting allows.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/sched/core.c  | 82 ++++++++++++++++++++------------------------
>  kernel/sched/sched.h |  7 ++++
>  2 files changed, 44 insertions(+), 45 deletions(-)

We've tracked this down as the cause of an arm64 regression in Android and I've
reproduced the issue with mainline.

Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
the command-line, then the arch code will (amongst other things) call
force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
when exec()'ing a 32-bit or a 64-bit task respectively.

If you consider a system where everything is 64-bit but the cmdline option
above is present, then the call to relax_compatible_cpus_allowed_ptr() isn't
expected to do anything in this case, and the old code made sure of that:

> @@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
>  
>  /*
>   * Restore the affinity of a task @p which was previously restricted by a
> - * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
> - * @p->user_cpus_ptr.
> + * call to force_compatible_cpus_allowed_ptr().
>   *
>   * It is the caller's responsibility to serialise this with any calls to
>   * force_compatible_cpus_allowed_ptr(@p).
>   */
>  void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
>  {
> -	struct cpumask *user_mask = p->user_cpus_ptr;
> -	unsigned long flags;
> +	int ret;
>  
>  	/*
> -	 * Try to restore the old affinity mask. If this fails, then
> -	 * we free the mask explicitly to avoid it being inherited across
> -	 * a subsequent fork().
> +	 * Try to restore the old affinity mask with __sched_setaffinity().
> +	 * Cpuset masking will be done there too.
>  	 */
> -	if (!user_mask || !__sched_setaffinity(p, user_mask))
> -		return;

... since it returned early here if '!user_mask' ...

> -
> -	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	user_mask = clear_user_cpus_ptr(p);
> -	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> -
> -	kfree(user_mask);
> +	ret = __sched_setaffinity(p, task_user_cpus(p));
> +	WARN_ON_ONCE(ret);

... however, now we end up going down into __sched_setaffinity() with
task_user_cpus(p) giving us the 'cpu_possible_mask'! This can lead to a mixture
of WARN_ON()s and incorrect affinity masks (for example, a newly exec'd task
ends up with the affinity mask of the online CPUs at the point of exec() and is
unable to run on anything onlined later).

I've had a crack at fixing the code above to restore the old behaviour, and it
seems to work for my basic tests (still pending confirmation from others):


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb1ee6d7bdde..0d4a11384648 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
 void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 {
        struct affinity_context ac = {
-               .new_mask  = task_user_cpus(p),
+               .new_mask  = p->user_cpus_ptr,
                .flags     = 0,
        };
-       int ret;
 
        /*
         * Try to restore the old affinity mask with __sched_setaffinity().
         * Cpuset masking will be done there too.
         */
-       ret = __sched_setaffinity(p, &ac);
-       WARN_ON_ONCE(ret);
+       if (ac.new_mask)
+               WARN_ON_ONCE(__sched_setaffinity(p, &ac));
 }
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)


With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr()
so I'd be inclined to remove it altogether tbh.

What do you think?

Will

  parent reply	other threads:[~2023-01-17 16:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22 18:00 [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long
2022-09-22 18:00 ` [PATCH v10 1/5] sched: Add __releases annotations to affine_move_task() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] sched: Always preserve the user requested cpumask tip-bot2 for Waiman Long
2023-01-17 16:08   ` Will Deacon [this message]
2023-01-17 18:13     ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2023-01-20 17:59       ` Will Deacon
2023-01-20 18:10         ` Waiman Long
2023-01-26 12:52     ` Linux kernel regression tracking (#adding)
2023-02-10 17:15       ` Linux kernel regression tracking (#update)
2023-01-27 18:36     ` Peter Zijlstra
2023-01-27 19:09       ` Waiman Long
2022-09-22 18:00 ` [PATCH v10 3/5] sched: Enforce user requested affinity Waiman Long
2022-10-07 10:01   ` Peter Zijlstra
2022-10-07 14:57     ` Waiman Long
2022-10-07 15:23       ` Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 4/5] sched: Handle set_cpus_allowed_ptr(), sched_setaffinity() & other races Waiman Long
2022-10-07 12:47   ` Peter Zijlstra
2022-10-07 18:59     ` Waiman Long
2022-10-28  6:42   ` [tip: sched/core] sched: Introduce affinity_context tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 5/5] sched: Always clear user_cpus_ptr in do_set_cpus_allowed() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-10-06 21:31 ` [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230117160825.GA17756@willie-the-truck \
    --to=will@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jiangshanlai@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=longman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).