All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: Waiman Long <longman@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	qperret@google.com
Subject: Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity()
Date: Tue, 17 Jan 2023 16:08:26 +0000	[thread overview]
Message-ID: <20230117160825.GA17756@willie-the-truck> (raw)
In-Reply-To: <20220922180041.1768141-3-longman@redhat.com>

Hi Waiman,

On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
> The user_cpus_ptr field is added by commit b90ca8badbd1 ("sched:
> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
> is currently used only by arm64 arch due to possible asymmetric CPU
> setup. This patch extends its usage to save user provided cpumask
> when sched_setaffinity() is called for all arches. With this patch
> applied, user_cpus_ptr, once allocated after a successful call to
> sched_setaffinity(), will only be freed when the task exits.
> 
> Since user_cpus_ptr is supposed to be used for "requested
> affinity", there is actually no point to save current cpu affinity in
> restrict_cpus_allowed_ptr() if sched_setaffinity() has never been called.
> Modify the logic to set user_cpus_ptr only in sched_setaffinity() and use
> it in restrict_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
> if defined but not changing it.
> 
> This will be some changes in behavior for arm64 systems with asymmetric
> CPUs in some corner cases. For instance, if sched_setaffinity()
> has never been called and there is a cpuset change before
> relax_compatible_cpus_allowed_ptr() is called, its subsequent call will
> follow what the cpuset allows but not what the previous cpu affinity
> setting allows.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/sched/core.c  | 82 ++++++++++++++++++++------------------------
>  kernel/sched/sched.h |  7 ++++
>  2 files changed, 44 insertions(+), 45 deletions(-)

We've tracked this down as the cause of an arm64 regression in Android and I've
reproduced the issue with mainline.

Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
the command-line, then the arch code will (amongst other things) call
force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
when exec()'ing a 32-bit or a 64-bit task respectively.

If you consider a system where everything is 64-bit but the cmdline option
above is present, then the call to relax_compatible_cpus_allowed_ptr() isn't
expected to do anything in this case, and the old code made sure of that:

> @@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
>  
>  /*
>   * Restore the affinity of a task @p which was previously restricted by a
> - * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
> - * @p->user_cpus_ptr.
> + * call to force_compatible_cpus_allowed_ptr().
>   *
>   * It is the caller's responsibility to serialise this with any calls to
>   * force_compatible_cpus_allowed_ptr(@p).
>   */
>  void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
>  {
> -	struct cpumask *user_mask = p->user_cpus_ptr;
> -	unsigned long flags;
> +	int ret;
>  
>  	/*
> -	 * Try to restore the old affinity mask. If this fails, then
> -	 * we free the mask explicitly to avoid it being inherited across
> -	 * a subsequent fork().
> +	 * Try to restore the old affinity mask with __sched_setaffinity().
> +	 * Cpuset masking will be done there too.
>  	 */
> -	if (!user_mask || !__sched_setaffinity(p, user_mask))
> -		return;

... since it returned early here if '!user_mask' ...

> -
> -	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	user_mask = clear_user_cpus_ptr(p);
> -	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> -
> -	kfree(user_mask);
> +	ret = __sched_setaffinity(p, task_user_cpus(p));
> +	WARN_ON_ONCE(ret);

... however, now we end up going down into __sched_setaffinity() with
task_user_cpus(p) giving us the 'cpu_possible_mask'! This can lead to a mixture
of WARN_ON()s and incorrect affinity masks (for example, a newly exec'd task
ends up with the affinity mask of the online CPUs at the point of exec() and is
unable to run on anything onlined later).

I've had a crack at fixing the code above to restore the old behaviour, and it
seems to work for my basic tests (still pending confirmation from others):


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb1ee6d7bdde..0d4a11384648 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
 void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 {
        struct affinity_context ac = {
-               .new_mask  = task_user_cpus(p),
+               .new_mask  = p->user_cpus_ptr,
                .flags     = 0,
        };
-       int ret;
 
        /*
         * Try to restore the old affinity mask with __sched_setaffinity().
         * Cpuset masking will be done there too.
         */
-       ret = __sched_setaffinity(p, &ac);
-       WARN_ON_ONCE(ret);
+       if (ac.new_mask)
+               WARN_ON_ONCE(__sched_setaffinity(p, &ac));
 }
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)


With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr()
so I'd be inclined to remove it altogether tbh.

What do you think?

Will

  parent reply	other threads:[~2023-01-17 16:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22 18:00 [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long
2022-09-22 18:00 ` [PATCH v10 1/5] sched: Add __releases annotations to affine_move_task() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] sched: Always preserve the user requested cpumask tip-bot2 for Waiman Long
2023-01-17 16:08   ` Will Deacon [this message]
2023-01-17 18:13     ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2023-01-20 17:59       ` Will Deacon
2023-01-20 18:10         ` Waiman Long
2023-01-26 12:52     ` Linux kernel regression tracking (#adding)
2023-02-10 17:15       ` Linux kernel regression tracking (#update)
2023-01-27 18:36     ` Peter Zijlstra
2023-01-27 19:09       ` Waiman Long
2022-09-22 18:00 ` [PATCH v10 3/5] sched: Enforce user requested affinity Waiman Long
2022-10-07 10:01   ` Peter Zijlstra
2022-10-07 14:57     ` Waiman Long
2022-10-07 15:23       ` Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 4/5] sched: Handle set_cpus_allowed_ptr(), sched_setaffinity() & other races Waiman Long
2022-10-07 12:47   ` Peter Zijlstra
2022-10-07 18:59     ` Waiman Long
2022-10-28  6:42   ` [tip: sched/core] sched: Introduce affinity_context tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 5/5] sched: Always clear user_cpus_ptr in do_set_cpus_allowed() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-10-06 21:31 ` [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230117160825.GA17756@willie-the-truck \
    --to=will@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jiangshanlai@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=longman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.