All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Will Deacon <will@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	qperret@google.com
Subject: Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity()
Date: Fri, 20 Jan 2023 13:10:38 -0500	[thread overview]
Message-ID: <f269370f-d72c-08e3-da06-3cd70cdf75f2@redhat.com> (raw)
In-Reply-To: <20230120175931.GA22417@willie-the-truck>

On 1/20/23 12:59, Will Deacon wrote:
> Hey Waiman,
>
> Cheers for the quick reply.
>
> On Tue, Jan 17, 2023 at 01:13:31PM -0500, Waiman Long wrote:
>> On 1/17/23 11:08, Will Deacon wrote:
>>> On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
>>>> The user_cpus_ptr field is added by commit b90ca8badbd1 ("sched:
>>>> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
>>>> is currently used only by arm64 arch due to possible asymmetric CPU
>>>> setup. This patch extends its usage to save user provided cpumask
>>>> when sched_setaffinity() is called for all arches. With this patch
>>>> applied, user_cpus_ptr, once allocated after a successful call to
>>>> sched_setaffinity(), will only be freed when the task exits.
> [...]
>
>>> We've tracked this down as the cause of an arm64 regression in Android and I've
>>> reproduced the issue with mainline.
>>>
>>> Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
>>> the command-line, then the arch code will (amongst other things) call
>>> force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
>>> when exec()'ing a 32-bit or a 64-bit task respectively.
>> IOW, relax_compatible_cpus_allowed_ptr() can be called without a previous
>> force_compatible_cpus_allowed_ptr(). Right?
> In practice, these functions are only called by arm64 during exec. As above,
> exec()'ing a 32-bit task calls force_compatible_cpus_allowed_ptr() and
> exec()'ing a 64-bit task calls relax_compatible_cpus_allowed_ptr(). So
> they don't come in pairs at all; it's just that calling relax_[...] should
> try to restore the affinity mask if it was previously clobbered by
> force_[...].
>
That was what I thought.
>> A possible optimization in this case is to add a bit flag in the task_struct
>> to indicate a previous call to force_compatible_cpus_allowed_ptr(). Without
>> that flag set, relax_compatible_cpus_allowed_ptr() can return immediately.
> How is this an optimisation over a pointer comparison?

The sched_setaffinity() patch had repurposed user_cpus_ptr as a user 
requested cpu affinity mask irrespective if 
force_compatible_cpus_allowed_ptr() has been called or not. So checking 
against user_cpus_ptr will no longer serve its purpose as an indicator 
if force_compatible_cpus_allowed_ptr() has been called or not.


>>> I've had a crack at fixing the code above to restore the old behaviour, and it
>>> seems to work for my basic tests (still pending confirmation from others):
>>>
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index bb1ee6d7bdde..0d4a11384648 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
>>>    void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
>>>    {
>>>           struct affinity_context ac = {
>>> -               .new_mask  = task_user_cpus(p),
>>> +               .new_mask  = p->user_cpus_ptr,
>>>                   .flags     = 0,
>>>           };
>>> -       int ret;
>>>           /*
>>>            * Try to restore the old affinity mask with __sched_setaffinity().
>>>            * Cpuset masking will be done there too.
>>>            */
>>> -       ret = __sched_setaffinity(p, &ac);
>>> -       WARN_ON_ONCE(ret);
>>> +       if (ac.new_mask)
>>> +               WARN_ON_ONCE(__sched_setaffinity(p, &ac));
>>>    }
>>>    void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>>>
>>>
>>> With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr()
>>> so I'd be inclined to remove it altogether tbh.
>>>
>>> What do you think?
>> The problem here is that force_compatible_cpus_allowed_ptr() can be called
>> without a matching relax_compatible_cpus_allowed_ptr() at the end. So we may
>> end up artificially restrict the number of cpus that can be used when
>> running a 64-bit binary.
> Hmm, is this because an intervening call to sched_setaffinity() could've
> set ->user_cpus_ptr? If so, I'd have thought that would also point to a
> superset of the effective affinity -- is that not the case?
>
>> What do you think about the idea of having a bit flag to track that?
> I'm not hugely happy with that approach because it's adding additional state
> which is only needed for arm64, and only when operating in this funny
> asymmetric mode. I also don't understand how it would interact with the new
> sched_setaffinity() behaviour; would we need to clear the flag when that
> function updates the mask?
The new flag bit will be independent of sched_setaffinity() call. It is 
set when restrict_cpus_allowed_ptr() is called and cleared in 
relax_compatible_cpus_allowed_ptr() if it is set before. I will post a 
patch for your evaluation.
>
> Since I'm basically trying to re-instate the v6.1 behaviour to fix the arm64
> regression, I'm happy to review/test any proposal you have, but as we get
> closer to the 6.2 release I'm wondering whether it would make more sense to
> revert the sched_setaffinity() changes for now and I can help you with arm64
> review and testing if we bring the changes back for e.g. 6.4.

The purpose of the bit flag is to reinstate 6.1 behavior.

Cheers,
Longman

>
> Will
>


  reply	other threads:[~2023-01-20 18:11 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22 18:00 [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long
2022-09-22 18:00 ` [PATCH v10 1/5] sched: Add __releases annotations to affine_move_task() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] sched: Always preserve the user requested cpumask tip-bot2 for Waiman Long
2023-01-17 16:08   ` [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity() Will Deacon
2023-01-17 18:13     ` Waiman Long
2023-01-20 17:59       ` Will Deacon
2023-01-20 18:10         ` Waiman Long [this message]
2023-01-26 12:52     ` Linux kernel regression tracking (#adding)
2023-02-10 17:15       ` Linux kernel regression tracking (#update)
2023-01-27 18:36     ` Peter Zijlstra
2023-01-27 19:09       ` Waiman Long
2022-09-22 18:00 ` [PATCH v10 3/5] sched: Enforce user requested affinity Waiman Long
2022-10-07 10:01   ` Peter Zijlstra
2022-10-07 14:57     ` Waiman Long
2022-10-07 15:23       ` Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 4/5] sched: Handle set_cpus_allowed_ptr(), sched_setaffinity() & other races Waiman Long
2022-10-07 12:47   ` Peter Zijlstra
2022-10-07 18:59     ` Waiman Long
2022-10-28  6:42   ` [tip: sched/core] sched: Introduce affinity_context tip-bot2 for Waiman Long
2022-09-22 18:00 ` [PATCH v10 5/5] sched: Always clear user_cpus_ptr in do_set_cpus_allowed() Waiman Long
2022-10-28  6:42   ` [tip: sched/core] " tip-bot2 for Waiman Long
2022-10-06 21:31 ` [PATCH v10 0/5] sched: Persistent user requested affinity Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f269370f-d72c-08e3-da06-3cd70cdf75f2@redhat.com \
    --to=longman@redhat.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jiangshanlai@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.