linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@android.com,
	Zefan Li <lizefan.x@bytedance.com>, Tejun Heo <tj@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	cgroups@vger.kernel.org
Subject: Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
Date: Wed, 1 Feb 2023 10:14:01 +0100
Message-ID: <Y9otWX+MGOLDKU6t@hirez.programming.kicks-ass.net>
In-Reply-To: <6b068916-5e1b-a943-1aad-554964d8b746@redhat.com>

On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > There is a difference in behaviour between CPUSET={y,n} that is now
> > wreaking havoc with {relax,force}_compatible_cpus_allowed_ptr().
> > 
> > Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> > user requested cpumask"), relax_compatible_cpus_allowed_ptr() is
> > calling __sched_setaffinity() unconditionally.
> > 
> > But the underlying problem goes back a lot further, possibly to
> > commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> > switched cpuset_cpus_allowed() from cs->cpus_allowed to
> > cs->effective_cpus.
> > 
> > The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> > all offline CPUs. For tasks that are part of a (!root) cpuset this is
> > then later fixed up by the cpuset hotplug notifiers that re-evaluate
> > and re-apply cs->effective_cpus, but for (normal) tasks in the root
> > cpuset this does not happen and they will forever after be excluded
> > from CPUs onlined later.
> > 
> > As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> > including the offline CPUs.
> > 
> > Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> > Reported-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
> tracked online cpus and ignored the offline ones. It behaves more like
> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
> contains what the user has written into it and it won't be changed until
> another write happens. However, what the user has written may not be what
> the system can give it and effective_cpus is what the system decides a
> cpuset can use.
> 
> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
> accordingly. So missing previously offline cpus won't happen with v2.
> 
> Since v1 keeps the old behavior, previously offlined cpus are lost in the
> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
> tasks in a non-root cpuset suffer this problem.
> 
> It was a known issue in v1 and I believe it is one of the major reasons for
> the cpuset v2 redesign.
> 
> A major concern I have is the overhead of creating a poor man's version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.

You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
mask offline cpus for root cgroup tasks, ever. (And the only reason it
gets away with masking offline for !root is that it re-applies the mask
every time it changes.)

Yes it did that for a fair while -- but it is wrong and broken and a
very big behavioural difference between CONFIG_CPUSET={y,n}. This must
not be.

Arguably cpuset-v2 is still wrong for masking offline cpus in its
effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
something that needs to go into /urgent *now*.

Hence this minimal patch that at least lets sched_setaffinity() work as
intended.
