linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qais Yousef <qais.yousef@arm.com>
To: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Quentin Perret <qperret@google.com>, Tejun Heo <tj@kernel.org>,
	Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	kernel-team@android.com
Subject: Re: [PATCH v4 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
Date: Mon, 30 Nov 2020 17:05:31 +0000	[thread overview]
Message-ID: <20201130170531.qo67rai5lftskmk2@e107158-lin.cambridge.arm.com> (raw)
In-Reply-To: <20201127133245.4hbx65mo3zinawvo@e107158-lin.cambridge.arm.com>

On 11/27/20 13:32, Qais Yousef wrote:
> On 11/24/20 15:50, Will Deacon wrote:
> > If the scheduler cannot find an allowed CPU for a task,
> > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> > if cgroup v1 is in use.
> > 
> > In preparation for allowing architectures to provide their own fallback
> > mask, just return early if we're not using cgroup v2 and allow
> > select_fallback_rq() to figure out the mask by itself.
> 
> What about cpuset_attach()? When a task attaches to a new group its affinity
> could be reset to possible_cpu_mask if it moves to top_cpuset or the
> intersection of effective_cpus & cpu_online_mask.
> 
> Probably handled with later patches.
> 
> /me continue to look at the rest
> 
> Okay so in patch 11 we make set_cpus_allowed_ptr() fail. cpuset_attach will
> just do WARN_ON_ONCE() and carry on.
> 
> I think we can fix that by making sure cpuset_can_attach() will fail. Which can
> be done by fixing task_can_attach() to take into account the new arch
> task_cpu_possible_mask()?

I ran some experiments and indeed we have some problems here :(

I create 3 cpusets: 64bit, 32bit and mix. As the name indicates, 64bit contains
all 64bit-only cpus, 32bit contains 32bit-capable ones and mix has a mixture of
both.

If I try to move my test binary to 64bit cpuset, it moves there and I see the
WARN_ON_ONCE() triggered. The task has attached to the new cpuset but
set_allowed_cpus_ptr() has failed and we end up with whatever affinity we had
previously. Breaking cpusets effectively.

This can be easily fixable with my suggestion to educate task_can_attach() to
take into account the new arch_task_cpu_cpu_possible_mask(). The attachment
will fail and userspace will get an error then.

But this introduces another issue..

If I use cpumask_subset() in task_can_attach(), then an attempt to move to
'mix' cpuset will fail. In Android foreground cpuset usually contains a mixture
of cpus. So we want this to work.

Also 'background' tasks are usually bound to littles. If the littles are 64bit
only, then user space must ensure to add a 32bit capable cpu to the list. If we
can't attach to mixed cpuset, then this workaround won't work and Android will
have to create different cpusets for 32bit and 64bit apps. Which would be
difficult since 64bit apps do embed 32bit binaries and it'd be hard for
userspace to deal with that. Unless we extend execve syscall to take a cgroup
as an arg to attach to (like clone3 syscall), then potentially we can do
something about it, I think, by 'fixing' the libc wrapper.

If I use cpumask_intersects() to validate the attachment, then the 64bit
attachment will fail but the 'mix' one will succeed, as desired. BUT we'll
re-introduce the WARN_ON_ONCE() again since __set_cpus_allowed_ptr_locked()
validates against arch_task_cpu_possible_mask() with cpumask_subset() :-/
ie: cpuset_attach()::set_cpus_allowed_ptr() will fail with just a warning
printed once when attaching to 'mix'.

We can fix this by making __set_cpus_allowed_ptr_locked() perform cpumask_and()
like restrict_cpus_allowed_ptr() does.

Though this will not only truncate cpuset mask with the intersection of
arch_task_cpu_possible_mask(), but also truncate it for sched_setaffinity()
syscall. Something to keep in mind.

Finally there's the ugly problem that I'm not sure how to address of modifying
the cpuset.cpus of, for example 'mix' cpuset, such that we end up with 64bit
only cpus. If the 32bit task has already attached, then we'll end up with
a broken cpuset with a task attached having 'invalid' cpumask.

We can teach cpuset.c:update_cpumask()::validate_change() to do something about
it. By either walking the attached tasks and validating the new mask against
the arch one. Or by storing the arch mask in struct cpuset. Though for the
latter if we want to be truly generic, we need to ensure we handle the case of
2 tasks having different arch_task_cpu_possible_mask().

Long email. Hopefully it makes sense!

I can share my test script, binary/code and fixup patches if you like.

Thanks

--
Qais Yousef

  reply	other threads:[~2020-11-30 17:06 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-24 15:50 [PATCH v4 00/14] An alternative series for asymmetric AArch32 systems Will Deacon
2020-11-24 15:50 ` [PATCH v4 01/14] arm64: cpuinfo: Split AArch32 registers out into a separate struct Will Deacon
2020-11-24 15:50 ` [PATCH v4 02/14] arm64: Allow mismatched 32-bit EL0 support Will Deacon
2020-11-27 10:25   ` Marc Zyngier
2020-11-27 11:50     ` Will Deacon
2020-11-27 13:09   ` Qais Yousef
2020-12-01 16:56     ` Will Deacon
2020-12-02 13:16       ` Qais Yousef
2020-11-24 15:50 ` [PATCH v4 03/14] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched " Will Deacon
2020-11-27 10:26   ` Marc Zyngier
2020-11-27 11:53     ` Will Deacon
2020-11-27 17:14       ` Marc Zyngier
2020-11-27 17:24         ` Quentin Perret
2020-11-27 18:16           ` Marc Zyngier
2020-12-01 16:57             ` Will Deacon
2020-12-02  8:18               ` Marc Zyngier
2020-12-02 17:27                 ` Will Deacon
2020-11-24 15:50 ` [PATCH v4 04/14] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs Will Deacon
2020-11-27 13:12   ` Qais Yousef
2020-12-01 16:56     ` Will Deacon
2020-12-02 13:52       ` Qais Yousef
2020-12-02 17:42         ` Will Deacon
2020-11-24 15:50 ` [PATCH v4 05/14] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
2020-11-24 15:50 ` [PATCH v4 06/14] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
2020-11-27 13:17   ` Qais Yousef
2020-12-01 16:56     ` Will Deacon
2020-11-24 15:50 ` [PATCH v4 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity Will Deacon
2020-11-27  9:49   ` Quentin Perret
2020-11-27 13:19   ` Qais Yousef
2020-12-01 16:56     ` Will Deacon
2020-12-02 13:06       ` Qais Yousef
2020-11-24 15:50 ` [PATCH v4 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
2020-11-27 10:01   ` Quentin Perret
2020-11-27 13:23   ` Qais Yousef
2020-12-01 16:55     ` Will Deacon
2020-12-02 14:07       ` Qais Yousef
2020-11-24 15:50 ` [PATCH v4 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
2020-11-27 13:32   ` Qais Yousef
2020-11-30 17:05     ` Qais Yousef [this message]
2020-11-30 17:36       ` Quentin Perret
2020-12-01 11:58         ` Qais Yousef
2020-12-01 12:37           ` Quentin Perret
2020-12-01 14:11             ` Qais Yousef
2020-12-01 15:56               ` Quentin Perret
2020-12-01 22:30                 ` Will Deacon
2020-12-02 11:34                   ` Qais Yousef
2020-12-02 11:33                 ` Qais Yousef
2020-11-24 15:50 ` [PATCH v4 10/14] sched: Introduce arch_task_cpu_possible_mask() to limit fallback rq selection Will Deacon
2020-11-24 15:50 ` [PATCH v4 11/14] sched: Reject CPU affinity changes based on arch_task_cpu_possible_mask() Will Deacon
2020-11-27  9:54   ` Quentin Perret
2020-11-24 15:50 ` [PATCH v4 12/14] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
2020-11-27 13:41   ` Qais Yousef
2020-12-01 22:13     ` Will Deacon
2020-12-02 12:59       ` Qais Yousef
2020-12-02 17:42         ` Will Deacon
2020-12-02 18:08           ` Qais Yousef
2020-11-24 15:50 ` [PATCH v4 13/14] arm64: Implement arch_task_cpu_possible_mask() Will Deacon
2020-11-27 13:41   ` Qais Yousef
2020-11-24 15:50 ` [PATCH v4 14/14] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
2020-11-27 13:58 ` [PATCH v4 00/14] An alternative series for asymmetric AArch32 systems Qais Yousef
2020-12-05 20:43 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201130170531.qo67rai5lftskmk2@e107158-lin.cambridge.arm.com \
    --to=qais.yousef@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).