* [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
@ 2020-12-08 13:28 Will Deacon
  2020-12-08 13:28 ` [PATCH v5 01/15] arm64: cpuinfo: Split AArch32 registers out into a separate struct Will Deacon
                   ` (16 more replies)
  0 siblings, 17 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Hi all,

Christmas has come early: it's time for version five of these patches,
which have previously appeared here:

  v1: https://lore.kernel.org/r/20201027215118.27003-1-will@kernel.org
  v2: https://lore.kernel.org/r/20201109213023.15092-1-will@kernel.org
  v3: https://lore.kernel.org/r/20201113093720.21106-1-will@kernel.org
  v4: https://lore.kernel.org/r/20201124155039.13804-1-will@kernel.org

and which started life as a reimplementation of some patches from Qais:

  https://lore.kernel.org/r/20201021104611.2744565-1-qais.yousef@arm.com

There's also now a nice writeup on LWN:

  https://lwn.net/Articles/838339/

[subscriber-only, but if you're reading this then you should really
 subscribe.]

Rumours of a feature film are also doing the rounds.

The aim of this series is to allow 32-bit ARM applications to run on
arm64 SoCs where not all of the CPUs support the 32-bit instruction set.
Unfortunately, such SoCs are real and will continue to be productised
over the next few years at least. I can assure you that I'm not just
doing this for fun.

Changes in v5 include:

  * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
    we avoid returning incompatible CPUs for a given task. This means
    that sched_setaffinity() can be used with larger masks (such as the
    online mask) from userspace, as shown in the sketch after this list,
    and also allows us to take the cpuset hierarchy into account when
    forcefully overriding the affinity for a task on execve().

  * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
    so that the resulting affinity mask does not contain any incompatible
    CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).

  * Moved the overriding of the affinity mask into the scheduler core
    rather than munging affinity masks directly in the architecture
    backend.

  * Extended comments and documentation.

  * Some renaming and cosmetic changes.
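
To give a rough idea of the sched_setaffinity() point above, here is a
minimal userspace sketch (illustrative only; it assumes a mismatched
system and a 32-bit caller, and uses nothing beyond the standard
sched_setaffinity(2) interface):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		cpu_set_t set;
		int i;

		/* Ask for every CPU the kernel could possibly have... */
		CPU_ZERO(&set);
		for (i = 0; i < CPU_SETSIZE; i++)
			CPU_SET(i, &set);

		/*
		 * ...and the request is now intersected with the CPUs
		 * that can actually run us, rather than being rejected
		 * because it names incompatible CPUs.
		 */
		if (sched_setaffinity(0, sizeof(set), &set))
			perror("sched_setaffinity");
		return 0;
	}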

I'm pretty happy with this now, although it still needs review and will
require rebasing to play nicely with the SCA changes in -next.

Cheers,

Will

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: kernel-team@android.com

--->8


Will Deacon (15):
  arm64: cpuinfo: Split AArch32 registers out into a separate struct
  arm64: Allow mismatched 32-bit EL0 support
  KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support
  arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs
  arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  sched: Introduce task_cpu_possible_mask() to limit fallback rq
    selection
  cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU
    affinity
  arm64: Implement task_cpu_possible_mask()
  arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit
    EL0
  arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched
    system
  arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores

 .../ABI/testing/sysfs-devices-system-cpu      |   9 +
 .../admin-guide/kernel-parameters.txt         |   8 +
 arch/arm64/include/asm/cpu.h                  |  44 ++--
 arch/arm64/include/asm/cpucaps.h              |   3 +-
 arch/arm64/include/asm/cpufeature.h           |   8 +-
 arch/arm64/include/asm/mmu_context.h          |  13 ++
 arch/arm64/kernel/cpufeature.c                | 219 ++++++++++++++----
 arch/arm64/kernel/cpuinfo.c                   |  53 +++--
 arch/arm64/kernel/process.c                   |  19 +-
 arch/arm64/kvm/arm.c                          |  11 +-
 include/linux/cpuset.h                        |   3 +-
 include/linux/mmu_context.h                   |   8 +
 include/linux/sched.h                         |   1 +
 kernel/cgroup/cpuset.c                        |  39 ++--
 kernel/sched/core.c                           | 112 +++++++--
 15 files changed, 426 insertions(+), 124 deletions(-)

-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 01/15] arm64: cpuinfo: Split AArch32 registers out into a separate struct
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 02/15] arm64: Allow mismatched 32-bit EL0 support Will Deacon
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

In preparation for late initialisation of the "sanitised" AArch32 register
state, move the AArch32 registers out of 'struct cpuinfo' and into their
own struct definition.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cpu.h   | 44 +++++++++++----------
 arch/arm64/kernel/cpufeature.c | 71 ++++++++++++++++++----------------
 arch/arm64/kernel/cpuinfo.c    | 53 +++++++++++++------------
 3 files changed, 89 insertions(+), 79 deletions(-)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 7faae6ff3ab4..f4e01aa0f442 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -12,26 +12,7 @@
 /*
  * Records attributes of an individual CPU.
  */
-struct cpuinfo_arm64 {
-	struct cpu	cpu;
-	struct kobject	kobj;
-	u32		reg_ctr;
-	u32		reg_cntfrq;
-	u32		reg_dczid;
-	u32		reg_midr;
-	u32		reg_revidr;
-
-	u64		reg_id_aa64dfr0;
-	u64		reg_id_aa64dfr1;
-	u64		reg_id_aa64isar0;
-	u64		reg_id_aa64isar1;
-	u64		reg_id_aa64mmfr0;
-	u64		reg_id_aa64mmfr1;
-	u64		reg_id_aa64mmfr2;
-	u64		reg_id_aa64pfr0;
-	u64		reg_id_aa64pfr1;
-	u64		reg_id_aa64zfr0;
-
+struct cpuinfo_32bit {
 	u32		reg_id_dfr0;
 	u32		reg_id_dfr1;
 	u32		reg_id_isar0;
@@ -54,6 +35,29 @@ struct cpuinfo_arm64 {
 	u32		reg_mvfr0;
 	u32		reg_mvfr1;
 	u32		reg_mvfr2;
+};
+
+struct cpuinfo_arm64 {
+	struct cpu	cpu;
+	struct kobject	kobj;
+	u32		reg_ctr;
+	u32		reg_cntfrq;
+	u32		reg_dczid;
+	u32		reg_midr;
+	u32		reg_revidr;
+
+	u64		reg_id_aa64dfr0;
+	u64		reg_id_aa64dfr1;
+	u64		reg_id_aa64isar0;
+	u64		reg_id_aa64isar1;
+	u64		reg_id_aa64mmfr0;
+	u64		reg_id_aa64mmfr1;
+	u64		reg_id_aa64mmfr2;
+	u64		reg_id_aa64pfr0;
+	u64		reg_id_aa64pfr1;
+	u64		reg_id_aa64zfr0;
+
+	struct cpuinfo_32bit	aarch32;
 
 	/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
 	u64		reg_zcr;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6f36c4f62f69..5009dc5b1e85 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -819,6 +819,31 @@ static void __init init_cpu_hwcaps_indirect_list(void)
 
 static void __init setup_boot_cpu_capabilities(void);
 
+static void __init init_32bit_cpu_features(struct cpuinfo_32bit *info)
+{
+	init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0);
+	init_cpu_ftr_reg(SYS_ID_DFR1_EL1, info->reg_id_dfr1);
+	init_cpu_ftr_reg(SYS_ID_ISAR0_EL1, info->reg_id_isar0);
+	init_cpu_ftr_reg(SYS_ID_ISAR1_EL1, info->reg_id_isar1);
+	init_cpu_ftr_reg(SYS_ID_ISAR2_EL1, info->reg_id_isar2);
+	init_cpu_ftr_reg(SYS_ID_ISAR3_EL1, info->reg_id_isar3);
+	init_cpu_ftr_reg(SYS_ID_ISAR4_EL1, info->reg_id_isar4);
+	init_cpu_ftr_reg(SYS_ID_ISAR5_EL1, info->reg_id_isar5);
+	init_cpu_ftr_reg(SYS_ID_ISAR6_EL1, info->reg_id_isar6);
+	init_cpu_ftr_reg(SYS_ID_MMFR0_EL1, info->reg_id_mmfr0);
+	init_cpu_ftr_reg(SYS_ID_MMFR1_EL1, info->reg_id_mmfr1);
+	init_cpu_ftr_reg(SYS_ID_MMFR2_EL1, info->reg_id_mmfr2);
+	init_cpu_ftr_reg(SYS_ID_MMFR3_EL1, info->reg_id_mmfr3);
+	init_cpu_ftr_reg(SYS_ID_MMFR4_EL1, info->reg_id_mmfr4);
+	init_cpu_ftr_reg(SYS_ID_MMFR5_EL1, info->reg_id_mmfr5);
+	init_cpu_ftr_reg(SYS_ID_PFR0_EL1, info->reg_id_pfr0);
+	init_cpu_ftr_reg(SYS_ID_PFR1_EL1, info->reg_id_pfr1);
+	init_cpu_ftr_reg(SYS_ID_PFR2_EL1, info->reg_id_pfr2);
+	init_cpu_ftr_reg(SYS_MVFR0_EL1, info->reg_mvfr0);
+	init_cpu_ftr_reg(SYS_MVFR1_EL1, info->reg_mvfr1);
+	init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
+}
+
 void __init init_cpu_features(struct cpuinfo_arm64 *info)
 {
 	/* Before we start using the tables, make sure it is sorted */
@@ -838,29 +863,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
 	init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
 
-	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
-		init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0);
-		init_cpu_ftr_reg(SYS_ID_DFR1_EL1, info->reg_id_dfr1);
-		init_cpu_ftr_reg(SYS_ID_ISAR0_EL1, info->reg_id_isar0);
-		init_cpu_ftr_reg(SYS_ID_ISAR1_EL1, info->reg_id_isar1);
-		init_cpu_ftr_reg(SYS_ID_ISAR2_EL1, info->reg_id_isar2);
-		init_cpu_ftr_reg(SYS_ID_ISAR3_EL1, info->reg_id_isar3);
-		init_cpu_ftr_reg(SYS_ID_ISAR4_EL1, info->reg_id_isar4);
-		init_cpu_ftr_reg(SYS_ID_ISAR5_EL1, info->reg_id_isar5);
-		init_cpu_ftr_reg(SYS_ID_ISAR6_EL1, info->reg_id_isar6);
-		init_cpu_ftr_reg(SYS_ID_MMFR0_EL1, info->reg_id_mmfr0);
-		init_cpu_ftr_reg(SYS_ID_MMFR1_EL1, info->reg_id_mmfr1);
-		init_cpu_ftr_reg(SYS_ID_MMFR2_EL1, info->reg_id_mmfr2);
-		init_cpu_ftr_reg(SYS_ID_MMFR3_EL1, info->reg_id_mmfr3);
-		init_cpu_ftr_reg(SYS_ID_MMFR4_EL1, info->reg_id_mmfr4);
-		init_cpu_ftr_reg(SYS_ID_MMFR5_EL1, info->reg_id_mmfr5);
-		init_cpu_ftr_reg(SYS_ID_PFR0_EL1, info->reg_id_pfr0);
-		init_cpu_ftr_reg(SYS_ID_PFR1_EL1, info->reg_id_pfr1);
-		init_cpu_ftr_reg(SYS_ID_PFR2_EL1, info->reg_id_pfr2);
-		init_cpu_ftr_reg(SYS_MVFR0_EL1, info->reg_mvfr0);
-		init_cpu_ftr_reg(SYS_MVFR1_EL1, info->reg_mvfr1);
-		init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
-	}
+	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
+		init_32bit_cpu_features(&info->aarch32);
 
 	if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
 		init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
@@ -931,20 +935,12 @@ static void relax_cpu_ftr_reg(u32 sys_id, int field)
 	WARN_ON(!ftrp->width);
 }
 
-static int update_32bit_cpu_features(int cpu, struct cpuinfo_arm64 *info,
-				     struct cpuinfo_arm64 *boot)
+static int update_32bit_cpu_features(int cpu, struct cpuinfo_32bit *info,
+				     struct cpuinfo_32bit *boot)
 {
 	int taint = 0;
 	u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
 
-	/*
-	 * If we don't have AArch32 at all then skip the checks entirely
-	 * as the register values may be UNKNOWN and we're not going to be
-	 * using them for anything.
-	 */
-	if (!id_aa64pfr0_32bit_el0(pfr0))
-		return taint;
-
 	/*
 	 * If we don't have AArch32 at EL1, then relax the strictness of
 	 * EL1-dependent register fields to avoid spurious sanity check fails.
@@ -1091,10 +1087,17 @@ void update_cpu_features(int cpu,
 	}
 
 	/*
+	 * If we don't have AArch32 at all then skip the checks entirely
+	 * as the register values may be UNKNOWN and we're not going to be
+	 * using them for anything.
+	 *
 	 * This relies on a sanitised view of the AArch64 ID registers
 	 * (e.g. SYS_ID_AA64PFR0_EL1), so we call it last.
 	 */
-	taint |= update_32bit_cpu_features(cpu, info, boot);
+	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
+		taint |= update_32bit_cpu_features(cpu, &info->aarch32,
+						   &boot->aarch32);
+	}
 
 	/*
 	 * Mismatched CPU features are a recipe for disaster. Don't even
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 77605aec25fe..8ce33742ad6a 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -344,6 +344,32 @@ static void cpuinfo_detect_icache_policy(struct cpuinfo_arm64 *info)
 	pr_info("Detected %s I-cache on CPU%d\n", icache_policy_str[l1ip], cpu);
 }
 
+static void __cpuinfo_store_cpu_32bit(struct cpuinfo_32bit *info)
+{
+	info->reg_id_dfr0 = read_cpuid(ID_DFR0_EL1);
+	info->reg_id_dfr1 = read_cpuid(ID_DFR1_EL1);
+	info->reg_id_isar0 = read_cpuid(ID_ISAR0_EL1);
+	info->reg_id_isar1 = read_cpuid(ID_ISAR1_EL1);
+	info->reg_id_isar2 = read_cpuid(ID_ISAR2_EL1);
+	info->reg_id_isar3 = read_cpuid(ID_ISAR3_EL1);
+	info->reg_id_isar4 = read_cpuid(ID_ISAR4_EL1);
+	info->reg_id_isar5 = read_cpuid(ID_ISAR5_EL1);
+	info->reg_id_isar6 = read_cpuid(ID_ISAR6_EL1);
+	info->reg_id_mmfr0 = read_cpuid(ID_MMFR0_EL1);
+	info->reg_id_mmfr1 = read_cpuid(ID_MMFR1_EL1);
+	info->reg_id_mmfr2 = read_cpuid(ID_MMFR2_EL1);
+	info->reg_id_mmfr3 = read_cpuid(ID_MMFR3_EL1);
+	info->reg_id_mmfr4 = read_cpuid(ID_MMFR4_EL1);
+	info->reg_id_mmfr5 = read_cpuid(ID_MMFR5_EL1);
+	info->reg_id_pfr0 = read_cpuid(ID_PFR0_EL1);
+	info->reg_id_pfr1 = read_cpuid(ID_PFR1_EL1);
+	info->reg_id_pfr2 = read_cpuid(ID_PFR2_EL1);
+
+	info->reg_mvfr0 = read_cpuid(MVFR0_EL1);
+	info->reg_mvfr1 = read_cpuid(MVFR1_EL1);
+	info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
+}
+
 static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 {
 	info->reg_cntfrq = arch_timer_get_cntfrq();
@@ -371,31 +397,8 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
 	info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
 
-	/* Update the 32bit ID registers only if AArch32 is implemented */
-	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
-		info->reg_id_dfr0 = read_cpuid(ID_DFR0_EL1);
-		info->reg_id_dfr1 = read_cpuid(ID_DFR1_EL1);
-		info->reg_id_isar0 = read_cpuid(ID_ISAR0_EL1);
-		info->reg_id_isar1 = read_cpuid(ID_ISAR1_EL1);
-		info->reg_id_isar2 = read_cpuid(ID_ISAR2_EL1);
-		info->reg_id_isar3 = read_cpuid(ID_ISAR3_EL1);
-		info->reg_id_isar4 = read_cpuid(ID_ISAR4_EL1);
-		info->reg_id_isar5 = read_cpuid(ID_ISAR5_EL1);
-		info->reg_id_isar6 = read_cpuid(ID_ISAR6_EL1);
-		info->reg_id_mmfr0 = read_cpuid(ID_MMFR0_EL1);
-		info->reg_id_mmfr1 = read_cpuid(ID_MMFR1_EL1);
-		info->reg_id_mmfr2 = read_cpuid(ID_MMFR2_EL1);
-		info->reg_id_mmfr3 = read_cpuid(ID_MMFR3_EL1);
-		info->reg_id_mmfr4 = read_cpuid(ID_MMFR4_EL1);
-		info->reg_id_mmfr5 = read_cpuid(ID_MMFR5_EL1);
-		info->reg_id_pfr0 = read_cpuid(ID_PFR0_EL1);
-		info->reg_id_pfr1 = read_cpuid(ID_PFR1_EL1);
-		info->reg_id_pfr2 = read_cpuid(ID_PFR2_EL1);
-
-		info->reg_mvfr0 = read_cpuid(MVFR0_EL1);
-		info->reg_mvfr1 = read_cpuid(MVFR1_EL1);
-		info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
-	}
+	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
+		__cpuinfo_store_cpu_32bit(&info->aarch32);
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    id_aa64pfr0_sve(info->reg_id_aa64pfr0))
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 02/15] arm64: Allow mismatched 32-bit EL0 support
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
  2020-12-08 13:28 ` [PATCH v5 01/15] arm64: cpuinfo: Split AArch32 registers out into a separate struct Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 03/15] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched " Will Deacon
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

When confronted with a mixture of CPUs, some of which support 32-bit
applications and others which don't, we quite sensibly treat the system
as 64-bit only for userspace and prevent execve() of 32-bit binaries.

Unfortunately, some crazy folks have decided to build systems like this
with the intention of running 32-bit applications, so relax our
sanitisation logic to continue to advertise 32-bit support to userspace
on these systems and track the real 32-bit capable cores in a cpumask
instead. For now, the default behaviour remains unchanged, but it will
be tied to a command-line option in a later patch.
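
As a rough sketch of the intended calling convention (not part of this
patch; the printing is chosen purely for illustration), kernel code that
cares about where 32-bit tasks can run would do something like:

	if (system_supports_32bit_el0()) {
		const struct cpumask *mask = system_32bit_el0_cpumask();

		/*
		 * On a uniform system this is cpu_possible_mask; on a
		 * mismatched system it is the tracked subset of
		 * 32-bit-capable CPUs.
		 */
		pr_info("32-bit EL0 runnable on CPUs %*pbl\n",
			cpumask_pr_args(mask));
	}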

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cpucaps.h    |   3 +-
 arch/arm64/include/asm/cpufeature.h |   8 ++-
 arch/arm64/kernel/cpufeature.c      | 106 ++++++++++++++++++++++++++--
 3 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index e7d98997c09c..d689a1318741 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -20,7 +20,8 @@
 #define ARM64_ALT_PAN_NOT_UAO			10
 #define ARM64_HAS_VIRT_HOST_EXTN		11
 #define ARM64_WORKAROUND_CAVIUM_27456		12
-#define ARM64_HAS_32BIT_EL0			13
+/* Unreliable: use system_supports_32bit_el0() instead. */
+#define ARM64_HAS_32BIT_EL0_DO_NOT_USE		13
 #define ARM64_HARDEN_EL2_VECTORS		14
 #define ARM64_HAS_CNP				15
 #define ARM64_HAS_NO_FPSIMD			16
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index da250e4741bd..85eab54949a8 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -606,9 +606,15 @@ static inline bool cpu_supports_mixed_endian_el0(void)
 	return id_aa64mmfr0_mixed_endian_el0(read_cpuid(ID_AA64MMFR0_EL1));
 }
 
+const struct cpumask *system_32bit_el0_cpumask(void);
+DECLARE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
+
 static inline bool system_supports_32bit_el0(void)
 {
-	return cpus_have_const_cap(ARM64_HAS_32BIT_EL0);
+	u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
+
+	return static_branch_unlikely(&arm64_mismatched_32bit_el0) ||
+	       id_aa64pfr0_32bit_el0(pfr0);
 }
 
 static inline bool system_supports_4kb_granule(void)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 5009dc5b1e85..bb53af53ce8d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -104,6 +104,24 @@ DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
 bool arm64_use_ng_mappings = false;
 EXPORT_SYMBOL(arm64_use_ng_mappings);
 
+/*
+ * Permit PER_LINUX32 and execve() of 32-bit binaries even if not all CPUs
+ * support it?
+ */
+static bool __read_mostly allow_mismatched_32bit_el0;
+
+/*
+ * Static branch enabled only if allow_mismatched_32bit_el0 is set and we have
+ * seen at least one CPU capable of 32-bit EL0.
+ */
+DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
+
+/*
+ * Mask of CPUs supporting 32-bit EL0.
+ * Only valid if arm64_mismatched_32bit_el0 is enabled.
+ */
+static cpumask_var_t cpu_32bit_el0_mask __cpumask_var_read_mostly;
+
 /*
  * Flag to indicate if we have computed the system wide
  * capabilities based on the boot time active CPUs. This
@@ -756,7 +774,7 @@ static void __init sort_ftr_regs(void)
  * Any bits that are not covered by an arm64_ftr_bits entry are considered
  * RES0 for the system-wide value, and must strictly match.
  */
-static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new)
+static void init_cpu_ftr_reg(u32 sys_reg, u64 new)
 {
 	u64 val = 0;
 	u64 strict_mask = ~0x0ULL;
@@ -819,7 +837,7 @@ static void __init init_cpu_hwcaps_indirect_list(void)
 
 static void __init setup_boot_cpu_capabilities(void);
 
-static void __init init_32bit_cpu_features(struct cpuinfo_32bit *info)
+static void init_32bit_cpu_features(struct cpuinfo_32bit *info)
 {
 	init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0);
 	init_cpu_ftr_reg(SYS_ID_DFR1_EL1, info->reg_id_dfr1);
@@ -935,6 +953,25 @@ static void relax_cpu_ftr_reg(u32 sys_id, int field)
 	WARN_ON(!ftrp->width);
 }
 
+static void update_compat_elf_hwcaps(void);
+
+static void update_mismatched_32bit_el0_cpu_features(struct cpuinfo_arm64 *info,
+						     struct cpuinfo_arm64 *boot)
+{
+	static bool boot_cpu_32bit_regs_overridden = false;
+
+	if (!allow_mismatched_32bit_el0 || boot_cpu_32bit_regs_overridden)
+		return;
+
+	if (id_aa64pfr0_32bit_el0(boot->reg_id_aa64pfr0))
+		return;
+
+	boot->aarch32 = info->aarch32;
+	init_32bit_cpu_features(&boot->aarch32);
+	update_compat_elf_hwcaps();
+	boot_cpu_32bit_regs_overridden = true;
+}
+
 static int update_32bit_cpu_features(int cpu, struct cpuinfo_32bit *info,
 				     struct cpuinfo_32bit *boot)
 {
@@ -1095,6 +1132,7 @@ void update_cpu_features(int cpu,
 	 * (e.g. SYS_ID_AA64PFR0_EL1), so we call it last.
 	 */
 	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
+		update_mismatched_32bit_el0_cpu_features(info, boot);
 		taint |= update_32bit_cpu_features(cpu, &info->aarch32,
 						   &boot->aarch32);
 	}
@@ -1196,6 +1234,55 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 	return feature_matches(val, entry);
 }
 
+static int enable_mismatched_32bit_el0(unsigned int cpu)
+{
+	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
+	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
+
+	if (cpu_32bit) {
+		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
+		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
+	}
+
+	return 0;
+}
+
+static int __init init_32bit_el0_mask(void)
+{
+	if (!allow_mismatched_32bit_el0)
+		return 0;
+
+	if (!zalloc_cpumask_var(&cpu_32bit_el0_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				 "arm64/mismatched_32bit_el0:online",
+				 enable_mismatched_32bit_el0, NULL);
+}
+subsys_initcall_sync(init_32bit_el0_mask);
+
+const struct cpumask *system_32bit_el0_cpumask(void)
+{
+	if (!system_supports_32bit_el0())
+		return cpu_none_mask;
+
+	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return cpu_32bit_el0_mask;
+
+	return cpu_possible_mask;
+}
+
+static bool has_32bit_el0(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	if (!has_cpuid_feature(entry, scope))
+		return allow_mismatched_32bit_el0;
+
+	if (scope == SCOPE_SYSTEM)
+		pr_info("detected: 32-bit EL0 Support\n");
+
+	return true;
+}
+
 static bool has_useable_gicv3_cpuif(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	bool has_sre;
@@ -1805,10 +1892,9 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 	},
 #endif	/* CONFIG_ARM64_VHE */
 	{
-		.desc = "32-bit EL0 Support",
-		.capability = ARM64_HAS_32BIT_EL0,
+		.capability = ARM64_HAS_32BIT_EL0_DO_NOT_USE,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_32bit_el0,
 		.sys_reg = SYS_ID_AA64PFR0_EL1,
 		.sign = FTR_UNSIGNED,
 		.field_pos = ID_AA64PFR0_EL0_SHIFT,
@@ -2301,7 +2387,7 @@ static const struct arm64_cpu_capabilities compat_elf_hwcaps[] = {
 	{},
 };
 
-static void __init cap_set_elf_hwcap(const struct arm64_cpu_capabilities *cap)
+static void cap_set_elf_hwcap(const struct arm64_cpu_capabilities *cap)
 {
 	switch (cap->hwcap_type) {
 	case CAP_HWCAP:
@@ -2346,7 +2432,7 @@ static bool cpus_have_elf_hwcap(const struct arm64_cpu_capabilities *cap)
 	return rc;
 }
 
-static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
+static void setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 {
 	/* We support emulation of accesses to CPU ID feature registers */
 	cpu_set_named_feature(CPUID);
@@ -2355,6 +2441,12 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 			cap_set_elf_hwcap(hwcaps);
 }
 
+static void update_compat_elf_hwcaps(void)
+{
+	if (system_capabilities_finalized())
+		setup_elf_hwcaps(compat_elf_hwcaps);
+}
+
 static void update_cpu_capabilities(u16 scope_mask)
 {
 	int i;
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 03/15] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
  2020-12-08 13:28 ` [PATCH v5 01/15] arm64: cpuinfo: Split AArch32 registers out into a separate struct Will Deacon
  2020-12-08 13:28 ` [PATCH v5 02/15] arm64: Allow mismatched 32-bit EL0 support Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 04/15] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs Will Deacon
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

If a vCPU is caught running 32-bit code on a system with mismatched
support at EL0, then we should kill it.

Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/arm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c0ffb019ca8b..e63e8feae836 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -649,6 +649,15 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 	}
 }
 
+static bool vcpu_mode_is_bad_32bit(struct kvm_vcpu *vcpu)
+{
+	if (likely(!vcpu_mode_is_32bit(vcpu)))
+		return false;
+
+	return !system_supports_32bit_el0() ||
+		static_branch_unlikely(&arm64_mismatched_32bit_el0);
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:	The VCPU pointer
@@ -832,7 +841,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 * with the asymmetric AArch32 case), return to userspace with
 		 * a fatal error.
 		 */
-		if (!system_supports_32bit_el0() && vcpu_mode_is_32bit(vcpu)) {
+		if (vcpu_mode_is_bad_32bit(vcpu)) {
 			/*
 			 * As we have caught the guest red-handed, decide that
 			 * it isn't fit for purpose anymore by making the vcpu
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 04/15] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (2 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 03/15] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched " Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 05/15] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Scheduling a 32-bit application on a 64-bit-only CPU is a bad idea.

Ensure that 32-bit applications always take the slow-path when returning
to userspace on a system with mismatched support at EL0, so that we can
avoid trying to run on a 64-bit-only CPU and force a SIGKILL instead.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 19 ++++++++++++++++++-
 arch/arm64/kernel/signal.c  | 26 ++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index ed919f633ed8..9a2532d848f0 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -541,6 +541,15 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
+static void compat_thread_switch(struct task_struct *next)
+{
+	if (!is_compat_thread(task_thread_info(next)))
+		return;
+
+	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		set_tsk_thread_flag(next, TIF_NOTIFY_RESUME);
+}
+
 /*
  * Thread switching.
  */
@@ -557,6 +566,7 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	uao_thread_switch(next);
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(prev, next);
+	compat_thread_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
@@ -619,8 +629,15 @@ unsigned long arch_align_stack(unsigned long sp)
  */
 void arch_setup_new_exec(void)
 {
-	current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
+	unsigned long mmflags = 0;
+
+	if (is_compat_task()) {
+		mmflags = MMCF_AARCH32;
+		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+	}
 
+	current->mm->context.flags = mmflags;
 	ptrauth_thread_init_user(current);
 
 	if (task_spec_ssb_noexec(current)) {
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index a8184cad8890..bcb6ca2d9a7c 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -911,6 +911,19 @@ static void do_signal(struct pt_regs *regs)
 	restore_saved_sigmask();
 }
 
+static bool cpu_affinity_invalid(struct pt_regs *regs)
+{
+	if (!compat_user_mode(regs))
+		return false;
+
+	/*
+	 * We're preemptible, but a reschedule will cause us to check the
+	 * affinity again.
+	 */
+	return !cpumask_test_cpu(raw_smp_processor_id(),
+				 system_32bit_el0_cpumask());
+}
+
 asmlinkage void do_notify_resume(struct pt_regs *regs,
 				 unsigned long thread_flags)
 {
@@ -948,6 +961,19 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
+
+				/*
+				 * If we reschedule after checking the affinity
+				 * then we must ensure that TIF_NOTIFY_RESUME
+				 * is set so that we check the affinity again.
+				 * Since tracehook_notify_resume() clears the
+				 * flag, ensure that the compiler doesn't move
+				 * it after the affinity check.
+				 */
+				barrier();
+
+				if (cpu_affinity_invalid(regs))
+					force_sig(SIGKILL);
 			}
 
 			if (thread_flags & _TIF_FOREIGN_FPSTATE)
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 05/15] arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (3 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 04/15] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 06/15] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Since 32-bit applications will be killed if they are caught trying to
execute on a 64-bit-only CPU in a mismatched system, advertise the set
of 32-bit capable CPUs to userspace in sysfs.
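
As a usage example, a launcher could read the new file much like the
existing online/possible masks; a hypothetical snippet (the path comes
from the ABI entry added below, and the output format matches it):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/devices/system/cpu/aarch32_el0", "r");
		char buf[256];

		if (f && fgets(buf, sizeof(buf), f))
			printf("32-bit-capable CPUs: %s", buf);
		else
			printf("absent: all or none of the CPUs run AArch32\n");
		if (f)
			fclose(f);
		return 0;
	}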

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 .../ABI/testing/sysfs-devices-system-cpu      |  9 +++++++++
 arch/arm64/kernel/cpufeature.c                | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1a04ca8162ad..8a2e377b0dde 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -493,6 +493,15 @@ Description:	AArch64 CPU registers
 		'identification' directory exposes the CPU ID registers for
 		identifying model and revision of the CPU.
 
+What:		/sys/devices/system/cpu/aarch32_el0
+Date:		November 2020
+Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
+Description:	Identifies the subset of CPUs in the system that can execute
+		AArch32 (32-bit ARM) applications. If present, the same format as
+		/sys/devices/system/cpu/{offline,online,possible,present} is used.
+		If absent, then all or none of the CPUs can execute AArch32
+		applications and execve() will behave accordingly.
+
 What:		/sys/devices/system/cpu/cpu#/cpu_capacity
 Date:		December 2016
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index bb53af53ce8d..088bf668cbe7 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -67,6 +67,7 @@
 #include <linux/crash_dump.h>
 #include <linux/sort.h>
 #include <linux/stop_machine.h>
+#include <linux/sysfs.h>
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/cpu.h>
@@ -1272,6 +1273,24 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+static ssize_t aarch32_el0_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	const struct cpumask *mask = system_32bit_el0_cpumask();
+
+	return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(mask));
+}
+static const DEVICE_ATTR_RO(aarch32_el0);
+
+static int __init aarch32_el0_sysfs_init(void)
+{
+	if (!allow_mismatched_32bit_el0)
+		return 0;
+
+	return device_create_file(cpu_subsys.dev_root, &dev_attr_aarch32_el0);
+}
+device_initcall(aarch32_el0_sysfs_init);
+
 static bool has_32bit_el0(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	if (!has_cpuid_feature(entry, scope))
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 06/15] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (4 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 05/15] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

On such a system, we must take care not to migrate a task to an
unsupported CPU when forcefully moving tasks in select_fallback_rq()
in response to a CPU hot-unplug operation.

Introduce a task_cpu_possible_mask() hook which, given a task argument,
allows an architecture to return a cpumask of CPUs that are capable of
executing that task. The default implementation returns the
cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
limitations that affect scheduling. The new mask is used when selecting
the fallback runqueue as a last resort before forcing a migration to the
first active CPU.
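
For architectures without such limitations, nothing changes. As a hedged
sketch, an affected architecture would override the hook along these
lines (the helper name here is hypothetical; the real arm64 version
appears later in the series):

	/* In the architecture's asm/mmu_context.h; illustrative only. */
	static inline const struct cpumask *
	task_cpu_possible_mask(struct task_struct *p)
	{
		/* Hypothetical helper: CPUs able to execute @p's ISA. */
		return arch_isa_capable_cpus(p);
	}
	#define task_cpu_possible_mask	task_cpu_possible_mask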

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/mmu_context.h | 8 ++++++++
 kernel/sched/core.c         | 8 +++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index 03dee12d2b61..bc4ac3c525e6 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -14,4 +14,12 @@
 static inline void leave_mm(int cpu) { }
 #endif
 
+/*
+ * CPUs that are capable of running task @p. By default, we assume a sane,
+ * homogeneous system. Must contain at least one active CPU.
+ */
+#ifndef task_cpu_possible_mask
+# define task_cpu_possible_mask(p)	cpu_possible_mask
+#endif
+
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e7e453492cff..58474569a2ea 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1708,7 +1708,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 	if (is_per_cpu_kthread(p))
 		return cpu_online(cpu);
 
-	return cpu_active(cpu);
+	if (!cpu_active(cpu))
+		return false;
+
+	return cpumask_test_cpu(cpu, task_cpu_possible_mask(p));
 }
 
 /*
@@ -2318,10 +2321,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			}
 			fallthrough;
 		case possible:
-			do_set_cpus_allowed(p, cpu_possible_mask);
+			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
 			state = fail;
 			break;
-
 		case fail:
 			BUG();
 			break;
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (5 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 06/15] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-17 12:15   ` Qais Yousef
  2020-12-08 13:28 ` [PATCH v5 08/15] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

If the scheduler cannot find an allowed CPU for a task,
cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
if cgroup v1 is in use.

In preparation for allowing architectures to provide their own fallback
mask, just return early if we're not using cgroup v2 and allow
select_fallback_rq() to figure out the mask by itself.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/cgroup/cpuset.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 57b5b5d0a5fd..e970737c3ed2 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
+	if (!is_in_v2_mode())
+		return; /* select_fallback_rq will try harder */
+
 	rcu_read_lock();
-	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
-		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
+	do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed);
 	rcu_read_unlock();
 
 	/*
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 08/15] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (6 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-28  3:54   ` Suren Baghdasaryan
  2020-12-08 13:28 ` [PATCH v5 09/15] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Modify guarantee_online_cpus() to take task_cpu_possible_mask() into
account when trying to find a suitable set of online CPUs for a given
task. This will avoid passing an invalid mask to set_cpus_allowed_ptr()
during ->attach() and will subsequently allow the cpuset hierarchy to be
taken into account when forcefully overriding the affinity mask for a
task which requires migration to a compatible CPU.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/cpuset.h |  3 ++-
 kernel/cgroup/cpuset.c | 33 +++++++++++++++++++--------------
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 04c20de66afc..414a8e694413 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/nodemask.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/jump_label.h>
 
 #ifdef CONFIG_CPUSETS
@@ -184,7 +185,7 @@ static inline void cpuset_read_unlock(void) { }
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
-	cpumask_copy(mask, cpu_possible_mask);
+	cpumask_copy(mask, task_cpu_possible_mask(p));
 }
 
 static inline void cpuset_cpus_allowed_fallback(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e970737c3ed2..d30febf1f69f 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -372,18 +372,26 @@ static inline bool is_in_v2_mode(void)
 }
 
 /*
- * Return in pmask the portion of a cpusets's cpus_allowed that
- * are online.  If none are online, walk up the cpuset hierarchy
- * until we find one that does have some online cpus.
+ * Return in pmask the portion of a task's cpusets's cpus_allowed that
+ * are online and are capable of running the task.  If none are found,
+ * walk up the cpuset hierarchy until we find one that does have some
+ * appropriate cpus.
  *
  * One way or another, we guarantee to return some non-empty subset
  * of cpu_online_mask.
  *
  * Call with callback_lock or cpuset_mutex held.
  */
-static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
+static void guarantee_online_cpus(struct task_struct *tsk,
+				  struct cpumask *pmask)
 {
-	while (!cpumask_intersects(cs->effective_cpus, cpu_online_mask)) {
+	struct cpuset *cs = task_cs(tsk);
+	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+
+	if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_online_mask)))
+		cpumask_copy(pmask, cpu_online_mask);
+
+	while (!cpumask_intersects(cs->effective_cpus, pmask)) {
 		cs = parent_cs(cs);
 		if (unlikely(!cs)) {
 			/*
@@ -393,11 +401,10 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
 			 * cpuset's effective_cpus is on its way to be
 			 * identical to cpu_online_mask.
 			 */
-			cpumask_copy(pmask, cpu_online_mask);
 			return;
 		}
 	}
-	cpumask_and(pmask, cs->effective_cpus, cpu_online_mask);
+	cpumask_and(pmask, pmask, cs->effective_cpus);
 }
 
 /*
@@ -2176,15 +2183,13 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 	percpu_down_write(&cpuset_rwsem);
 
-	/* prepare for attach */
-	if (cs == &top_cpuset)
-		cpumask_copy(cpus_attach, cpu_possible_mask);
-	else
-		guarantee_online_cpus(cs, cpus_attach);
-
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 
 	cgroup_taskset_for_each(task, css, tset) {
+		if (cs != &top_cpuset)
+			guarantee_online_cpus(task, cpus_attach);
+		else
+			cpumask_copy(cpus_attach, task_cpu_possible_mask(task));
 		/*
 		 * can_attach beforehand should guarantee that this doesn't
 		 * fail.  TODO: have a better way to handle failure here
@@ -3280,7 +3285,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 	spin_lock_irqsave(&callback_lock, flags);
 	rcu_read_lock();
-	guarantee_online_cpus(task_cs(tsk), pmask);
+	guarantee_online_cpus(tsk, pmask);
 	rcu_read_unlock();
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 09/15] sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (7 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 08/15] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 10/15] sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity Will Deacon
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Reject explicit requests to change the affinity mask of a task via
set_cpus_allowed_ptr() if the requested mask is not a subset of the
mask returned by task_cpu_possible_mask(). This ensures that the
'cpus_mask' for a given task cannot contain CPUs which are incapable of
executing it, except in cases where the affinity is forced.
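
From userspace, the visible effect is that an explicit request naming
only incompatible CPUs now fails cleanly; a sketch (the CPU number is
made up, assuming CPU7 is 64-bit-only and the caller is a 32-bit task):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(7, &set);

		/*
		 * Expected to fail with EINVAL rather than succeed and
		 * leave us runnable on a CPU that cannot execute us.
		 */
		if (sched_setaffinity(0, sizeof(set), &set))
			perror("sched_setaffinity");
		return 0;
	}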

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58474569a2ea..92ac3e53f50a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1875,6 +1875,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 				  const struct cpumask *new_mask, bool check)
 {
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
+	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	unsigned int dest_cpu;
 	struct rq_flags rf;
 	struct rq *rq;
@@ -1888,6 +1889,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * Kernel threads are allowed on online && !active CPUs
 		 */
 		cpu_valid_mask = cpu_online_mask;
+	} else if (!cpumask_subset(new_mask, cpu_allowed_mask)) {
+		ret = -EINVAL;
+		goto out;
 	}
 
 	/*
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 10/15] sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (8 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 09/15] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-28  4:29   ` Suren Baghdasaryan
  2020-12-08 13:28 ` [PATCH v5 11/15] arm64: Implement task_cpu_possible_mask() Will Deacon
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Although userspace can carefully manage the affinity masks for such
tasks, one place where it is particularly problematic is execve()
because the CPU on which the execve() is occurring may be incompatible
with the new application image. In such a situation, it is desirable to
restrict the affinity mask of the task and ensure that the new image is
entered on a compatible CPU. From userspace's point of view, this looks
the same as if the incompatible CPUs have been hotplugged off in the
task's affinity mask.

In preparation for restricting the affinity mask for compat tasks on
arm64 systems without uniform support for 32-bit applications, introduce
force_compatible_cpus_allowed_ptr(), which restricts the affinity mask
for a task to contain only compatible CPUs.
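
As a sketch of the intended call site (the real arm64 wiring follows
later in this series), the exec path for an affected task reduces to:

	/*
	 * Illustrative only: called once the new image is committed, so
	 * task_cpu_possible_mask() already reflects the 32-bit ABI.
	 */
	if (is_compat_task())
		force_compatible_cpus_allowed_ptr(current);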

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/sched.h |   1 +
 kernel/sched/core.c   | 100 +++++++++++++++++++++++++++++++++++-------
 2 files changed, 86 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 76cd21fa5501..e42dd0fb85c5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1653,6 +1653,7 @@ extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
+extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 92ac3e53f50a..1cfc94be18a9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1863,25 +1863,19 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 }
 
 /*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
+ * Called with both p->pi_lock and rq->lock held; drops both before returning.
  */
-static int __set_cpus_allowed_ptr(struct task_struct *p,
-				  const struct cpumask *new_mask, bool check)
+static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
+					 const struct cpumask *new_mask,
+					 bool check,
+					 struct rq *rq,
+					 struct rq_flags *rf)
 {
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	unsigned int dest_cpu;
-	struct rq_flags rf;
-	struct rq *rq;
 	int ret = 0;
 
-	rq = task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 
 	if (p->flags & PF_KTHREAD) {
@@ -1936,7 +1930,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
-		task_rq_unlock(rq, p, &rf);
+		task_rq_unlock(rq, p, rf);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		return 0;
 	} else if (task_on_rq_queued(p)) {
@@ -1944,20 +1938,96 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * OK, since we're going to drop the lock immediately
 		 * afterwards anyway.
 		 */
-		rq = move_queued_task(rq, &rf, p, dest_cpu);
+		rq = move_queued_task(rq, rf, p, dest_cpu);
 	}
 out:
-	task_rq_unlock(rq, p, &rf);
+	task_rq_unlock(rq, p, rf);
 
 	return ret;
 }
 
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+static int __set_cpus_allowed_ptr(struct task_struct *p,
+				  const struct cpumask *new_mask, bool check)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &rf);
+	return __set_cpus_allowed_ptr_locked(p, new_mask, check, rq, &rf);
+}
+
 int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 {
 	return __set_cpus_allowed_ptr(p, new_mask, false);
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+/*
+ * Change a given task's CPU affinity to the intersection of its current
+ * affinity mask and @subset_mask, writing the resulting mask to @new_mask.
+ * If the resulting mask is empty, leave the affinity unchanged and return
+ * -EINVAL.
+ */
+static int restrict_cpus_allowed_ptr(struct task_struct *p,
+				     struct cpumask *new_mask,
+				     const struct cpumask *subset_mask)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &rf);
+	if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
+		task_rq_unlock(rq, p, &rf);
+		return -EINVAL;
+	}
+
+	return __set_cpus_allowed_ptr_locked(p, new_mask, false, rq, &rf);
+}
+
+/*
+ * Restrict a given task's CPU affinity so that it is a subset of
+ * task_cpu_possible_mask(). If the resulting mask is empty, we warn and
+ * walk up the cpuset hierarchy until we find a suitable mask.
+ */
+void force_compatible_cpus_allowed_ptr(struct task_struct *p)
+{
+	cpumask_var_t new_mask;
+	const struct cpumask *override_mask = task_cpu_possible_mask(p);
+
+	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+		goto out_set_mask;
+
+	if (!restrict_cpus_allowed_ptr(p, new_mask, override_mask))
+		goto out_free_mask;
+
+	/*
+	 * We failed to find a valid subset of the affinity mask for the
+	 * task, so override it based on its cpuset hierarchy.
+	 */
+	cpuset_cpus_allowed(p, new_mask);
+	override_mask = new_mask;
+
+out_set_mask:
+	if (printk_ratelimit()) {
+		printk_deferred("Overriding affinity for process %d (%s) to CPUs %*pbl\n",
+				task_pid_nr(p), p->comm,
+				cpumask_pr_args(override_mask));
+	}
+
+	set_cpus_allowed_ptr(p, override_mask);
+out_free_mask:
+	free_cpumask_var(new_mask);
+}
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v5 11/15] arm64: Implement task_cpu_possible_mask()
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (9 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 10/15] sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 12/15] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Provide an implementation of task_cpu_possible_mask() so that we can
prevent 64-bit-only cores being added to the 'cpus_mask' for compat
tasks on systems with mismatched 32-bit support at EL0.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 0672236e1aea..a5c917fa49aa 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -251,6 +251,19 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 #define deactivate_mm(tsk,mm)	do { } while (0)
 #define activate_mm(prev,next)	switch_mm(prev, next, current)
 
+static inline const struct cpumask *
+task_cpu_possible_mask(struct task_struct *p)
+{
+	if (!static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return cpu_possible_mask;
+
+	if (!is_compat_thread(task_thread_info(p)))
+		return cpu_possible_mask;
+
+	return system_32bit_el0_cpumask();
+}
+#define task_cpu_possible_mask	task_cpu_possible_mask
+
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
-- 
2.29.2.576.ga3fc446d84-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 12/15] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (10 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 11/15] arm64: Implement task_cpu_possible_mask() Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 13/15] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

When exec'ing a 32-bit task on a system with mismatched support for
32-bit EL0, try to ensure that it starts life on a CPU that can actually
run it.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9a2532d848f0..da313b738c7c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -633,8 +633,20 @@ void arch_setup_new_exec(void)
 
 	if (is_compat_task()) {
 		mmflags = MMCF_AARCH32;
-		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+
+		/*
+		 * Restrict the CPU affinity mask for a 32-bit task so that
+		 * it contains only 32-bit-capable CPUs.
+		 *
+		 * From the perspective of the task, this looks similar to
+		 * what would happen if the 64-bit-only CPUs were hot-unplugged
+		 * at the point of execve(), although we try a bit harder to
+		 * honour the cpuset hierarchy.
+		 */
+		if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+			force_compatible_cpus_allowed_ptr(current);
 			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+		}
 	}
 
 	current->mm->context.flags = mmflags;
-- 
2.29.2.576.ga3fc446d84-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 13/15] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (11 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 12/15] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 14/15] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

If we want to support 32-bit applications, then when we identify a CPU
with mismatched 32-bit EL0 support we must ensure that we will always
have an active 32-bit CPU available to us from then on. This is important
for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
CPUs for compat tasks and forced migration due to a hotplug event will
hang if no 32-bit CPUs are available.

On detecting a mismatch, prevent offlining of the mismatching CPU if it
is 32-bit capable; otherwise, find the first active 32-bit capable CPU
and prevent that one from being offlined.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 088bf668cbe7..08b558a221b7 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 
 static int enable_mismatched_32bit_el0(unsigned int cpu)
 {
+	static int lucky_winner = -1;
+
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
@@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
 	}
 
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+		return 0;
+
+	if (lucky_winner >= 0)
+		return 0;
+
+	/*
+	 * We've detected a mismatch. We need to keep one of our CPUs with
+	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
+	 * every CPU in the system for a 32-bit task.
+	 */
+	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
+							 cpu_active_mask);
+	get_cpu_device(lucky_winner)->offline_disabled = true;
+	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
+		cpu, lucky_winner);
 	return 0;
 }
 
-- 
2.29.2.576.ga3fc446d84-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 14/15] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (12 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 13/15] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-08 13:28 ` [PATCH v5 15/15] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Allow systems with mismatched 32-bit support at EL0 to run 32-bit
applications based on a new kernel parameter.

Signed-off-by: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
 arch/arm64/kernel/cpufeature.c                  | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..9d191e6e020b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -289,6 +289,14 @@
 			do not want to use tracing_snapshot_alloc() as it needs
 			to be done where GFP_KERNEL allocations are allowed.
 
+	allow_mismatched_32bit_el0 [ARM64]
+			Allow execve() of 32-bit applications and setting of the
+			PER_LINUX32 personality on systems where only a strict
+			subset of the CPUs support 32-bit EL0. When this
+			parameter is present, the set of CPUs supporting 32-bit
+			EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
+			and hot-unplug operations may be restricted.
+
 	amd_iommu=	[HW,X86-64]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 08b558a221b7..fea0f213d55c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1291,6 +1291,13 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+static int __init parse_32bit_el0_param(char *str)
+{
+	allow_mismatched_32bit_el0 = true;
+	return 0;
+}
+early_param("allow_mismatched_32bit_el0", parse_32bit_el0_param);
+
 static ssize_t aarch32_el0_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-- 
2.29.2.576.ga3fc446d84-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 15/15] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (13 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 14/15] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
@ 2020-12-08 13:28 ` Will Deacon
  2020-12-15 17:36 ` [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Peter Zijlstra
  2020-12-16 11:16 ` Qais Yousef
  16 siblings, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-08 13:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

The scheduler now knows enough about these braindead systems to place
32-bit tasks accordingly, so throw out the safety checks and allow the
ret-to-user path to avoid do_notify_resume() if there is nothing to do.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 14 +-------------
 arch/arm64/kernel/signal.c  | 26 --------------------------
 2 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index da313b738c7c..3b08938c7d9d 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -541,15 +541,6 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
-static void compat_thread_switch(struct task_struct *next)
-{
-	if (!is_compat_thread(task_thread_info(next)))
-		return;
-
-	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
-		set_tsk_thread_flag(next, TIF_NOTIFY_RESUME);
-}
-
 /*
  * Thread switching.
  */
@@ -566,7 +557,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	uao_thread_switch(next);
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(prev, next);
-	compat_thread_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
@@ -643,10 +633,8 @@ void arch_setup_new_exec(void)
 		 * at the point of execve(), although we try a bit harder to
 		 * honour the cpuset hierarchy.
 		 */
-		if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
 			force_compatible_cpus_allowed_ptr(current);
-			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
-		}
 	}
 
 	current->mm->context.flags = mmflags;
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index bcb6ca2d9a7c..a8184cad8890 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -911,19 +911,6 @@ static void do_signal(struct pt_regs *regs)
 	restore_saved_sigmask();
 }
 
-static bool cpu_affinity_invalid(struct pt_regs *regs)
-{
-	if (!compat_user_mode(regs))
-		return false;
-
-	/*
-	 * We're preemptible, but a reschedule will cause us to check the
-	 * affinity again.
-	 */
-	return !cpumask_test_cpu(raw_smp_processor_id(),
-				 system_32bit_el0_cpumask());
-}
-
 asmlinkage void do_notify_resume(struct pt_regs *regs,
 				 unsigned long thread_flags)
 {
@@ -961,19 +948,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
-
-				/*
-				 * If we reschedule after checking the affinity
-				 * then we must ensure that TIF_NOTIFY_RESUME
-				 * is set so that we check the affinity again.
-				 * Since tracehook_notify_resume() clears the
-				 * flag, ensure that the compiler doesn't move
-				 * it after the affinity check.
-				 */
-				barrier();
-
-				if (cpu_affinity_invalid(regs))
-					force_sig(SIGKILL);
 			}
 
 			if (thread_flags & _TIF_FOREIGN_FPSTATE)
-- 
2.29.2.576.ga3fc446d84-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (14 preceding siblings ...)
  2020-12-08 13:28 ` [PATCH v5 15/15] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
@ 2020-12-15 17:36 ` Peter Zijlstra
  2020-12-15 18:50   ` Will Deacon
  2020-12-16 11:16 ` Qais Yousef
  16 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2020-12-15 17:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Tue, Dec 08, 2020 at 01:28:20PM +0000, Will Deacon wrote:
> The aim of this series is to allow 32-bit ARM applications to run on
> arm64 SoCs where not all of the CPUs support the 32-bit instruction set.
> Unfortunately, such SoCs are real and will continue to be productised
> over the next few years at least. I can assure you that I'm not just
> doing this for fun.
> 
> Changes in v5 include:
> 
>   * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
>     we can avoid returning incompatible CPUs for a given task. This
>     means that sched_setaffinity() can be used with larger masks (like
>     the online mask) from userspace and also allows us to take into
>     account the cpuset hierarchy when forcefully overriding the affinity
>     for a task on execve().
> 
>   * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
>     so that the resulting affinity mask does not contain any incompatible
>     CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).
> 
>   * Moved overriding of the affinity mask into the scheduler core rather
>     than munge affinity masks directly in the architecture backend.

Hurmph... so if I can still read, this thing will auto truncate the
affinity mask to something that only contains compatible CPUs, right?

Assuming our system has 8 CPUs (0xFF), half of which are 32bit capable
(0x0F), then, when our native task (with affinity 0x3c) does a
fork()+execve() of a 32bit thingy, the resulting task has 0x0c.

If that in turn does fork()+execve() of a native task, it will retain
the truncated affinity mask (0x0c), instead of returning to the wider
mask (0x3c).

IOW, any (accidental or otherwise) trip through a 32bit helper will
destroy user state (the affinity mask: 0x3c).


Should we perhaps split task_struct::cpus_mask, one to keep an original
copy of the user state, and one to be an effective cpumask for the task?
That way, the moment a task constricts or widens its
task_cpu_possible_mask() we can re-compute the effective mask without
loss of information.
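
Roughly like this (completely untested sketch; 'user_cpus_mask' is a
made-up field name, not something task_struct has today):

	/*
	 * Sketch: keep the mask the user asked for separate from the
	 * effective mask the scheduler actually uses, and recompute the
	 * latter whenever task_cpu_possible_mask() changes shape.
	 */
	static void recompute_effective_cpus(struct task_struct *p)
	{
		/* effective = requested & possible; never lose 'requested' */
		if (!cpumask_and(&p->cpus_mask, &p->user_cpus_mask,
				 task_cpu_possible_mask(p)))
			cpumask_copy(&p->cpus_mask, task_cpu_possible_mask(p));
	}

With that, the example above recovers: 0x3c & 0x0f gives 0x0c for the
32bit task, and the subsequent native execve() recomputes 0x3c & 0xff
back to 0x3c.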

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-15 17:36 ` [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Peter Zijlstra
@ 2020-12-15 18:50   ` Will Deacon
  2020-12-17 10:55     ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2020-12-15 18:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

Hi Peter,

Cheers for taking a look.

On Tue, Dec 15, 2020 at 06:36:45PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 08, 2020 at 01:28:20PM +0000, Will Deacon wrote:
> > The aim of this series is to allow 32-bit ARM applications to run on
> > arm64 SoCs where not all of the CPUs support the 32-bit instruction set.
> > Unfortunately, such SoCs are real and will continue to be productised
> > over the next few years at least. I can assure you that I'm not just
> > doing this for fun.
> > 
> > Changes in v5 include:
> > 
> >   * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
> >     we can avoid returning incompatible CPUs for a given task. This
> >     means that sched_setaffinity() can be used with larger masks (like
> >     the online mask) from userspace and also allows us to take into
> >     account the cpuset hierarchy when forcefully overriding the affinity
> >     for a task on execve().
> > 
> >   * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
> >     so that the resulting affinity mask does not contain any incompatible
> >     CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).
> > 
> >   * Moved overriding of the affinity mask into the scheduler core rather
> >     than munge affinity masks directly in the architecture backend.
> 
> Hurmph... so if I can still read, this thing will auto truncate the
> affinity mask to something that only contains compatible CPUs, right?
> 
> Assuming our system has 8 CPUs (0xFF), half of which are 32bit capable
> (0x0F), then, when our native task (with affinity 0x3c) does a
> fork()+execve() of a 32bit thingy, the resulting task has 0x0c.
> 
> If that in turn does fork()+execve() of a native task, it will retain
> the truncated affinity mask (0x0c), instead of returning to the wider
> mask (0x3c).
> 
> IOW, any (accidental or otherwise) trip through a 32bit helper will
> destroy user state (the affinity mask: 0x3c).

Yes, that's correct, and I agree that it's a rough edge. If you're happy
with the idea of adding an extra mask to make this work, then I can start
hacking that up (although I doubt I'll get something out before the new
year at this point).

> Should we perhaps split task_struct::cpus_mask, one to keep an original
> copy of the user state, and one to be an effective cpumask for the task?
> That way, the moment a task constricts or widens its
> task_cpu_possible_mask() we can re-compute the effective mask without
> loss of information.

Hmm, we might already have most of the pieces in place for this (modulo
the extra field), since cpuset_cpus_allowed() provides the limiting mask
now so this might be relatively straightforward.

Famous last words...

Will

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (15 preceding siblings ...)
  2020-12-15 17:36 ` [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Peter Zijlstra
@ 2020-12-16 11:16 ` Qais Yousef
  2020-12-16 14:14   ` Will Deacon
  16 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-12-16 11:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Suren Baghdasaryan, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

Hi Will

On 12/08/20 13:28, Will Deacon wrote:
> Hi all,
> 
> Christmas has come early: it's time for version five of these patches
> which have previously appeared here:
> 
>   v1: https://lore.kernel.org/r/20201027215118.27003-1-will@kernel.org
>   v2: https://lore.kernel.org/r/20201109213023.15092-1-will@kernel.org
>   v3: https://lore.kernel.org/r/20201113093720.21106-1-will@kernel.org
>   v4: https://lore.kernel.org/r/20201124155039.13804-1-will@kernel.org
> 
> and which started life as a reimplementation of some patches from Qais:
> 
>   https://lore.kernel.org/r/20201021104611.2744565-1-qais.yousef@arm.com
> 
> There's also now a nice writeup on LWN:
> 
>   https://lwn.net/Articles/838339/
> 
> and rumours of a feature film are doing the rounds.
> 
> [subscriber-only, but if you're reading this then you should really
>  subscribe.]
> 
> The aim of this series is to allow 32-bit ARM applications to run on
> arm64 SoCs where not all of the CPUs support the 32-bit instruction set.
> Unfortunately, such SoCs are real and will continue to be productised
> over the next few years at least. I can assure you that I'm not just
> doing this for fun.
> 
> Changes in v5 include:
> 
>   * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
>     we can avoid returning incompatible CPUs for a given task. This
>     means that sched_setaffinity() can be used with larger masks (like
>     the online mask) from userspace and also allows us to take into
>     account the cpuset hierarchy when forcefully overriding the affinity
>     for a task on execve().
> 
>   * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
>     so that the resulting affinity mask does not contain any incompatible
>     CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).
> 
>   * Moved overriding of the affinity mask into the scheduler core rather
>     than munge affinity masks directly in the architecture backend.
> 
>   * Extended comments and documentation.
> 
>   * Some renaming and cosmetic changes.
> 
> I'm pretty happy with this now, although it still needs review and will
> require rebasing to play nicely with the SCA changes in -next.

I still have concerns about the cpuset v1 handling. Specifically:

	1. Attaching a 32bit task to 64bit only cpuset is allowed.

	   I think the right behavior here is to prevent that as the
	   intersection will appear as offline cpus for the 32bit tasks. So it
	   shouldn't be allowed to move there.

	2. Modifying cpuset.cpus could result with empty set for 32bit tasks.

	   It is a variation of the above, it's just the cpuset transforms into
	   64bit only after we attach.

	   I think the right behavior here is to move the 32bit tasks to the
	   nearest ancestor like we do when all cpuset.cpus are hotplugged out.

	   We could also return an error if the new set would result in an empty set
	   for the 32bit tasks. In a similar manner to how it fails if you
	   write a cpu that is offline.

	3. If a 64bit task that belongs to a 64bit-only cpuset execs a 32bit
	   binary, the 32bit task will inherit the cgroup setting.

	   Like above, we should move this to the nearest ancestor.

I was worried that, in a hierarchy, the parent cpuset.cpus could be modified
such that the children no longer have a valid cpu for 32bit tasks. But I
checked for v1 and
this isn't a problem. You'll get an error if you try to change it in a way that
ends up with an empty cpuset.

I played with v2, and the model allows tasks to remain attached even if cpus
are hotplugged, or cpusets.cpus is modified in such a way we end up with an
empty cpuset. So I think breaking the affinity of the cpuset for v2 is okay.

To simplify the problem for v1, we could say that asym ISA tasks can only live
in the root cpuset for v1. This will simplify the solution too since we will
only need to ensure that these tasks are moved to the root group on exec and
block any future move to anything else. Of course this dictates that such
systems must use cpuset v2 if they care. Not a terrible restriction IMO.

I hacked a patch to fix the exec scenario and it was easy to do. I just need to
block clone3 (cgroup_post_fork()) and task_can_attach() from allowing these
tasks to move anywhere else.
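
For task_can_attach(), the check would be something along these lines
(hand-wavy sketch, not the actual patch; 'cs' being the destination
cpuset and 'task' the task being attached):

	/* Refuse to attach a task to a cpuset none of whose cpus can run it */
	if (!cpumask_intersects(cs->effective_cpus,
				task_cpu_possible_mask(task)))
		return -EINVAL;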

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-16 11:16 ` Qais Yousef
@ 2020-12-16 14:14   ` Will Deacon
  2020-12-16 16:48     ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: Will Deacon @ 2020-12-16 14:14 UTC (permalink / raw)
  To: Qais Yousef, surenb
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

Hi Qais,

On Wed, Dec 16, 2020 at 11:16:46AM +0000, Qais Yousef wrote:
> On 12/08/20 13:28, Will Deacon wrote:
> > Changes in v5 include:
> > 
> >   * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
> >     we can avoid returning incompatible CPUs for a given task. This
> >     means that sched_setaffinity() can be used with larger masks (like
> >     the online mask) from userspace and also allows us to take into
> >     account the cpuset hierarchy when forcefully overriding the affinity
> >     for a task on execve().
> > 
> >   * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
> >     so that the resulting affinity mask does not contain any incompatible
> >     CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).
> > 
> >   * Moved overriding of the affinity mask into the scheduler core rather
> >     than munge affinity masks directly in the architecture backend.
> > 
> >   * Extended comments and documentation.
> > 
> >   * Some renaming and cosmetic changes.
> > 
> > I'm pretty happy with this now, although it still needs review and will
> > require rebasing to play nicely with the SCA changes in -next.
> 
> I still have concerns about the cpuset v1 handling. Specifically:
> 
> 	1. Attaching a 32bit task to 64bit only cpuset is allowed.
> 
> 	   I think the right behavior here is to prevent that as the
> 	   intersection will appear as offline cpus for the 32bit tasks. So it
> 	   shouldn't be allowed to move there.

Suren or Quentin can correct me if I'm wrong here, but I think Android
relies on this working so it's not an option for us to prevent the attach.
I also don't think it really achieves much, since as you point out, the same
problem exists in other cases such as execve() of a 32-bit binary, or
hotplugging off all 32-bit CPUs within a mixed cpuset. Allowing the attach
and immediately reparenting would probably be better, but see below.

> 	2. Modifying cpuset.cpus could result with empty set for 32bit tasks.
> 
> 	   It is a variation of the above, it's just the cpuset transforms into
> 	   64bit only after we attach.
> 
> 	   I think the right behavior here is to move the 32bit tasks to the
> 	   nearest ancestor like we do when all cpuset.cpus are hotplugged out.
> 
> 	   We could also return an error if the new set would result in an empty set
> 	   for the 32bit tasks. In a similar manner to how it fails if you
> 	   write a cpu that is offline.
> 
> 	3. If a 64bit task that belongs to a 64bit-only cpuset execs a 32bit
> 	   binary, the 32bit task will inherit the cgroup setting.
> 
> 	   Like above, we should move this to the nearest ancestor.

I considered this when I was writing the patches, but the reality is that
by allowing 32-bit tasks to attach to a 64-bit only cpuset (which is required
by Android), we have no choice but to expose a new ABI to userspace. This is
all gated behind a command-line option, so I think that's fine, but then why
not just have the same behaviour as cgroup v2? I don't see the point in
creating two new ABIs (for cgroup v1 and v2 respectively) if we don't need
to. If it was _identical_ to the hotplug case, then we would surely just
follow the existing behaviour, but it's really quite different in this
situation because the cpuset is not empty.

One thing we should definitely do though is add this to the documentation
for the command-line option.

> To simplify the problem for v1, we could say that asym ISA tasks can only live
> in the root cpuset for v1. This will simplify the solution too since we will
> only need to ensure that these tasks are moved to the root group on exec and
> block any future move to anything else. Of course this dictates that such
> systems must use cpuset v2 if they care. Not a terrible restriction IMO.

Sadly, I think Android is still on cgroup v1 for cpuset, but Suren will know
better the status of cgroup v2 for cpusets. If it's just around the corner,
then maybe we could simplify things here. Suren?

Will

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-16 14:14   ` Will Deacon
@ 2020-12-16 16:48     ` Qais Yousef
  2020-12-16 18:21       ` Suren Baghdasaryan
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-12-16 16:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: surenb, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Peter Zijlstra, Morten Rasmussen, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On 12/16/20 14:14, Will Deacon wrote:
> Hi Qais,
> 
> On Wed, Dec 16, 2020 at 11:16:46AM +0000, Qais Yousef wrote:
> > On 12/08/20 13:28, Will Deacon wrote:
> > > Changes in v5 include:
> > > 
> > >   * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
> > >     we can avoid returning incompatible CPUs for a given task. This
> > >     means that sched_setaffinity() can be used with larger masks (like
> > >     the online mask) from userspace and also allows us to take into
> > >     account the cpuset hierarchy when forcefully overriding the affinity
> > >     for a task on execve().
> > > 
> > >   * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
> > >     so that the resulting affinity mask does not contain any incompatible
> > >     CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).
> > > 
> > >   * Moved overriding of the affinity mask into the scheduler core rather
> > >     than munge affinity masks directly in the architecture backend.
> > > 
> > >   * Extended comments and documentation.
> > > 
> > >   * Some renaming and cosmetic changes.
> > > 
> > > I'm pretty happy with this now, although it still needs review and will
> > > require rebasing to play nicely with the SCA changes in -next.
> > 
> > I still have concerns about the cpuset v1 handling. Specifically:
> > 
> > 	1. Attaching a 32bit task to 64bit only cpuset is allowed.
> > 
> > 	   I think the right behavior here is to prevent that as the
> > 	   intersection will appear as offline cpus for the 32bit tasks. So it
> > 	   shouldn't be allowed to move there.
> 
> Suren or Quantin can correct me if I'm wrong I'm here, but I think Android
> relies on this working so it's not an option for us to prevent the attach.

I don't think so. It's just a matter of who handles the error. i.e. the
kernel fixes it up silently and effectively makes the cpuset a NOP since we
don't respect the affinity of the cpuset, or user space picks the next best
thing. Since this could return an error anyway, user space likely already
handles this.

> I also don't think it really achieves much, since as you point out, the same
> problem exists in other cases such as execve() of a 32-bit binary, or
> hotplugging off all 32-bit CPUs within a mixed cpuset. Allowing the attach
> and immediately reparenting would probably be better, but see below.

I am just wary that we're introducing generic asymmetric ISA support, so my
concerns have been related to making sure the behavior is sane generally. When
this gets merged, I can bet more 'fun' hardware will appear all over the place.
We're opening the flood gates I'm afraid :p

> > 	2. Modifying cpuset.cpus could result with empty set for 32bit tasks.
> > 
> > 	   It is a variation of the above, it's just the cpuset transforms into
> > 	   64bit only after we attach.
> > 
> > 	   I think the right behavior here is to move the 32bit tasks to the
> > 	   nearest ancestor like we do when all cpuset.cpus are hotplugged out.
> > 
> > 	   We could also return an error if the new set would result in an empty set
> > 	   for the 32bit tasks. In a similar manner to how it fails if you
> > 	   write a cpu that is offline.
> > 
> > 	3. If a 64bit task that belongs to a 64bit-only cpuset execs a 32bit
> > 	   binary, the 32bit task will inherit the cgroup setting.
> > 
> > 	   Like above, we should move this to the nearest ancestor.
> 
> I considered this when I was writing the patches, but the reality is that
> by allowing 32-bit tasks to attach to a 64-bit only cpuset (which is required
> by Android), we have no choice but to expose a new ABI to userspace. This is
> all gated behind a command-line option, so I think that's fine, but then why
> not just have the same behaviour as cgroup v2? I don't see the point in
> creating two new ABIs (for cgroup v1 and v2 respectively) if we don't need

Ultimately it's up to Tejun and Peter I guess. I thought we need to preserve
the v1 behavior for the new class of tasks. I won't object to the new ABI
myself. Maybe we just need to make the commit messages and cgroup-v1
documentation reflect that explicitly.

> to. If it was _identical_ to the hotplug case, then we would surely just
> follow the existing behaviour, but it's really quite different in this
> situation because the cpuset is not empty.

It is actually effectively empty for those tasks. But I see that one could look
at it from two different angles.

> One thing we should definitely do though is add this to the documentation
> for the command-line option.

+1

By the way, should the command-line option be renamed to something more
generic? This has already grown beyond just enabling the support for one
isolated case. No strong opinion, just a suggestion.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-16 16:48     ` Qais Yousef
@ 2020-12-16 18:21       ` Suren Baghdasaryan
  0 siblings, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2020-12-16 18:21 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Will Deacon, linux-arm-kernel, linux-arch, LKML, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Wed, Dec 16, 2020 at 8:48 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 12/16/20 14:14, Will Deacon wrote:
> > Hi Qais,
> >
> > On Wed, Dec 16, 2020 at 11:16:46AM +0000, Qais Yousef wrote:
> > > On 12/08/20 13:28, Will Deacon wrote:
> > > > Changes in v5 include:
> > > >
> > > >   * Teach cpuset_cpus_allowed() about task_cpu_possible_mask() so that
> > > >     we can avoid returning incompatible CPUs for a given task. This
> > > >     means that sched_setaffinity() can be used with larger masks (like
> > > >     the online mask) from userspace and also allows us to take into
> > > >     account the cpuset hierarchy when forcefully overriding the affinity
> > > >     for a task on execve().
> > > >
> > > >   * Honour task_cpu_possible_mask() when attaching a task to a cpuset,
> > > >     so that the resulting affinity mask does not contain any incompatible
> > > >     CPUs (since it would be rejected by set_cpus_allowed_ptr() otherwise).
> > > >
> > > >   * Moved overriding of the affinity mask into the scheduler core rather
> > > >     than munge affinity masks directly in the architecture backend.
> > > >
> > > >   * Extended comments and documentation.
> > > >
> > > >   * Some renaming and cosmetic changes.
> > > >
> > > > I'm pretty happy with this now, although it still needs review and will
> > > > require rebasing to play nicely with the SCA changes in -next.
> > >
> > > I still have concerns about the cpuset v1 handling. Specifically:
> > >
> > >     1. Attaching a 32bit task to 64bit only cpuset is allowed.
> > >
> > >        I think the right behavior here is to prevent that as the
> > >        intersection will appear as offline cpus for the 32bit tasks. So it
> > >        shouldn't be allowed to move there.
> >
> > Suren or Quantin can correct me if I'm wrong I'm here, but I think Android
> > relies on this working so it's not an option for us to prevent the attach.
>
> I don't think so. It's just a matter who handles the error. ie: kernel fix it
> up silently and effectively make the cpuset a NOP since we don't respect the
> affinity of the cpuset, or user space pick the next best thing. Since this
> could return an error anyway, likely user space already handles this.

Moving a 32bit task around the hierarchy when it lost the last 32bit
capable CPU in its affinity mask would not work for Android. We move
the tasks in the hierarchy only when they change their role
(background/foreground/etc) and does not expect the tasks to migrate
by themselves. I think the current approach of adjusting affinity
without migration while not ideal is much better. Consistency with
cgroup v2 is a big plus as well.
We do plan on moving cpuset controller to cgroup v2 but the transition
is slow, so my guess is that we will stick to it for another Android
release.

> > I also don't think it really achieves much, since as you point out, the same
> > problem exists in other cases such as execve() of a 32-bit binary, or
> > hotplugging off all 32-bit CPUs within a mixed cpuset. Allowing the attach
> > and immediately reparenting would probably be better, but see below.
>
> I am just wary that we're introducing a generic asymmetric ISA support, so my
> concerns have been related to making sure the behavior is sane generally. When
> this gets merged, I can bet more 'fun' hardware will appear all over the place.
> We're opening the flood gates I'm afraid :p
>
> > >     2. Modifying cpuset.cpus could result with empty set for 32bit tasks.
> > >
> > >        It is a variation of the above, it's just the cpuset transforms into
> > >        64bit only after we attach.
> > >
> > >        I think the right behavior here is to move the 32bit tasks to the
> > >        nearest ancestor like we do when all cpuset.cpus are hotplugged out.
> > >
> > >        We could also return an error if the new set would result in an empty set
> > >        for the 32bit tasks. In a similar manner to how it fails if you
> > >        write a cpu that is offline.
> > >
> > >     3. If a 64bit task that belongs to a 64bit-only cpuset execs a 32bit
> > >        binary, the 32bit task will inherit the cgroup setting.
> > >
> > >        Like above, we should move this to the nearest ancestor.
> >
> > I considered this when I was writing the patches, but the reality is that
> > by allowing 32-bit tasks to attach to a 64-bit only cpuset (which is required
> > by Android), we have no choice but to expose a new ABI to userspace. This is
> > all gated behind a command-line option, so I think that's fine, but then why
> > not just have the same behaviour as cgroup v2? I don't see the point in
> > creating two new ABIs (for cgroup v1 and v2 respectively) if we don't need
>
> Ultimately it's up to Tejun and Peter I guess. I thought we need to preserve
> the v1 behavior for the new class of tasks. I won't object to the new ABI
> myself. Maybe we just need to make the commit messages and cgroup-v1
> documentation reflect that explicitly.
>
> > to. If it was _identical_ to the hotplug case, then we would surely just
> > follow the existing behaviour, but it's really quite different in this
> > situation because the cpuset is not empty.
>
> It is actually effectively empty for those tasks. But I see that one could look
> at it from two different angles.
>
> > One thing we should definitely do though is add this to the documentation
> > for the command-line option.
>
> +1
>
> By the way, should the command-line option be renamed to something more
> generic? This has already grown beyond just enabling the support for one
> isolated case. No strong opinion, just a suggestion.
>
> Thanks
>
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems
  2020-12-15 18:50   ` Will Deacon
@ 2020-12-17 10:55     ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2020-12-17 10:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Tue, Dec 15, 2020 at 06:50:12PM +0000, Will Deacon wrote:
> On Tue, Dec 15, 2020 at 06:36:45PM +0100, Peter Zijlstra wrote:

> > IOW, any (accidental or otherwise) trip through a 32bit helper will
> > destroy user state (the affinity mask: 0x3c).
> 
> Yes, that's correct, and I agree that it's a rough edge. If you're happy
> with the idea of adding an extra mask to make this work, then I can start
> hacking that up

Yeah, I'm afraid we'll have to, this asymmetric muck is only going to
get worse from here on.

Anyway, I think we can avoid adding another cpumask_t to task_struct and
do with a cpumask_t * instead. After all, for 'normal' tasks, the
task_cpu_possible_mask() will be cpu_possible_mask and we don't need to
carry anything extra.

Only once we hit one of these asymmetric ISA things can the task
allocate the additional cpumask and retain the full mask.
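
E.g. (untested; 'user_cpus_ptr' is a made-up field name for the lazily
allocated mask):

	/*
	 * Only allocate the extra mask the first time the task gets
	 * restricted; 'normal' tasks keep a NULL pointer and pay nothing.
	 */
	static int save_user_cpus_mask(struct task_struct *p)
	{
		cpumask_t *user_mask = p->user_cpus_ptr;

		if (!user_mask) {
			user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
			if (!user_mask)
				return -ENOMEM;
			p->user_cpus_ptr = user_mask;
		}

		cpumask_copy(user_mask, &p->cpus_mask);
		return 0;
	}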

> (although I doubt I'll get something out before the new
> year at this point).

Yeah, we're all about to shut down for a bit, I'll not be looking at
email for 2 weeks either, so even if you send it, I might not see it
until the next year.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-12-08 13:28 ` [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
@ 2020-12-17 12:15   ` Qais Yousef
  2020-12-17 13:44     ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-12-17 12:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Suren Baghdasaryan, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On 12/08/20 13:28, Will Deacon wrote:
> If the scheduler cannot find an allowed CPU for a task,
> cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> if cgroup v1 is in use.
> 
> In preparation for allowing architectures to provide their own fallback
> mask, just return early if we're not using cgroup v2 and allow
> select_fallback_rq() to figure out the mask by itself.
> 
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  kernel/cgroup/cpuset.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 57b5b5d0a5fd..e970737c3ed2 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>  
>  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
>  {
> +	if (!is_in_v2_mode())
> +		return; /* select_fallback_rq will try harder */
> +
>  	rcu_read_lock();
> -	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
> -		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
> +	do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed);

Why is it safe to return that for cpuset v2? task_cs(tsk)->cpus_allowed is the
original user-configured cpuset.cpus setting, which could have an empty
intersection with task_cpu_possible_mask(), no?

do_set_cpus_allowed() will call set_cpus_allowed_common() which will end up
copying the mask as-is.

So unless I missed something, there's a risk that a 32bit task ends up with
a 64bit-only cpu_mask when using cpuset v2.

Thanks

--
Qais Yousef

>  	rcu_read_unlock();
>  
>  	/*
> -- 
> 2.29.2.576.ga3fc446d84-goog
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-12-17 12:15   ` Qais Yousef
@ 2020-12-17 13:44     ` Peter Zijlstra
  2020-12-17 14:59       ` Will Deacon
  2020-12-17 15:00       ` Qais Yousef
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2020-12-17 13:44 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Will Deacon, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Suren Baghdasaryan, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Dec 17, 2020 at 12:15:52PM +0000, Qais Yousef wrote:
> On 12/08/20 13:28, Will Deacon wrote:
> > If the scheduler cannot find an allowed CPU for a task,
> > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> > if cgroup v1 is in use.
> > 
> > In preparation for allowing architectures to provide their own fallback
> > mask, just return early if we're not using cgroup v2 and allow
> > select_fallback_rq() to figure out the mask by itself.
> > 
> > Cc: Li Zefan <lizefan@huawei.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Reviewed-by: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  kernel/cgroup/cpuset.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > index 57b5b5d0a5fd..e970737c3ed2 100644
> > --- a/kernel/cgroup/cpuset.c
> > +++ b/kernel/cgroup/cpuset.c
> > @@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> >  
> >  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
> >  {
> > +	if (!is_in_v2_mode())
> > +		return; /* select_fallback_rq will try harder */
> > +
> >  	rcu_read_lock();
> > -	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
> > -		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
> > +	do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed);
> 
> Why is it safe to return that for cpuset v2?

v1

Because in that case it does cpu_possible_mask, which, if you look at
select_fallback_rq(), is exactly what happens when cpuset 'fails' to
find a candidate.

Or at least, that's how I read the patch.
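
For reference, the fallback path in select_fallback_rq() is roughly this
state machine (paraphrased from kernel/sched/core.c, details elided):

	for (;;) {
		/* Any allowed, online CPU? */
		for_each_cpu(dest_cpu, p->cpus_ptr) {
			if (is_cpu_allowed(p, dest_cpu))
				goto out;
		}

		/* No more Mr. Nice Guy. */
		switch (state) {
		case cpuset:
			/* v1 now returns early here, leaving the mask alone... */
			cpuset_cpus_allowed_fallback(p);
			state = possible;
			break;
		case possible:
			/* ...so this widens to cpu_possible_mask, same as before */
			do_set_cpus_allowed(p, cpu_possible_mask);
			state = fail;
			break;
		case fail:
			BUG();
		}
	}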

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-12-17 13:44     ` Peter Zijlstra
@ 2020-12-17 14:59       ` Will Deacon
  2020-12-17 15:00       ` Qais Yousef
  1 sibling, 0 replies; 29+ messages in thread
From: Will Deacon @ 2020-12-17 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Qais Yousef, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Suren Baghdasaryan, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Dec 17, 2020 at 02:44:01PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2020 at 12:15:52PM +0000, Qais Yousef wrote:
> > On 12/08/20 13:28, Will Deacon wrote:
> > > If the scheduler cannot find an allowed CPU for a task,
> > > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> > > if cgroup v1 is in use.
> > > 
> > > In preparation for allowing architectures to provide their own fallback
> > > mask, just return early if we're not using cgroup v2 and allow
> > > select_fallback_rq() to figure out the mask by itself.
> > > 
> > > Cc: Li Zefan <lizefan@huawei.com>
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Reviewed-by: Quentin Perret <qperret@google.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  kernel/cgroup/cpuset.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > > index 57b5b5d0a5fd..e970737c3ed2 100644
> > > --- a/kernel/cgroup/cpuset.c
> > > +++ b/kernel/cgroup/cpuset.c
> > > @@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> > >  
> > >  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
> > >  {
> > > +	if (!is_in_v2_mode())
> > > +		return; /* select_fallback_rq will try harder */
> > > +
> > >  	rcu_read_lock();
> > > -	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
> > > -		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
> > > +	do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed);
> > 
> > Why is it safe to return that for cpuset v2?
> 
> v1
> 
> Because in that case it does cpu_possible_mask, which, if you look at
> select_fallback_rq(), is exactly what happens when cpuset 'fails' to
> find a candidate.
> 
> Or at least, that's how I read the patch.

I think Qais has a point with v2 though: if task_cs(tsk)->cpus_allowed
contains 64-bit-only CPUs, then we're in trouble here. I should be
taking the intersection with the task_cpu_possible_mask() for the task.
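
i.e. something like this (completely untested), where we punt back to
select_fallback_rq() if the cpuset mask strays outside
task_cpu_possible_mask():

	void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
	{
		const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
		const struct cpumask *cs_mask;

		rcu_read_lock();
		cs_mask = task_cs(tsk)->cpus_allowed;

		/* Only use the cpuset mask if the task can run on all of it */
		if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask))
			do_set_cpus_allowed(tsk, cs_mask);

		rcu_read_unlock();
	}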

Will

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-12-17 13:44     ` Peter Zijlstra
  2020-12-17 14:59       ` Will Deacon
@ 2020-12-17 15:00       ` Qais Yousef
  1 sibling, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-12-17 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Suren Baghdasaryan, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On 12/17/20 14:44, Peter Zijlstra wrote:
> On Thu, Dec 17, 2020 at 12:15:52PM +0000, Qais Yousef wrote:
> > On 12/08/20 13:28, Will Deacon wrote:
> > > If the scheduler cannot find an allowed CPU for a task,
> > > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> > > if cgroup v1 is in use.
> > > 
> > > In preparation for allowing architectures to provide their own fallback
> > > mask, just return early if we're not using cgroup v2 and allow
> > > select_fallback_rq() to figure out the mask by itself.
> > > 
> > > Cc: Li Zefan <lizefan@huawei.com>
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Reviewed-by: Quentin Perret <qperret@google.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  kernel/cgroup/cpuset.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > > index 57b5b5d0a5fd..e970737c3ed2 100644
> > > --- a/kernel/cgroup/cpuset.c
> > > +++ b/kernel/cgroup/cpuset.c
> > > @@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
> > >  
> > >  void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
> > >  {
> > > +	if (!is_in_v2_mode())
> > > +		return; /* select_fallback_rq will try harder */
> > > +
> > >  	rcu_read_lock();
> > > -	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
> > > -		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
> > > +	do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed);
> > 
> > Why is it safe to return that for cpuset v2?
> 
> v1
> 
> Because in that case it does cpu_possible_mask, which, if you look at
> select_fallback_rq(), is exactly what happens when cpuset 'fails' to
> find a candidate.
> 
> Or at least, that's how I read the patch.

Okay, I can see that if v2 has an effectively empty mask for the 32bit tasks,
then we'll fall back to the 'possible' switch case where we set
task_cpu_possible_mask().

But what about when task_cs(tsk)->cpus_allowed contains a mix of valid and
invalid cpus?

The search for a candidate cpu will return a correct dest_cpu, but the actual
cpu_mask of the task will contain invalid cpus that could be picked up later,
no? Shouldn't we

	cpumask_and(mask, task_cs(tsk)->cpus_allowed, task_cpu_possible_mask())

to remove those invalid cpus?

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 08/15] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  2020-12-08 13:28 ` [PATCH v5 08/15] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
@ 2020-12-28  3:54   ` Suren Baghdasaryan
  0 siblings, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2020-12-28  3:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, LKML, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Tue, Dec 8, 2020 at 5:29 AM Will Deacon <will@kernel.org> wrote:
>
> Asymmetric systems may not offer the same level of userspace ISA support
> across all CPUs, meaning that some applications cannot be executed by
> some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> not feature support for 32-bit applications on both clusters.
>
> Modify guarantee_online_cpus() to take task_cpu_possible_mask() into
> account when trying to find a suitable set of online CPUs for a given
> task. This will avoid passing an invalid mask to set_cpus_allowed_ptr()
> during ->attach() and will subsequently allow the cpuset hierarchy to be
> taken into account when forcefully overriding the affinity mask for a
> task which requires migration to a compatible CPU.
>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  include/linux/cpuset.h |  3 ++-
>  kernel/cgroup/cpuset.c | 33 +++++++++++++++++++--------------
>  2 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 04c20de66afc..414a8e694413 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -15,6 +15,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/nodemask.h>
>  #include <linux/mm.h>
> +#include <linux/mmu_context.h>
>  #include <linux/jump_label.h>
>
>  #ifdef CONFIG_CPUSETS
> @@ -184,7 +185,7 @@ static inline void cpuset_read_unlock(void) { }
>  static inline void cpuset_cpus_allowed(struct task_struct *p,
>                                        struct cpumask *mask)
>  {
> -       cpumask_copy(mask, cpu_possible_mask);
> +       cpumask_copy(mask, task_cpu_possible_mask(p));
>  }
>
>  static inline void cpuset_cpus_allowed_fallback(struct task_struct *p)
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index e970737c3ed2..d30febf1f69f 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -372,18 +372,26 @@ static inline bool is_in_v2_mode(void)
>  }
>
>  /*
> - * Return in pmask the portion of a cpusets's cpus_allowed that
> - * are online.  If none are online, walk up the cpuset hierarchy
> - * until we find one that does have some online cpus.
> + * Return in pmask the portion of a task's cpusets's cpus_allowed that
> + * are online and are capable of running the task.  If none are found,
> + * walk up the cpuset hierarchy until we find one that does have some
> + * appropriate cpus.
>   *
>   * One way or another, we guarantee to return some non-empty subset
>   * of cpu_online_mask.
>   *
>   * Call with callback_lock or cpuset_mutex held.
>   */
> -static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
> +static void guarantee_online_cpus(struct task_struct *tsk,
> +                                 struct cpumask *pmask)
>  {
> -       while (!cpumask_intersects(cs->effective_cpus, cpu_online_mask)) {
> +       struct cpuset *cs = task_cs(tsk);
> +       const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
> +
> +       if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_online_mask)))

IIUC, this represents the case where there is no online CPU that can
run this task. In this situation guarantee_online_cpus() will return
an online CPU which can't run the task (because we ignore
possible_mask). I don't think this can be considered a valid fallback
path. However, I think patch [13/15] ensures that we never end up in
this situation by disallowing offlining of the last 32-bit-capable
CPU. If that's true, then maybe the patches can be reordered so that
[13/15] comes before this one and this condition can be treated as a
bug here?
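
For example, with the reordering in place, the check could become
something like this (just a sketch on my side):

	/* [13/15] guarantees at least one compatible CPU stays online */
	BUG_ON(!cpumask_and(pmask, possible_mask, cpu_online_mask));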


> +               cpumask_copy(pmask, cpu_online_mask);
> +
> +       while (!cpumask_intersects(cs->effective_cpus, pmask)) {
>                 cs = parent_cs(cs);
>                 if (unlikely(!cs)) {
>                         /*
> @@ -393,11 +401,10 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
>                          * cpuset's effective_cpus is on its way to be
>                          * identical to cpu_online_mask.
>                          */
> -                       cpumask_copy(pmask, cpu_online_mask);
>                         return;
>                 }
>         }
> -       cpumask_and(pmask, cs->effective_cpus, cpu_online_mask);
> +       cpumask_and(pmask, pmask, cs->effective_cpus);
>  }
>
>  /*
> @@ -2176,15 +2183,13 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>
>         percpu_down_write(&cpuset_rwsem);
>
> -       /* prepare for attach */
> -       if (cs == &top_cpuset)
> -               cpumask_copy(cpus_attach, cpu_possible_mask);
> -       else
> -               guarantee_online_cpus(cs, cpus_attach);
> -
>         guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
>
>         cgroup_taskset_for_each(task, css, tset) {
> +               if (cs != &top_cpuset)
> +                       guarantee_online_cpus(task, cpus_attach);
> +               else
> +                       cpumask_copy(cpus_attach, task_cpu_possible_mask(task));
>                 /*
>                  * can_attach beforehand should guarantee that this doesn't
>                  * fail.  TODO: have a better way to handle failure here
> @@ -3280,7 +3285,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>
>         spin_lock_irqsave(&callback_lock, flags);
>         rcu_read_lock();
> -       guarantee_online_cpus(task_cs(tsk), pmask);
> +       guarantee_online_cpus(tsk, pmask);
>         rcu_read_unlock();
>         spin_unlock_irqrestore(&callback_lock, flags);
>  }
> --
> 2.29.2.576.ga3fc446d84-goog
>

* Re: [PATCH v5 10/15] sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity
  2020-12-08 13:28 ` [PATCH v5 10/15] sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity Will Deacon
@ 2020-12-28  4:29   ` Suren Baghdasaryan
  0 siblings, 0 replies; 29+ messages in thread
From: Suren Baghdasaryan @ 2020-12-28  4:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, LKML, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Quentin Perret, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

Just a couple minor nits.

On Tue, Dec 8, 2020 at 5:29 AM Will Deacon <will@kernel.org> wrote:
>
> Asymmetric systems may not offer the same level of userspace ISA support
> across all CPUs, meaning that some applications cannot be executed by
> some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> not feature support for 32-bit applications on both clusters.
>
> Although userspace can carefully manage the affinity masks for such
> tasks, one place where it is particularly problematic is execve()
> because the CPU on which the execve() is occurring may be incompatible
> with the new application image. In such a situation, it is desirable to
> restrict the affinity mask of the task and ensure that the new image is
> entered on a compatible CPU. From userspace's point of view, this looks
> the same as if the incompatible CPUs have been hotplugged off in the
> task's affinity mask.
>
> In preparation for restricting the affinity mask for compat tasks on
> arm64 systems without uniform support for 32-bit applications, introduce
> force_compatible_cpus_allowed_ptr(), which restricts the affinity mask
> for a task to contain only compatible CPUs.
>
> Reviewed-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  include/linux/sched.h |   1 +
>  kernel/sched/core.c   | 100 +++++++++++++++++++++++++++++++++++-------
>  2 files changed, 86 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 76cd21fa5501..e42dd0fb85c5 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1653,6 +1653,7 @@ extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_
>  #ifdef CONFIG_SMP
>  extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
>  extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
> +extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
>  #else
>  static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>  {
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 92ac3e53f50a..1cfc94be18a9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1863,25 +1863,19 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>  }
>
>  /*
> - * Change a given task's CPU affinity. Migrate the thread to a
> - * proper CPU and schedule it away if the CPU it's executing on
> - * is removed from the allowed bitmask.
> - *
> - * NOTE: the caller must have a valid reference to the task, the
> - * task must not exit() & deallocate itself prematurely. The
> - * call is not atomic; no spinlocks may be held.
> + * Called with both p->pi_lock and rq->lock held; drops both before returning.

Maybe annotate with __releases()?
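
Something along these lines, perhaps (a sketch, assuming the
sparse-style lock annotations used elsewhere in the scheduler):

	static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
						 const struct cpumask *new_mask,
						 bool check, struct rq *rq,
						 struct rq_flags *rf)
		__releases(rq->lock)
		__releases(p->pi_lock)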

>   */
> -static int __set_cpus_allowed_ptr(struct task_struct *p,
> -                                 const struct cpumask *new_mask, bool check)
> +static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
> +                                        const struct cpumask *new_mask,
> +                                        bool check,
> +                                        struct rq *rq,
> +                                        struct rq_flags *rf)
>  {
>         const struct cpumask *cpu_valid_mask = cpu_active_mask;
>         const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
>         unsigned int dest_cpu;
> -       struct rq_flags rf;
> -       struct rq *rq;
>         int ret = 0;
>
> -       rq = task_rq_lock(p, &rf);
>         update_rq_clock(rq);
>
>         if (p->flags & PF_KTHREAD) {
> @@ -1936,7 +1930,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>         if (task_running(rq, p) || p->state == TASK_WAKING) {
>                 struct migration_arg arg = { p, dest_cpu };
>                 /* Need help from migration thread: drop lock and wait. */
> -               task_rq_unlock(rq, p, &rf);
> +               task_rq_unlock(rq, p, rf);
>                 stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>                 return 0;
>         } else if (task_on_rq_queued(p)) {
> @@ -1944,20 +1938,96 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>                  * OK, since we're going to drop the lock immediately
>                  * afterwards anyway.
>                  */
> -               rq = move_queued_task(rq, &rf, p, dest_cpu);
> +               rq = move_queued_task(rq, rf, p, dest_cpu);
>         }
>  out:
> -       task_rq_unlock(rq, p, &rf);
> +       task_rq_unlock(rq, p, rf);
>
>         return ret;
>  }
>
> +/*
> + * Change a given task's CPU affinity. Migrate the thread to a
> + * proper CPU and schedule it away if the CPU it's executing on
> + * is removed from the allowed bitmask.
> + *
> + * NOTE: the caller must have a valid reference to the task, the
> + * task must not exit() & deallocate itself prematurely. The
> + * call is not atomic; no spinlocks may be held.
> + */
> +static int __set_cpus_allowed_ptr(struct task_struct *p,
> +                                 const struct cpumask *new_mask, bool check)
> +{
> +       struct rq_flags rf;
> +       struct rq *rq;
> +
> +       rq = task_rq_lock(p, &rf);
> +       return __set_cpus_allowed_ptr_locked(p, new_mask, check, rq, &rf);
> +}
> +
>  int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  {
>         return __set_cpus_allowed_ptr(p, new_mask, false);
>  }
>  EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
>
> +/*
> + * Change a given task's CPU affinity to the intersection of its current
> + * affinity mask and @subset_mask, writing the resulting mask to @new_mask.
> + * If the resulting mask is empty, leave the affinity unchanged and return
> + * -EINVAL.
> + */
> +static int restrict_cpus_allowed_ptr(struct task_struct *p,
> +                                    struct cpumask *new_mask,
> +                                    const struct cpumask *subset_mask)
> +{
> +       struct rq_flags rf;
> +       struct rq *rq;
> +
> +       rq = task_rq_lock(p, &rf);
> +       if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
> +               task_rq_unlock(rq, p, &rf);
> +               return -EINVAL;
> +       }
> +
> +       return __set_cpus_allowed_ptr_locked(p, new_mask, false, rq, &rf);
> +}
> +
> +/*
> + * Restrict a given task's CPU affinity so that it is a subset of
> + * task_cpu_possible_mask(). If the resulting mask is empty, we warn and
> + * walk up the cpuset hierarchy until we find a suitable mask.

Technically there is also the case where the new_mask allocation
fails, in which case task_cpu_possible_mask() is forced instead. Maybe
add a clarifying comment at the "goto out_set_mask"?
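
Something like this, maybe (just a sketch):

	/* Allocation failed: fall back to task_cpu_possible_mask() */
	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
		goto out_set_mask;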

> + */
> +void force_compatible_cpus_allowed_ptr(struct task_struct *p)
> +{
> +       cpumask_var_t new_mask;
> +       const struct cpumask *override_mask = task_cpu_possible_mask(p);
> +
> +       if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
> +               goto out_set_mask;
> +
> +       if (!restrict_cpus_allowed_ptr(p, new_mask, override_mask))
> +               goto out_free_mask;
> +
> +       /*
> +        * We failed to find a valid subset of the affinity mask for the
> +        * task, so override it based on its cpuset hierarchy.
> +        */
> +       cpuset_cpus_allowed(p, new_mask);
> +       override_mask = new_mask;
> +
> +out_set_mask:
> +       if (printk_ratelimit()) {
> +               printk_deferred("Overriding affinity for process %d (%s) to CPUs %*pbl\n",
> +                               task_pid_nr(p), p->comm,
> +                               cpumask_pr_args(override_mask));
> +       }
> +
> +       set_cpus_allowed_ptr(p, override_mask);
> +out_free_mask:
> +       free_cpumask_var(new_mask);
> +}
> +
>  void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  {
>  #ifdef CONFIG_SCHED_DEBUG
> --
> 2.29.2.576.ga3fc446d84-goog
>

Thread overview: 29+ messages
2020-12-08 13:28 [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Will Deacon
2020-12-08 13:28 ` [PATCH v5 01/15] arm64: cpuinfo: Split AArch32 registers out into a separate struct Will Deacon
2020-12-08 13:28 ` [PATCH v5 02/15] arm64: Allow mismatched 32-bit EL0 support Will Deacon
2020-12-08 13:28 ` [PATCH v5 03/15] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched " Will Deacon
2020-12-08 13:28 ` [PATCH v5 04/15] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs Will Deacon
2020-12-08 13:28 ` [PATCH v5 05/15] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
2020-12-08 13:28 ` [PATCH v5 06/15] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
2020-12-08 13:28 ` [PATCH v5 07/15] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
2020-12-17 12:15   ` Qais Yousef
2020-12-17 13:44     ` Peter Zijlstra
2020-12-17 14:59       ` Will Deacon
2020-12-17 15:00       ` Qais Yousef
2020-12-08 13:28 ` [PATCH v5 08/15] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
2020-12-28  3:54   ` Suren Baghdasaryan
2020-12-08 13:28 ` [PATCH v5 09/15] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
2020-12-08 13:28 ` [PATCH v5 10/15] sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity Will Deacon
2020-12-28  4:29   ` Suren Baghdasaryan
2020-12-08 13:28 ` [PATCH v5 11/15] arm64: Implement task_cpu_possible_mask() Will Deacon
2020-12-08 13:28 ` [PATCH v5 12/15] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
2020-12-08 13:28 ` [PATCH v5 13/15] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
2020-12-08 13:28 ` [PATCH v5 14/15] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
2020-12-08 13:28 ` [PATCH v5 15/15] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
2020-12-15 17:36 ` [PATCH v5 00/15] An alternative series for asymmetric AArch32 systems Peter Zijlstra
2020-12-15 18:50   ` Will Deacon
2020-12-17 10:55     ` Peter Zijlstra
2020-12-16 11:16 ` Qais Yousef
2020-12-16 14:14   ` Will Deacon
2020-12-16 16:48     ` Qais Yousef
2020-12-16 18:21       ` Suren Baghdasaryan
