* [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Hi again everyone,

	I'm not a scheduler hacker,
	I'm a scheduler hacker's mate.
	I'm only hacking the scheduler,
	'cos trying to run 32-bit applications on systems where not all of the CPUs support it is GREAT.

It's Friday 13th, and I'm back with version three of the increasingly
popular patches I previously posted here:

  v1: https://lore.kernel.org/r/20201027215118.27003-1-will@kernel.org
  v2: https://lore.kernel.org/r/20201109213023.15092-1-will@kernel.org

and which started life as a reimplementation of some patches from Qais:

  https://lore.kernel.org/r/20201021104611.2744565-1-qais.yousef@arm.com

The aim of this series is to allow 32-bit ARM applications to run on
arm64 SoCs where not all of the CPUs support the 32-bit instruction set.

There are some major changes in v3:

  * Add some scheduler hooks for restricting a task's affinity mask
  * Implement these hooks for arm64 so that we can avoid 32-bit tasks
    running on 64-bit-only cores
  * Restrict affinity mask of 32-bit tasks on execve()
  * Prevent hot-unplug of all 32-bit CPUs if we have a mismatched system
  * Ensure 32-bit EL0 cpumask is zero-initialised (oops)

It's worth mentioning that this approach goes directly against my
initial proposal for punting the affinity management to userspace,
because it turns out that doesn't really work. There are cases where the
kernel has to muck with the affinity mask explicitly, such as execve(),
CPU hotplug and cpuset balancing. Ensuring that these don't lead to
random SIGKILLs as far as userspace is concerned means avoiding any
64-bit-only CPUs appearing in the affinity mask for a 32-bit task, at
which point it's easier just to handle everything in the kernel anyway.
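
To illustrate the intended userspace-visible behaviour: after execve()
of a 32-bit binary on a mismatched system, a plain sched_getaffinity()
call reports only the 32-bit-capable CPUs, as if the others had been
hotplugged off. A minimal sketch (not part of the series):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		cpu_set_t set;

		/* The kernel has already shrunk our mask on execve(). */
		if (sched_getaffinity(0, sizeof(set), &set))
			return 1;

		for (int i = 0; i < CPU_SETSIZE; i++)
			if (CPU_ISSET(i, &set))
				printf("CPU%d is allowed\n", i);

		return 0;
	}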

- Patches 1-6 hack the arm64 CPU feature code to allow 32-bit tasks to
  run on a mismatched system, but force SIGKILL if a task ends up on
  the wrong CPU. This is gated on a command-line option; without it, a
  mismatched system will be treated as 64-bit-only.

- Patches 7-11 add scheduler functionality necessary to constrain the
  CPU affinity mask on a per-task basis and hook this up for execve() on
  arm64.

- Patches 12-14 finish off the arm64 plumbing and remove the logic for
  killing misplaced tasks, as it adds overhead to the context-switch and
  ret-to-user paths.

This seems to do the right thing in my contrived QEMU environment, but
as I say, I'm not a scheduler hacker so I'm open to alternative ideas.

Cheers,

Will

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: kernel-team@android.com

--->8

Will Deacon (14):
  arm64: cpuinfo: Split AArch32 registers out into a separate struct
  arm64: Allow mismatched 32-bit EL0 support
  KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support
  arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs
  arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU
    affinity
  arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit
    EL0
  cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  sched: Introduce arch_cpu_allowed_mask() to limit fallback rq
    selection
  sched: Reject CPU affinity changes based on arch_cpu_allowed_mask()
  arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched
    system
  arm64: Implement arch_cpu_allowed_mask()
  arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores

 .../ABI/testing/sysfs-devices-system-cpu      |   9 +
 .../admin-guide/kernel-parameters.txt         |   7 +
 arch/arm64/include/asm/cpu.h                  |  44 ++--
 arch/arm64/include/asm/cpucaps.h              |   2 +-
 arch/arm64/include/asm/cpufeature.h           |   8 +-
 arch/arm64/include/asm/mmu_context.h          |  12 +
 arch/arm64/kernel/cpufeature.c                | 219 ++++++++++++++----
 arch/arm64/kernel/cpuinfo.c                   |  53 +++--
 arch/arm64/kernel/process.c                   |  17 +-
 arch/arm64/kvm/arm.c                          |  11 +-
 include/linux/sched.h                         |   1 +
 kernel/cgroup/cpuset.c                        |   6 +-
 kernel/sched/core.c                           |  90 +++++--
 13 files changed, 370 insertions(+), 109 deletions(-)

-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 01/14] arm64: cpuinfo: Split AArch32 registers out into a separate struct
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

In preparation for late initialisation of the "sanitised" AArch32 register
state, move the AArch32 registers out of 'struct cpuinfo' and into their
own struct definition.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cpu.h   | 44 +++++++++++----------
 arch/arm64/kernel/cpufeature.c | 71 ++++++++++++++++++----------------
 arch/arm64/kernel/cpuinfo.c    | 53 +++++++++++++------------
 3 files changed, 89 insertions(+), 79 deletions(-)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 7faae6ff3ab4..f4e01aa0f442 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -12,26 +12,7 @@
 /*
  * Records attributes of an individual CPU.
  */
-struct cpuinfo_arm64 {
-	struct cpu	cpu;
-	struct kobject	kobj;
-	u32		reg_ctr;
-	u32		reg_cntfrq;
-	u32		reg_dczid;
-	u32		reg_midr;
-	u32		reg_revidr;
-
-	u64		reg_id_aa64dfr0;
-	u64		reg_id_aa64dfr1;
-	u64		reg_id_aa64isar0;
-	u64		reg_id_aa64isar1;
-	u64		reg_id_aa64mmfr0;
-	u64		reg_id_aa64mmfr1;
-	u64		reg_id_aa64mmfr2;
-	u64		reg_id_aa64pfr0;
-	u64		reg_id_aa64pfr1;
-	u64		reg_id_aa64zfr0;
-
+struct cpuinfo_32bit {
 	u32		reg_id_dfr0;
 	u32		reg_id_dfr1;
 	u32		reg_id_isar0;
@@ -54,6 +35,29 @@ struct cpuinfo_arm64 {
 	u32		reg_mvfr0;
 	u32		reg_mvfr1;
 	u32		reg_mvfr2;
+};
+
+struct cpuinfo_arm64 {
+	struct cpu	cpu;
+	struct kobject	kobj;
+	u32		reg_ctr;
+	u32		reg_cntfrq;
+	u32		reg_dczid;
+	u32		reg_midr;
+	u32		reg_revidr;
+
+	u64		reg_id_aa64dfr0;
+	u64		reg_id_aa64dfr1;
+	u64		reg_id_aa64isar0;
+	u64		reg_id_aa64isar1;
+	u64		reg_id_aa64mmfr0;
+	u64		reg_id_aa64mmfr1;
+	u64		reg_id_aa64mmfr2;
+	u64		reg_id_aa64pfr0;
+	u64		reg_id_aa64pfr1;
+	u64		reg_id_aa64zfr0;
+
+	struct cpuinfo_32bit	aarch32;
 
 	/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
 	u64		reg_zcr;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index dcc165b3fc04..d4a7e84b1513 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -819,6 +819,31 @@ static void __init init_cpu_hwcaps_indirect_list(void)
 
 static void __init setup_boot_cpu_capabilities(void);
 
+static void __init init_32bit_cpu_features(struct cpuinfo_32bit *info)
+{
+	init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0);
+	init_cpu_ftr_reg(SYS_ID_DFR1_EL1, info->reg_id_dfr1);
+	init_cpu_ftr_reg(SYS_ID_ISAR0_EL1, info->reg_id_isar0);
+	init_cpu_ftr_reg(SYS_ID_ISAR1_EL1, info->reg_id_isar1);
+	init_cpu_ftr_reg(SYS_ID_ISAR2_EL1, info->reg_id_isar2);
+	init_cpu_ftr_reg(SYS_ID_ISAR3_EL1, info->reg_id_isar3);
+	init_cpu_ftr_reg(SYS_ID_ISAR4_EL1, info->reg_id_isar4);
+	init_cpu_ftr_reg(SYS_ID_ISAR5_EL1, info->reg_id_isar5);
+	init_cpu_ftr_reg(SYS_ID_ISAR6_EL1, info->reg_id_isar6);
+	init_cpu_ftr_reg(SYS_ID_MMFR0_EL1, info->reg_id_mmfr0);
+	init_cpu_ftr_reg(SYS_ID_MMFR1_EL1, info->reg_id_mmfr1);
+	init_cpu_ftr_reg(SYS_ID_MMFR2_EL1, info->reg_id_mmfr2);
+	init_cpu_ftr_reg(SYS_ID_MMFR3_EL1, info->reg_id_mmfr3);
+	init_cpu_ftr_reg(SYS_ID_MMFR4_EL1, info->reg_id_mmfr4);
+	init_cpu_ftr_reg(SYS_ID_MMFR5_EL1, info->reg_id_mmfr5);
+	init_cpu_ftr_reg(SYS_ID_PFR0_EL1, info->reg_id_pfr0);
+	init_cpu_ftr_reg(SYS_ID_PFR1_EL1, info->reg_id_pfr1);
+	init_cpu_ftr_reg(SYS_ID_PFR2_EL1, info->reg_id_pfr2);
+	init_cpu_ftr_reg(SYS_MVFR0_EL1, info->reg_mvfr0);
+	init_cpu_ftr_reg(SYS_MVFR1_EL1, info->reg_mvfr1);
+	init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
+}
+
 void __init init_cpu_features(struct cpuinfo_arm64 *info)
 {
 	/* Before we start using the tables, make sure it is sorted */
@@ -838,29 +863,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
 	init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
 
-	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
-		init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0);
-		init_cpu_ftr_reg(SYS_ID_DFR1_EL1, info->reg_id_dfr1);
-		init_cpu_ftr_reg(SYS_ID_ISAR0_EL1, info->reg_id_isar0);
-		init_cpu_ftr_reg(SYS_ID_ISAR1_EL1, info->reg_id_isar1);
-		init_cpu_ftr_reg(SYS_ID_ISAR2_EL1, info->reg_id_isar2);
-		init_cpu_ftr_reg(SYS_ID_ISAR3_EL1, info->reg_id_isar3);
-		init_cpu_ftr_reg(SYS_ID_ISAR4_EL1, info->reg_id_isar4);
-		init_cpu_ftr_reg(SYS_ID_ISAR5_EL1, info->reg_id_isar5);
-		init_cpu_ftr_reg(SYS_ID_ISAR6_EL1, info->reg_id_isar6);
-		init_cpu_ftr_reg(SYS_ID_MMFR0_EL1, info->reg_id_mmfr0);
-		init_cpu_ftr_reg(SYS_ID_MMFR1_EL1, info->reg_id_mmfr1);
-		init_cpu_ftr_reg(SYS_ID_MMFR2_EL1, info->reg_id_mmfr2);
-		init_cpu_ftr_reg(SYS_ID_MMFR3_EL1, info->reg_id_mmfr3);
-		init_cpu_ftr_reg(SYS_ID_MMFR4_EL1, info->reg_id_mmfr4);
-		init_cpu_ftr_reg(SYS_ID_MMFR5_EL1, info->reg_id_mmfr5);
-		init_cpu_ftr_reg(SYS_ID_PFR0_EL1, info->reg_id_pfr0);
-		init_cpu_ftr_reg(SYS_ID_PFR1_EL1, info->reg_id_pfr1);
-		init_cpu_ftr_reg(SYS_ID_PFR2_EL1, info->reg_id_pfr2);
-		init_cpu_ftr_reg(SYS_MVFR0_EL1, info->reg_mvfr0);
-		init_cpu_ftr_reg(SYS_MVFR1_EL1, info->reg_mvfr1);
-		init_cpu_ftr_reg(SYS_MVFR2_EL1, info->reg_mvfr2);
-	}
+	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
+		init_32bit_cpu_features(&info->aarch32);
 
 	if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
 		init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
@@ -931,20 +935,12 @@ static void relax_cpu_ftr_reg(u32 sys_id, int field)
 	WARN_ON(!ftrp->width);
 }
 
-static int update_32bit_cpu_features(int cpu, struct cpuinfo_arm64 *info,
-				     struct cpuinfo_arm64 *boot)
+static int update_32bit_cpu_features(int cpu, struct cpuinfo_32bit *info,
+				     struct cpuinfo_32bit *boot)
 {
 	int taint = 0;
 	u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
 
-	/*
-	 * If we don't have AArch32 at all then skip the checks entirely
-	 * as the register values may be UNKNOWN and we're not going to be
-	 * using them for anything.
-	 */
-	if (!id_aa64pfr0_32bit_el0(pfr0))
-		return taint;
-
 	/*
 	 * If we don't have AArch32 at EL1, then relax the strictness of
 	 * EL1-dependent register fields to avoid spurious sanity check fails.
@@ -1091,10 +1087,17 @@ void update_cpu_features(int cpu,
 	}
 
 	/*
+	 * If we don't have AArch32 at all then skip the checks entirely
+	 * as the register values may be UNKNOWN and we're not going to be
+	 * using them for anything.
+	 *
 	 * This relies on a sanitised view of the AArch64 ID registers
 	 * (e.g. SYS_ID_AA64PFR0_EL1), so we call it last.
 	 */
-	taint |= update_32bit_cpu_features(cpu, info, boot);
+	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
+		taint |= update_32bit_cpu_features(cpu, &info->aarch32,
+						   &boot->aarch32);
+	}
 
 	/*
 	 * Mismatched CPU features are a recipe for disaster. Don't even
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 77605aec25fe..8ce33742ad6a 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -344,6 +344,32 @@ static void cpuinfo_detect_icache_policy(struct cpuinfo_arm64 *info)
 	pr_info("Detected %s I-cache on CPU%d\n", icache_policy_str[l1ip], cpu);
 }
 
+static void __cpuinfo_store_cpu_32bit(struct cpuinfo_32bit *info)
+{
+	info->reg_id_dfr0 = read_cpuid(ID_DFR0_EL1);
+	info->reg_id_dfr1 = read_cpuid(ID_DFR1_EL1);
+	info->reg_id_isar0 = read_cpuid(ID_ISAR0_EL1);
+	info->reg_id_isar1 = read_cpuid(ID_ISAR1_EL1);
+	info->reg_id_isar2 = read_cpuid(ID_ISAR2_EL1);
+	info->reg_id_isar3 = read_cpuid(ID_ISAR3_EL1);
+	info->reg_id_isar4 = read_cpuid(ID_ISAR4_EL1);
+	info->reg_id_isar5 = read_cpuid(ID_ISAR5_EL1);
+	info->reg_id_isar6 = read_cpuid(ID_ISAR6_EL1);
+	info->reg_id_mmfr0 = read_cpuid(ID_MMFR0_EL1);
+	info->reg_id_mmfr1 = read_cpuid(ID_MMFR1_EL1);
+	info->reg_id_mmfr2 = read_cpuid(ID_MMFR2_EL1);
+	info->reg_id_mmfr3 = read_cpuid(ID_MMFR3_EL1);
+	info->reg_id_mmfr4 = read_cpuid(ID_MMFR4_EL1);
+	info->reg_id_mmfr5 = read_cpuid(ID_MMFR5_EL1);
+	info->reg_id_pfr0 = read_cpuid(ID_PFR0_EL1);
+	info->reg_id_pfr1 = read_cpuid(ID_PFR1_EL1);
+	info->reg_id_pfr2 = read_cpuid(ID_PFR2_EL1);
+
+	info->reg_mvfr0 = read_cpuid(MVFR0_EL1);
+	info->reg_mvfr1 = read_cpuid(MVFR1_EL1);
+	info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
+}
+
 static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 {
 	info->reg_cntfrq = arch_timer_get_cntfrq();
@@ -371,31 +397,8 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
 	info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
 
-	/* Update the 32bit ID registers only if AArch32 is implemented */
-	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
-		info->reg_id_dfr0 = read_cpuid(ID_DFR0_EL1);
-		info->reg_id_dfr1 = read_cpuid(ID_DFR1_EL1);
-		info->reg_id_isar0 = read_cpuid(ID_ISAR0_EL1);
-		info->reg_id_isar1 = read_cpuid(ID_ISAR1_EL1);
-		info->reg_id_isar2 = read_cpuid(ID_ISAR2_EL1);
-		info->reg_id_isar3 = read_cpuid(ID_ISAR3_EL1);
-		info->reg_id_isar4 = read_cpuid(ID_ISAR4_EL1);
-		info->reg_id_isar5 = read_cpuid(ID_ISAR5_EL1);
-		info->reg_id_isar6 = read_cpuid(ID_ISAR6_EL1);
-		info->reg_id_mmfr0 = read_cpuid(ID_MMFR0_EL1);
-		info->reg_id_mmfr1 = read_cpuid(ID_MMFR1_EL1);
-		info->reg_id_mmfr2 = read_cpuid(ID_MMFR2_EL1);
-		info->reg_id_mmfr3 = read_cpuid(ID_MMFR3_EL1);
-		info->reg_id_mmfr4 = read_cpuid(ID_MMFR4_EL1);
-		info->reg_id_mmfr5 = read_cpuid(ID_MMFR5_EL1);
-		info->reg_id_pfr0 = read_cpuid(ID_PFR0_EL1);
-		info->reg_id_pfr1 = read_cpuid(ID_PFR1_EL1);
-		info->reg_id_pfr2 = read_cpuid(ID_PFR2_EL1);
-
-		info->reg_mvfr0 = read_cpuid(MVFR0_EL1);
-		info->reg_mvfr1 = read_cpuid(MVFR1_EL1);
-		info->reg_mvfr2 = read_cpuid(MVFR2_EL1);
-	}
+	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
+		__cpuinfo_store_cpu_32bit(&info->aarch32);
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    id_aa64pfr0_sve(info->reg_id_aa64pfr0))
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 02/14] arm64: Allow mismatched 32-bit EL0 support
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

When confronted with a mixture of CPUs, some of which support 32-bit
applications and others which don't, we quite sensibly treat the system
as 64-bit only for userspace and prevent execve() of 32-bit binaries.

Unfortunately, some crazy folks have decided to build systems like this
with the intention of running 32-bit applications, so relax our
sanitisation logic to continue to advertise 32-bit support to userspace
on these systems and track the real 32-bit capable cores in a cpumask
instead. For now, the default behaviour remains but will be tied to
a command-line option in a later patch.
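
For illustration, an in-kernel user would be expected to combine the
two new interfaces roughly as follows (a sketch only; the helper name
is made up and not part of this patch):

	/* Hypothetical consumer of the new interfaces. */
	static bool cpu_can_run_32bit(unsigned int cpu)
	{
		if (!system_supports_32bit_el0())
			return false;

		/* cpu_none_mask is returned if 32-bit EL0 is unsupported */
		return cpumask_test_cpu(cpu, system_32bit_el0_cpumask());
	}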

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cpucaps.h    |   2 +-
 arch/arm64/include/asm/cpufeature.h |   8 ++-
 arch/arm64/kernel/cpufeature.c      | 106 ++++++++++++++++++++++++++--
 3 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index e7d98997c09c..e6f0eb4643a0 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -20,7 +20,7 @@
 #define ARM64_ALT_PAN_NOT_UAO			10
 #define ARM64_HAS_VIRT_HOST_EXTN		11
 #define ARM64_WORKAROUND_CAVIUM_27456		12
-#define ARM64_HAS_32BIT_EL0			13
+#define ARM64_HAS_32BIT_EL0_DO_NOT_USE		13
 #define ARM64_HARDEN_EL2_VECTORS		14
 #define ARM64_HAS_CNP				15
 #define ARM64_HAS_NO_FPSIMD			16
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 97244d4feca9..f447d313a9c5 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -604,9 +604,15 @@ static inline bool cpu_supports_mixed_endian_el0(void)
 	return id_aa64mmfr0_mixed_endian_el0(read_cpuid(ID_AA64MMFR0_EL1));
 }
 
+const struct cpumask *system_32bit_el0_cpumask(void);
+DECLARE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
+
 static inline bool system_supports_32bit_el0(void)
 {
-	return cpus_have_const_cap(ARM64_HAS_32BIT_EL0);
+	u64 pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
+
+	return id_aa64pfr0_32bit_el0(pfr0) ||
+	       static_branch_unlikely(&arm64_mismatched_32bit_el0);
 }
 
 static inline bool system_supports_4kb_granule(void)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d4a7e84b1513..1aa01ecc133f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -104,6 +104,24 @@ DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
 bool arm64_use_ng_mappings = false;
 EXPORT_SYMBOL(arm64_use_ng_mappings);
 
+/*
+ * Permit PER_LINUX32 and execve() of 32-bit binaries even if not all CPUs
+ * support it?
+ */
+static bool __read_mostly allow_mismatched_32bit_el0;
+
+/*
+ * Static branch enabled only if allow_mismatched_32bit_el0 is set and we have
+ * seen at least one CPU capable of 32-bit EL0.
+ */
+DEFINE_STATIC_KEY_FALSE(arm64_mismatched_32bit_el0);
+
+/*
+ * Mask of CPUs supporting 32-bit EL0.
+ * Only valid if arm64_mismatched_32bit_el0 is enabled.
+ */
+static cpumask_var_t cpu_32bit_el0_mask __cpumask_var_read_mostly;
+
 /*
  * Flag to indicate if we have computed the system wide
  * capabilities based on the boot time active CPUs. This
@@ -756,7 +774,7 @@ static void __init sort_ftr_regs(void)
  * Any bits that are not covered by an arm64_ftr_bits entry are considered
  * RES0 for the system-wide value, and must strictly match.
  */
-static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new)
+static void init_cpu_ftr_reg(u32 sys_reg, u64 new)
 {
 	u64 val = 0;
 	u64 strict_mask = ~0x0ULL;
@@ -819,7 +837,7 @@ static void __init init_cpu_hwcaps_indirect_list(void)
 
 static void __init setup_boot_cpu_capabilities(void);
 
-static void __init init_32bit_cpu_features(struct cpuinfo_32bit *info)
+static void init_32bit_cpu_features(struct cpuinfo_32bit *info)
 {
 	init_cpu_ftr_reg(SYS_ID_DFR0_EL1, info->reg_id_dfr0);
 	init_cpu_ftr_reg(SYS_ID_DFR1_EL1, info->reg_id_dfr1);
@@ -935,6 +953,25 @@ static void relax_cpu_ftr_reg(u32 sys_id, int field)
 	WARN_ON(!ftrp->width);
 }
 
+static void update_compat_elf_hwcaps(void);
+
+static void update_mismatched_32bit_el0_cpu_features(struct cpuinfo_arm64 *info,
+						     struct cpuinfo_arm64 *boot)
+{
+	static bool boot_cpu_32bit_regs_overridden = false;
+
+	if (!allow_mismatched_32bit_el0 || boot_cpu_32bit_regs_overridden)
+		return;
+
+	if (id_aa64pfr0_32bit_el0(boot->reg_id_aa64pfr0))
+		return;
+
+	boot->aarch32 = info->aarch32;
+	init_32bit_cpu_features(&boot->aarch32);
+	update_compat_elf_hwcaps();
+	boot_cpu_32bit_regs_overridden = true;
+}
+
 static int update_32bit_cpu_features(int cpu, struct cpuinfo_32bit *info,
 				     struct cpuinfo_32bit *boot)
 {
@@ -1095,6 +1132,7 @@ void update_cpu_features(int cpu,
 	 * (e.g. SYS_ID_AA64PFR0_EL1), so we call it last.
 	 */
 	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) {
+		update_mismatched_32bit_el0_cpu_features(info, boot);
 		taint |= update_32bit_cpu_features(cpu, &info->aarch32,
 						   &boot->aarch32);
 	}
@@ -1196,6 +1234,55 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 	return feature_matches(val, entry);
 }
 
+static int enable_mismatched_32bit_el0(unsigned int cpu)
+{
+	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
+	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
+
+	if (cpu_32bit) {
+		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
+		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
+	}
+
+	return 0;
+}
+
+static int __init init_32bit_el0_mask(void)
+{
+	if (!allow_mismatched_32bit_el0)
+		return 0;
+
+	if (!zalloc_cpumask_var(&cpu_32bit_el0_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				 "arm64/mismatched_32bit_el0:online",
+				 enable_mismatched_32bit_el0, NULL);
+}
+subsys_initcall_sync(init_32bit_el0_mask);
+
+const struct cpumask *system_32bit_el0_cpumask(void)
+{
+	if (!system_supports_32bit_el0())
+		return cpu_none_mask;
+
+	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return cpu_32bit_el0_mask;
+
+	return cpu_present_mask;
+}
+
+static bool has_32bit_el0(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	if (!has_cpuid_feature(entry, scope))
+		return allow_mismatched_32bit_el0;
+
+	if (scope == SCOPE_SYSTEM)
+		pr_info("detected: 32-bit EL0 Support\n");
+
+	return true;
+}
+
 static bool has_useable_gicv3_cpuif(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	bool has_sre;
@@ -1803,10 +1890,9 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 	},
 #endif	/* CONFIG_ARM64_VHE */
 	{
-		.desc = "32-bit EL0 Support",
-		.capability = ARM64_HAS_32BIT_EL0,
+		.capability = ARM64_HAS_32BIT_EL0_DO_NOT_USE,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_32bit_el0,
 		.sys_reg = SYS_ID_AA64PFR0_EL1,
 		.sign = FTR_UNSIGNED,
 		.field_pos = ID_AA64PFR0_EL0_SHIFT,
@@ -2299,7 +2385,7 @@ static const struct arm64_cpu_capabilities compat_elf_hwcaps[] = {
 	{},
 };
 
-static void __init cap_set_elf_hwcap(const struct arm64_cpu_capabilities *cap)
+static void cap_set_elf_hwcap(const struct arm64_cpu_capabilities *cap)
 {
 	switch (cap->hwcap_type) {
 	case CAP_HWCAP:
@@ -2344,7 +2430,7 @@ static bool cpus_have_elf_hwcap(const struct arm64_cpu_capabilities *cap)
 	return rc;
 }
 
-static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
+static void setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 {
 	/* We support emulation of accesses to CPU ID feature registers */
 	cpu_set_named_feature(CPUID);
@@ -2353,6 +2439,12 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 			cap_set_elf_hwcap(hwcaps);
 }
 
+static void update_compat_elf_hwcaps(void)
+{
+	if (system_capabilities_finalized())
+		setup_elf_hwcaps(compat_elf_hwcaps);
+}
+
 static void update_cpu_capabilities(u16 scope_mask)
 {
 	int i;
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 03/14] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

If a vCPU is caught running 32-bit code on a system with mismatched
support at EL0, then we should kill it.

Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/arm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5750ec34960e..d322ac0f4a8e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -633,6 +633,15 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 	}
 }
 
+static bool vcpu_mode_is_bad_32bit(struct kvm_vcpu *vcpu)
+{
+	if (likely(!vcpu_mode_is_32bit(vcpu)))
+		return false;
+
+	return !system_supports_32bit_el0() ||
+		static_branch_unlikely(&arm64_mismatched_32bit_el0);
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:	The VCPU pointer
@@ -816,7 +825,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 * with the asymmetric AArch32 case), return to userspace with
 		 * a fatal error.
 		 */
-		if (!system_supports_32bit_el0() && vcpu_mode_is_32bit(vcpu)) {
+		if (vcpu_mode_is_bad_32bit(vcpu)) {
 			/*
 			 * As we have caught the guest red-handed, decide that
 			 * it isn't fit for purpose anymore by making the vcpu
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 04/14] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Scheduling a 32-bit application on a 64-bit-only CPU is a bad idea.

Ensure that 32-bit applications always take the slow-path when returning
to userspace on a system with mismatched support at EL0, so that we can
avoid trying to run on a 64-bit-only CPU and force a SIGKILL instead.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 19 ++++++++++++++++++-
 arch/arm64/kernel/signal.c  | 26 ++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4784011cecac..1540ab0fbf23 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -542,6 +542,15 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
+static void compat_thread_switch(struct task_struct *next)
+{
+	if (!is_compat_thread(task_thread_info(next)))
+		return;
+
+	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		set_tsk_thread_flag(next, TIF_NOTIFY_RESUME);
+}
+
 /*
  * Thread switching.
  */
@@ -558,6 +567,7 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	uao_thread_switch(next);
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(prev, next);
+	compat_thread_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
@@ -620,8 +630,15 @@ unsigned long arch_align_stack(unsigned long sp)
  */
 void arch_setup_new_exec(void)
 {
-	current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
+	unsigned long mmflags = 0;
+
+	if (is_compat_task()) {
+		mmflags = MMCF_AARCH32;
+		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+	}
 
+	current->mm->context.flags = mmflags;
 	ptrauth_thread_init_user(current);
 
 	if (task_spec_ssb_noexec(current)) {
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index a8184cad8890..bcb6ca2d9a7c 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -911,6 +911,19 @@ static void do_signal(struct pt_regs *regs)
 	restore_saved_sigmask();
 }
 
+static bool cpu_affinity_invalid(struct pt_regs *regs)
+{
+	if (!compat_user_mode(regs))
+		return false;
+
+	/*
+	 * We're preemptible, but a reschedule will cause us to check the
+	 * affinity again.
+	 */
+	return !cpumask_test_cpu(raw_smp_processor_id(),
+				 system_32bit_el0_cpumask());
+}
+
 asmlinkage void do_notify_resume(struct pt_regs *regs,
 				 unsigned long thread_flags)
 {
@@ -948,6 +961,19 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
+
+				/*
+				 * If we reschedule after checking the affinity
+				 * then we must ensure that TIF_NOTIFY_RESUME
+				 * is set so that we check the affinity again.
+				 * Since tracehook_notify_resume() clears the
+				 * flag, ensure that the compiler doesn't move
+				 * it after the affinity check.
+				 */
+				barrier();
+
+				if (cpu_affinity_invalid(regs))
+					force_sig(SIGKILL);
 			}
 
 			if (thread_flags & _TIF_FOREIGN_FPSTATE)
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 05/14] arm64: Advertise CPUs capable of running 32-bit applications in sysfs
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Since 32-bit applications will be killed if they are caught trying to
execute on a 64-bit-only CPU in a mismatched system, advertise the set
of 32-bit capable CPUs to userspace in sysfs.
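
As a sketch of how userspace might consume the new file (this helper
is illustrative, not part of the patch; parsing of the CPU-list format
is assumed to happen elsewhere):

	#include <stdio.h>

	/*
	 * Fills 'buf' with a CPU list such as "0-3,6"; returns -1 if the
	 * file is absent, i.e. all or none of the CPUs are 32-bit capable.
	 */
	static int read_aarch32_cpus(char *buf, int len)
	{
		FILE *f = fopen("/sys/devices/system/cpu/aarch32_el0", "r");
		int ret = -1;

		if (f) {
			if (fgets(buf, len, f))
				ret = 0;
			fclose(f);
		}

		return ret;
	}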

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 .../ABI/testing/sysfs-devices-system-cpu      |  9 +++++++++
 arch/arm64/kernel/cpufeature.c                | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1a04ca8162ad..8a2e377b0dde 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -493,6 +493,15 @@ Description:	AArch64 CPU registers
 		'identification' directory exposes the CPU ID registers for
 		identifying model and revision of the CPU.
 
+What:		/sys/devices/system/cpu/aarch32_el0
+Date:		November 2020
+Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
+Description:	Identifies the subset of CPUs in the system that can execute
+		AArch32 (32-bit ARM) applications. If present, the same format as
+		/sys/devices/system/cpu/{offline,online,possible,present} is used.
+		If absent, then all or none of the CPUs can execute AArch32
+		applications and execve() will behave accordingly.
+
 What:		/sys/devices/system/cpu/cpu#/cpu_capacity
 Date:		December 2016
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 1aa01ecc133f..a7a9b7bffc49 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -67,6 +67,7 @@
 #include <linux/crash_dump.h>
 #include <linux/sort.h>
 #include <linux/stop_machine.h>
+#include <linux/sysfs.h>
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/cpu.h>
@@ -1272,6 +1273,24 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_present_mask;
 }
 
+static ssize_t aarch32_el0_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	const struct cpumask *mask = system_32bit_el0_cpumask();
+
+	return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(mask));
+}
+static const DEVICE_ATTR_RO(aarch32_el0);
+
+static int __init aarch32_el0_sysfs_init(void)
+{
+	if (!allow_mismatched_32bit_el0)
+		return 0;
+
+	return device_create_file(cpu_subsys.dev_root, &dev_attr_aarch32_el0);
+}
+device_initcall(aarch32_el0_sysfs_init);
+
 static bool has_32bit_el0(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	if (!has_cpuid_feature(entry, scope))
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 06/14] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Allow systems with mismatched 32-bit support at EL0 to run 32-bit
applications based on a new kernel parameter.
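
For example, booting with 'allow_mismatched_32bit_el0' on the kernel
command line permits execve() of 32-bit binaries on such a system, and
the set of 32-bit-capable CPUs can then be read back from
/sys/devices/system/cpu/aarch32_el0 (added by the previous patch).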

Signed-off-by: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
 arch/arm64/kernel/cpufeature.c                  | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..f20188c44d83 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -289,6 +289,13 @@
 			do not want to use tracing_snapshot_alloc() as it needs
 			to be done where GFP_KERNEL allocations are allowed.
 
+	allow_mismatched_32bit_el0 [ARM64]
+			Allow execve() of 32-bit applications and setting of the
+			PER_LINUX32 personality on systems where only a strict
+			subset of the CPUs support 32-bit EL0. When this
+			parameter is present, the set of CPUs supporting 32-bit
+			EL0 is indicated by /sys/devices/system/cpu/aarch32_el0.
+
 	amd_iommu=	[HW,X86-64]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a7a9b7bffc49..dfded018af44 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1273,6 +1273,13 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_present_mask;
 }
 
+static int __init parse_32bit_el0_param(char *str)
+{
+	allow_mismatched_32bit_el0 = true;
+	return 0;
+}
+early_param("allow_mismatched_32bit_el0", parse_32bit_el0_param);
+
 static ssize_t aarch32_el0_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Although userspace can carefully manage the affinity masks for such
tasks, one place where it is particularly problematic is execve()
because the CPU on which the execve() is occurring may be incompatible
with the new application image. In such a situation, it is desirable to
restrict the affinity mask of the task and ensure that the new image is
entered on a compatible CPU. From userspace's point of view, this looks
the same as if the incompatible CPUs have been hotplugged off in its
affinity mask.

In preparation for restricting the affinity mask for compat tasks on
arm64 systems without uniform support for 32-bit applications, introduce
a restrict_cpus_allowed_ptr(), which shrinks the affinity mask of a
task to its intersection with a parameter mask.
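
The intended call pattern from architecture code looks roughly like
this (a sketch; the mask-providing helper is hypothetical, and the
real arm64 call site appears later in the series):

	/* Hypothetical arch-side caller. */
	static void restrict_to_32bit_cpus(struct task_struct *p)
	{
		const struct cpumask *mask = arch_32bit_capable_cpus();

		/*
		 * Fall back to overriding the affinity entirely if the
		 * intersection with the current mask is empty.
		 */
		if (restrict_cpus_allowed_ptr(p, mask))
			set_cpus_allowed_ptr(p, mask);
	}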

Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   | 73 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 063cd120b459..1cd12c3ce9ee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1631,6 +1631,7 @@ extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
+extern int restrict_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *mask);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d2003a7d5ab5..818c8f7bdf2a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1860,24 +1860,18 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 }
 
 /*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
+ * Called with both p->pi_lock and rq->lock held; drops both before returning.
  */
-static int __set_cpus_allowed_ptr(struct task_struct *p,
-				  const struct cpumask *new_mask, bool check)
+static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
+					 const struct cpumask *new_mask,
+					 bool check,
+					 struct rq *rq,
+					 struct rq_flags *rf)
 {
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	unsigned int dest_cpu;
-	struct rq_flags rf;
-	struct rq *rq;
 	int ret = 0;
 
-	rq = task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 
 	if (p->flags & PF_KTHREAD) {
@@ -1929,7 +1923,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
-		task_rq_unlock(rq, p, &rf);
+		task_rq_unlock(rq, p, rf);
 		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
 		return 0;
 	} else if (task_on_rq_queued(p)) {
@@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * OK, since we're going to drop the lock immediately
 		 * afterwards anyway.
 		 */
-		rq = move_queued_task(rq, &rf, p, dest_cpu);
+		rq = move_queued_task(rq, rf, p, dest_cpu);
 	}
 out:
-	task_rq_unlock(rq, p, &rf);
+	task_rq_unlock(rq, p, rf);
 
 	return ret;
 }
 
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+static int __set_cpus_allowed_ptr(struct task_struct *p,
+				  const struct cpumask *new_mask, bool check)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &rf);
+	return __set_cpus_allowed_ptr_locked(p, new_mask, check, rq, &rf);
+}
+
 int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 {
 	return __set_cpus_allowed_ptr(p, new_mask, false);
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+/*
+ * Change a given task's CPU affinity to the intersection of its current
+ * affinity mask and @subset_mask. If the resulting mask is empty, leave
+ * the affinity unchanged and return -EINVAL.
+ */
+int restrict_cpus_allowed_ptr(struct task_struct *p,
+			      const struct cpumask *subset_mask)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+	cpumask_var_t new_mask;
+	int retval;
+
+	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	rq = task_rq_lock(p, &rf);
+	if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
+		task_rq_unlock(rq, p, &rf);
+		retval = -EINVAL;
+		goto out_free_new_mask;
+	}
+
+	retval = __set_cpus_allowed_ptr_locked(p, new_mask, false, rq, &rf);
+
+out_free_new_mask:
+	free_cpumask_var(new_mask);
+	return retval;
+}
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

When exec'ing a 32-bit task on a system with mismatched support for
32-bit EL0, try to ensure that it starts life on a CPU that can actually
run it.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 1540ab0fbf23..17b94007fed4 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
 	return sp & ~0xf;
 }
 
+static void adjust_compat_task_affinity(struct task_struct *p)
+{
+	const struct cpumask *mask = system_32bit_el0_cpumask();
+
+	if (restrict_cpus_allowed_ptr(p, mask))
+		set_cpus_allowed_ptr(p, mask);
+
+	set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+}
+
 /*
  * Called from setup_new_exec() after (COMPAT_)SET_PERSONALITY.
  */
@@ -635,7 +645,7 @@ void arch_setup_new_exec(void)
 	if (is_compat_task()) {
 		mmflags = MMCF_AARCH32;
 		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
-			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+			adjust_compat_task_affinity(current);
 	}
 
 	current->mm->context.flags = mmflags;
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

If the scheduler cannot find an allowed CPU for a task,
cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
if cgroup v1 is in use.

In preparation for allowing architectures to provide their own fallback
mask, just return early if we're not using cgroup v2 and allow
select_fallback_rq() to figure out the mask by itself.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/cgroup/cpuset.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 57b5b5d0a5fd..e970737c3ed2 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3299,9 +3299,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
+	if (!is_in_v2_mode())
+		return; /* select_fallback_rq will try harder */
+
 	rcu_read_lock();
-	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
-		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
+	do_set_cpus_allowed(tsk, task_cs(tsk)->cpus_allowed);
 	rcu_read_unlock();
 
 	/*
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

On such a system, we must take care not to migrate a task to an
unsupported CPU when forcefully moving tasks in select_fallback_rq()
in response to a CPU hot-unplug operation.

Introduce an arch_cpu_allowed_mask() hook which, given a task argument,
allows an architecture to return a cpumask of CPUs that are capable of
executing that task. The default implementation returns the
cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
limitations that affect scheduling. The new mask is used when selecting
the fallback runqueue as a last resort before forcing a migration to the
first active CPU.

Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/sched/core.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 818c8f7bdf2a..8df38ebfe769 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1696,6 +1696,11 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 
 #ifdef CONFIG_SMP
 
+/* Must contain at least one active CPU */
+#ifndef arch_cpu_allowed_mask
+#define  arch_cpu_allowed_mask(p)	cpu_possible_mask
+#endif
+
 /*
  * Per-CPU kthreads are allowed to run on !active && online CPUs, see
  * __set_cpus_allowed_ptr() and select_fallback_rq().
@@ -1708,7 +1713,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 	if (is_per_cpu_kthread(p))
 		return cpu_online(cpu);
 
-	return cpu_active(cpu);
+	if (!cpu_active(cpu))
+		return false;
+
+	return cpumask_test_cpu(cpu, arch_cpu_allowed_mask(p));
 }
 
 /*
@@ -2361,10 +2369,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			}
 			fallthrough;
 		case possible:
-			do_set_cpus_allowed(p, cpu_possible_mask);
+			do_set_cpus_allowed(p, arch_cpu_allowed_mask(p));
 			state = fail;
 			break;
-
 		case fail:
 			BUG();
 			break;
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask()
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Reject explicit requests to change the affinity mask of a task via
set_cpus_allowed_ptr() if the requested mask is not a subset of the
mask returned by arch_cpu_allowed_mask(). This ensures that the
'cpus_mask' for a given task cannot contain CPUs which are incapable of
executing it, except in cases where the affinity is forced.
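
From userspace, the effect is that sched_setaffinity() now fails for a
32-bit task requesting a mask containing 64-bit-only CPUs. A sketch
(the CPU number is made up):

	#define _GNU_SOURCE
	#include <sched.h>

	static int request_64bit_only_cpu(void)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(4, &set);	/* assume CPU4 lacks 32-bit EL0 */

		/* Expected to return -1 with errno == EINVAL. */
		return sched_setaffinity(0, sizeof(set), &set);
	}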

Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8df38ebfe769..13bdb2ae4d3f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1877,6 +1877,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 					 struct rq_flags *rf)
 {
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
+	const struct cpumask *cpu_allowed_mask = arch_cpu_allowed_mask(p);
 	unsigned int dest_cpu;
 	int ret = 0;
 
@@ -1887,6 +1888,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 		 * Kernel threads are allowed on online && !active CPUs
 		 */
 		cpu_valid_mask = cpu_online_mask;
+	} else if (!cpumask_subset(new_mask, cpu_allowed_mask)) {
+		ret = -EINVAL;
+		goto out;
 	}
 
 	/*
-- 
2.29.2.299.gdc1121823c-goog



* [PATCH v3 12/14] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system
From: Will Deacon @ 2020-11-13  9:37 UTC
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

If we want to support 32-bit applications, then when we identify a CPU
with mismatched 32-bit EL0 support we must ensure that we will always
have an active 32-bit CPU available to us from then on. This is important
for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
CPUs for compat tasks, and forced migration due to a hotplug event will
hang if no 32-bit CPUs are available.

On detecting a mismatch, prevent offlining of the mismatching CPU if it
is 32-bit capable, or of the first active 32-bit capable CPU otherwise.
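
The user-visible consequence is that the chosen CPU can no longer be
taken offline via sysfs; a sketch of the failure mode (assuming CPU0
was picked, and that the device core reports the refusal as an error
on the write):

	#include <fcntl.h>
	#include <unistd.h>

	static int try_offline_cpu0(void)
	{
		int fd = open("/sys/devices/system/cpu/cpu0/online", O_WRONLY);
		int ret = -1;

		if (fd >= 0) {
			/* Expected to fail once hot-unplug is disabled. */
			ret = write(fd, "0", 1);
			close(fd);
		}

		return ret;
	}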

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index dfded018af44..853f3ac876a3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 
 static int enable_mismatched_32bit_el0(unsigned int cpu)
 {
+	static int lucky_winner = -1;
+
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
@@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
 	}
 
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+		return 0;
+
+	if (lucky_winner >= 0)
+		return 0;
+
+	/*
+	 * We've detected a mismatch. We need to keep one of our CPUs with
+	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
+	 * every CPU in the system for a 32-bit task.
+	 */
+	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
+							 cpu_active_mask);
+	get_cpu_device(lucky_winner)->offline_disabled = true;
+	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
+		cpu, lucky_winner);
 	return 0;
 }
 
-- 
2.29.2.299.gdc1121823c-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 13/14] arm64: Implement arch_cpu_allowed_mask()
  2020-11-13  9:37 [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (11 preceding siblings ...)
  2020-11-13  9:37 ` [PATCH v3 12/14] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
@ 2020-11-13  9:37 ` Will Deacon
  2020-11-13  9:37 ` [PATCH v3 14/14] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
  2020-11-19 16:11 ` [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Peter Zijlstra
  14 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-13  9:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Provide an implementation of arch_cpu_allowed_mask() so that we can
prevent 64-bit-only cores from being added to the 'cpus_mask' of compat
tasks on systems with mismatched 32-bit support at EL0.
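
Taken together with the execve() affinity restriction earlier in the
series, a 32-bit task should now only ever observe 32-bit-capable CPUs
in its affinity mask. A quick way to eyeball that, as a sketch (it
assumes, hypothetically, that CPUs 0-3 are the 32-bit-capable ones):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)                  /* built as a 32-bit binary */
  {
          cpu_set_t set;
          int cpu;

          if (sched_getaffinity(0, sizeof(set), &set))
                  return 1;

          for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                  if (CPU_ISSET(cpu, &set))
                          printf("cpu%d\n", cpu); /* expect only cpu0-3 */

          return 0;
  }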

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 0672236e1aea..54fc469e7db9 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -251,6 +251,18 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 #define deactivate_mm(tsk,mm)	do { } while (0)
 #define activate_mm(prev,next)	switch_mm(prev, next, current)
 
+static inline const struct cpumask *arch_cpu_allowed_mask(struct task_struct *p)
+{
+	if (!static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return cpu_possible_mask;
+
+	if (!is_compat_thread(task_thread_info(p)))
+		return cpu_possible_mask;
+
+	return system_32bit_el0_cpumask();
+}
+#define arch_cpu_allowed_mask	arch_cpu_allowed_mask
+
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
-- 
2.29.2.299.gdc1121823c-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 14/14] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
  2020-11-13  9:37 [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (12 preceding siblings ...)
  2020-11-13  9:37 ` [PATCH v3 13/14] arm64: Implement arch_cpu_allowed_mask() Will Deacon
@ 2020-11-13  9:37 ` Will Deacon
  2020-11-19 16:11 ` [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Peter Zijlstra
  14 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-13  9:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

The scheduler now knows enough about these braindead systems to place
32-bit tasks accordingly, so throw out the safety checks and allow the
ret-to-user path to avoid do_notify_resume() if there is nothing to do.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 12 ------------
 arch/arm64/kernel/signal.c  | 26 --------------------------
 2 files changed, 38 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 17b94007fed4..dba94af1b840 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -542,15 +542,6 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
-static void compat_thread_switch(struct task_struct *next)
-{
-	if (!is_compat_thread(task_thread_info(next)))
-		return;
-
-	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
-		set_tsk_thread_flag(next, TIF_NOTIFY_RESUME);
-}
-
 /*
  * Thread switching.
  */
@@ -567,7 +558,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	uao_thread_switch(next);
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(prev, next);
-	compat_thread_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
@@ -631,8 +621,6 @@ static void adjust_compat_task_affinity(struct task_struct *p)
 
 	if (restrict_cpus_allowed_ptr(p, mask))
 		set_cpus_allowed_ptr(p, mask);
-
-	set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
 }
 
 /*
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index bcb6ca2d9a7c..a8184cad8890 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -911,19 +911,6 @@ static void do_signal(struct pt_regs *regs)
 	restore_saved_sigmask();
 }
 
-static bool cpu_affinity_invalid(struct pt_regs *regs)
-{
-	if (!compat_user_mode(regs))
-		return false;
-
-	/*
-	 * We're preemptible, but a reschedule will cause us to check the
-	 * affinity again.
-	 */
-	return !cpumask_test_cpu(raw_smp_processor_id(),
-				 system_32bit_el0_cpumask());
-}
-
 asmlinkage void do_notify_resume(struct pt_regs *regs,
 				 unsigned long thread_flags)
 {
@@ -961,19 +948,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
-
-				/*
-				 * If we reschedule after checking the affinity
-				 * then we must ensure that TIF_NOTIFY_RESUME
-				 * is set so that we check the affinity again.
-				 * Since tracehook_notify_resume() clears the
-				 * flag, ensure that the compiler doesn't move
-				 * it after the affinity check.
-				 */
-				barrier();
-
-				if (cpu_affinity_invalid(regs))
-					force_sig(SIGKILL);
 			}
 
 			if (thread_flags & _TIF_FOREIGN_FPSTATE)
-- 
2.29.2.299.gdc1121823c-goog


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-13  9:37 ` [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity Will Deacon
@ 2020-11-19  9:18   ` Quentin Perret
  2020-11-19 11:03     ` Quentin Perret
  2020-11-19 11:05     ` Will Deacon
  2020-11-19 12:47   ` Valentin Schneider
  1 sibling, 2 replies; 52+ messages in thread
From: Quentin Perret @ 2020-11-19  9:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

Hey Will,

On Friday 13 Nov 2020 at 09:37:12 (+0000), Will Deacon wrote:
> -static int __set_cpus_allowed_ptr(struct task_struct *p,
> -				  const struct cpumask *new_mask, bool check)
> +static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
> +					 const struct cpumask *new_mask,
> +					 bool check,
> +					 struct rq *rq,
> +					 struct rq_flags *rf)
>  {
>  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
>  	unsigned int dest_cpu;
> -	struct rq_flags rf;
> -	struct rq *rq;
>  	int ret = 0;

Should we have a lockdep assertion here?

> -	rq = task_rq_lock(p, &rf);
>  	update_rq_clock(rq);
>  
>  	if (p->flags & PF_KTHREAD) {
> @@ -1929,7 +1923,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>  	if (task_running(rq, p) || p->state == TASK_WAKING) {
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
> -		task_rq_unlock(rq, p, &rf);
> +		task_rq_unlock(rq, p, rf);
>  		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>  		return 0;
>  	} else if (task_on_rq_queued(p)) {
> @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>  		 * OK, since we're going to drop the lock immediately
>  		 * afterwards anyway.
>  		 */
> -		rq = move_queued_task(rq, &rf, p, dest_cpu);
> +		rq = move_queued_task(rq, rf, p, dest_cpu);
>  	}
>  out:
> -	task_rq_unlock(rq, p, &rf);
> +	task_rq_unlock(rq, p, rf);

And that's a little odd to have here no? Can we move it back on the
caller's side?

>  	return ret;
>  }

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-13  9:37 ` [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
@ 2020-11-19  9:24   ` Quentin Perret
  2020-11-19 11:06     ` Will Deacon
  2020-11-19 16:14   ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Quentin Perret @ 2020-11-19  9:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Friday 13 Nov 2020 at 09:37:13 (+0000), Will Deacon wrote:
> When exec'ing a 32-bit task on a system with mismatched support for
> 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> run it.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/process.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 1540ab0fbf23..17b94007fed4 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
>  	return sp & ~0xf;
>  }
>  
> +static void adjust_compat_task_affinity(struct task_struct *p)
> +{
> +	const struct cpumask *mask = system_32bit_el0_cpumask();
> +
> +	if (restrict_cpus_allowed_ptr(p, mask))
> +		set_cpus_allowed_ptr(p, mask);

My understanding of this call to set_cpus_allowed_ptr() is that you're
mimicking the hotplug vs affinity case behaviour in some ways. That is,
if a task is pinned to a CPU and userspace hotplugs that CPU, then the
kernel will reset the affinity of the task to the remaining online CPUs.

I guess that is a sensible fallback path when userspace gives
contradictory commands to the kernel, but that most certainly deserves a
comment :)

> +
> +	set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
> +}
>  
>  /*
>   * Called from setup_new_exec() after (COMPAT_)SET_PERSONALITY.
>   */
> @@ -635,7 +645,7 @@ void arch_setup_new_exec(void)
>  	if (is_compat_task()) {
>  		mmflags = MMCF_AARCH32;
>  		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
> -			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
> +			adjust_compat_task_affinity(current);
>  	}
>  
>  	current->mm->context.flags = mmflags;
> -- 
> 2.29.2.299.gdc1121823c-goog
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-11-13  9:37 ` [PATCH v3 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
@ 2020-11-19  9:29   ` Quentin Perret
  2020-11-19 11:06     ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Quentin Perret @ 2020-11-19  9:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Friday 13 Nov 2020 at 09:37:14 (+0000), Will Deacon wrote:
> If the scheduler cannot find an allowed CPU for a task,
> cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> if cgroup v1 is in use.
> 
> In preparation for allowing architectures to provide their own fallback
> mask, just return early if we're not using cgroup v2 and allow
> select_fallback_rq() to figure out the mask by itself.
> 
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Will Deacon <will@kernel.org>

That makes select_fallback_rq() slightly more expensive if you're using
cgroup v1, but I don't expect that to be really measurable in real-world
workloads, so:

Reviewed-by: Quentin Perret <qperret@google.com>

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection
  2020-11-13  9:37 ` [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection Will Deacon
@ 2020-11-19  9:38   ` Quentin Perret
  2020-11-19 11:07     ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Quentin Perret @ 2020-11-19  9:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Friday 13 Nov 2020 at 09:37:15 (+0000), Will Deacon wrote:
> Asymmetric systems may not offer the same level of userspace ISA support
> across all CPUs, meaning that some applications cannot be executed by
> some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> not feature support for 32-bit applications on both clusters.
> 
> On such a system, we must take care not to migrate a task to an
> unsupported CPU when forcefully moving tasks in select_fallback_rq()
> in response to a CPU hot-unplug operation.
> 
> Introduce an arch_cpu_allowed_mask() hook which, given a task argument,
> allows an architecture to return a cpumask of CPUs that are capable of
> executing that task. The default implementation returns the
> cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
> limitations that affect scheduling. The new mask is used when selecting
> the fallback runqueue as a last resort before forcing a migration to the
> first active CPU.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  kernel/sched/core.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 818c8f7bdf2a..8df38ebfe769 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1696,6 +1696,11 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>  
>  #ifdef CONFIG_SMP
>  
> +/* Must contain at least one active CPU */
> +#ifndef arch_cpu_allowed_mask
> +#define  arch_cpu_allowed_mask(p)	cpu_possible_mask
> +#endif
> +
>  /*
>   * Per-CPU kthreads are allowed to run on !active && online CPUs, see
>   * __set_cpus_allowed_ptr() and select_fallback_rq().
> @@ -1708,7 +1713,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>  	if (is_per_cpu_kthread(p))
>  		return cpu_online(cpu);
>  
> -	return cpu_active(cpu);
> +	if (!cpu_active(cpu))
> +		return false;
> +
> +	return cpumask_test_cpu(cpu, arch_cpu_allowed_mask(p));
>  }
>  
>  /*
> @@ -2361,10 +2369,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>  			}
>  			fallthrough;
>  		case possible:
> -			do_set_cpus_allowed(p, cpu_possible_mask);
> +			do_set_cpus_allowed(p, arch_cpu_allowed_mask(p));

Nit: I'm wondering if this should be called arch_cpu_possible_mask()
instead?

In any case:

Reviewed-by: Quentin Perret <qperret@google.com>

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask()
  2020-11-13  9:37 ` [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask() Will Deacon
@ 2020-11-19  9:47   ` Quentin Perret
  2020-11-19 11:07     ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Quentin Perret @ 2020-11-19  9:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Friday 13 Nov 2020 at 09:37:16 (+0000), Will Deacon wrote:
> Reject explicit requests to change the affinity mask of a task via
> set_cpus_allowed_ptr() if the requested mask is not a subset of the
> mask returned by arch_cpu_allowed_mask(). This ensures that the
> 'cpus_mask' for a given task cannot contain CPUs which are incapable of
> executing it, except in cases where the affinity is forced.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  kernel/sched/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8df38ebfe769..13bdb2ae4d3f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1877,6 +1877,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>  					 struct rq_flags *rf)
>  {
>  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
> +	const struct cpumask *cpu_allowed_mask = arch_cpu_allowed_mask(p);
>  	unsigned int dest_cpu;
>  	int ret = 0;
>  
> @@ -1887,6 +1888,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>  		 * Kernel threads are allowed on online && !active CPUs
>  		 */
>  		cpu_valid_mask = cpu_online_mask;
> +	} else if (!cpumask_subset(new_mask, cpu_allowed_mask)) {
> +		ret = -EINVAL;
> +		goto out;

So, IIUC, this should make the sched_setaffinity() syscall fail and
return -EINVAL to userspace if it tries to put 64-bit CPUs in the
affinity mask of a 32-bit task, which I think makes sense.

But what about affinity change via cpusets? e.g., if a 32-bit task is
migrated to a cpuset with 64-bit CPUs, then the migration will be
'successful' and the task will appear to be in the destination cgroup,
but the actual affinity of the task will be something completely
different?

That's a bit yuck, but I'm not sure what else can be done here :/

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19  9:18   ` Quentin Perret
@ 2020-11-19 11:03     ` Quentin Perret
  2020-11-19 11:05     ` Will Deacon
  1 sibling, 0 replies; 52+ messages in thread
From: Quentin Perret @ 2020-11-19 11:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thursday 19 Nov 2020 at 09:18:20 (+0000), Quentin Perret wrote:
> Hey Will,
> 
> On Friday 13 Nov 2020 at 09:37:12 (+0000), Will Deacon wrote:
> > -static int __set_cpus_allowed_ptr(struct task_struct *p,
> > -				  const struct cpumask *new_mask, bool check)
> > +static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
> > +					 const struct cpumask *new_mask,
> > +					 bool check,
> > +					 struct rq *rq,
> > +					 struct rq_flags *rf)
> >  {
> >  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
> >  	unsigned int dest_cpu;
> > -	struct rq_flags rf;
> > -	struct rq *rq;
> >  	int ret = 0;
> 
> Should we have a lockdep assertion here?
> 
> > -	rq = task_rq_lock(p, &rf);
> >  	update_rq_clock(rq);
> >  
> >  	if (p->flags & PF_KTHREAD) {
> > @@ -1929,7 +1923,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >  	if (task_running(rq, p) || p->state == TASK_WAKING) {
> >  		struct migration_arg arg = { p, dest_cpu };
> >  		/* Need help from migration thread: drop lock and wait. */
> > -		task_rq_unlock(rq, p, &rf);
> > +		task_rq_unlock(rq, p, rf);
> >  		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> >  		return 0;
> >  	} else if (task_on_rq_queued(p)) {
> > @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >  		 * OK, since we're going to drop the lock immediately
> >  		 * afterwards anyway.
> >  		 */
> > -		rq = move_queued_task(rq, &rf, p, dest_cpu);
> > +		rq = move_queued_task(rq, rf, p, dest_cpu);
> >  	}
> >  out:
> > -	task_rq_unlock(rq, p, &rf);
> > +	task_rq_unlock(rq, p, rf);
> 
> And that's a little odd to have here no? Can we move it back on the
> caller's side?

Yeah, no, that obviously doesn't work for the stop_one_cpu() call above,
so feel free to ignore ...

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19  9:18   ` Quentin Perret
  2020-11-19 11:03     ` Quentin Perret
@ 2020-11-19 11:05     ` Will Deacon
  2020-11-19 11:27       ` Valentin Schneider
  1 sibling, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 11:05 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

Hi Quentin,

Thanks for having a look.

On Thu, Nov 19, 2020 at 09:18:20AM +0000, Quentin Perret wrote:
> On Friday 13 Nov 2020 at 09:37:12 (+0000), Will Deacon wrote:
> > -static int __set_cpus_allowed_ptr(struct task_struct *p,
> > -				  const struct cpumask *new_mask, bool check)
> > +static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
> > +					 const struct cpumask *new_mask,
> > +					 bool check,
> > +					 struct rq *rq,
> > +					 struct rq_flags *rf)
> >  {
> >  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
> >  	unsigned int dest_cpu;
> > -	struct rq_flags rf;
> > -	struct rq *rq;
> >  	int ret = 0;
> 
> Should we have a lockdep assertion here?

I pondered that, but I don't think it's necessary because we already have
one in do_set_cpus_allowed() so adding an extra one here doesn't really add
anything.

> > -	rq = task_rq_lock(p, &rf);
> >  	update_rq_clock(rq);
> >  
> >  	if (p->flags & PF_KTHREAD) {
> > @@ -1929,7 +1923,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >  	if (task_running(rq, p) || p->state == TASK_WAKING) {
> >  		struct migration_arg arg = { p, dest_cpu };
> >  		/* Need help from migration thread: drop lock and wait. */
> > -		task_rq_unlock(rq, p, &rf);
> > +		task_rq_unlock(rq, p, rf);
> >  		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> >  		return 0;
> >  	} else if (task_on_rq_queued(p)) {
> > @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >  		 * OK, since we're going to drop the lock immediately
> >  		 * afterwards anyway.
> >  		 */
> > -		rq = move_queued_task(rq, &rf, p, dest_cpu);
> > +		rq = move_queued_task(rq, rf, p, dest_cpu);
> >  	}
> >  out:
> > -	task_rq_unlock(rq, p, &rf);
> > +	task_rq_unlock(rq, p, rf);
> 
> And that's a little odd to have here no? Can we move it back on the
> caller's side?

I don't think so, unfortunately. __set_cpus_allowed_ptr_locked() can trigger
migration, so it can drop the rq lock as part of that and end up relocking a
new rq, which it also unlocks before returning. Doing the unlock in the
caller is therefore even weirder, because you'd have to return the lock
pointer or something horrible like that.

I did add a comment about this right before the function and it's an
internal function to the scheduler so I think it's ok.
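
To be concrete, the calling convention ends up looking like this (a
sketch paraphrasing the patch, not a verbatim quote):

	rq = task_rq_lock(p, &rf);
	ret = __set_cpus_allowed_ptr_locked(p, new_mask, check, rq, &rf);
	/* No unlock here: _locked() has already dropped whichever rq
	 * lock it ended up holding, which may not be the original rq. */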

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19  9:24   ` Quentin Perret
@ 2020-11-19 11:06     ` Will Deacon
  2020-11-19 16:19       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 11:06 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 09:24:07AM +0000, Quentin Perret wrote:
> On Friday 13 Nov 2020 at 09:37:13 (+0000), Will Deacon wrote:
> > When exec'ing a 32-bit task on a system with mismatched support for
> > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > run it.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/process.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 1540ab0fbf23..17b94007fed4 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> >  	return sp & ~0xf;
> >  }
> >  
> > +static void adjust_compat_task_affinity(struct task_struct *p)
> > +{
> > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > +
> > +	if (restrict_cpus_allowed_ptr(p, mask))
> > +		set_cpus_allowed_ptr(p, mask);
> 
> My understanding of this call to set_cpus_allowed_ptr() is that you're
> mimicking the hotplug vs affinity case behaviour in some ways. That is,
> if a task is pinned to a CPU and userspace hotplugs that CPU, then the
> kernel will reset the affinity of the task to the remaining online CPUs.

Correct. It looks to the 32-bit application like all the 64-bit-only CPUs
were hotplugged off at the point of the execve().

> I guess that is a sensible fallback path when userspace gives
> contradictory commands to the kernel, but that most certainly deserves a
> comment :)

Good point, I'll add this:

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index dba94af1b840..687d6acf2f81 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -619,6 +619,16 @@ static void adjust_compat_task_affinity(struct task_struct *p)
 {
 	const struct cpumask *mask = system_32bit_el0_cpumask();
 
+	/*
+	 * Restrict the CPU affinity mask for a 32-bit task so that it contains
+	 * only the 32-bit-capable subset of its original CPU mask. If this is
+	 * empty, then forcefully override it with the set of all
+	 * 32-bit-capable CPUs that we know about.
+	 *
+	 * From the perspective of the task, this looks similar to what would
+	 * happen if the 64-bit-only CPUs were hot-unplugged at the point of
+	 * execve().
+	 */
 	if (restrict_cpus_allowed_ptr(p, mask))
 		set_cpus_allowed_ptr(p, mask);
 }

Will

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2020-11-19  9:29   ` Quentin Perret
@ 2020-11-19 11:06     ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 11:06 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 09:29:22AM +0000, Quentin Perret wrote:
> On Friday 13 Nov 2020 at 09:37:14 (+0000), Will Deacon wrote:
> > If the scheduler cannot find an allowed CPU for a task,
> > cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
> > if cgroup v1 is in use.
> > 
> > In preparation for allowing architectures to provide their own fallback
> > mask, just return early if we're not using cgroup v2 and allow
> > select_fallback_rq() to figure out the mask by itself.
> > 
> > Cc: Li Zefan <lizefan@huawei.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> That makes select_fallback_rq() slightly more expensive if you're using
> cgroup v1, but I don't expect that to be really measurable in real-world
> workloads, so:
> 
> Reviewed-by: Quentin Perret <qperret@google.com>

Cheers!

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection
  2020-11-19  9:38   ` Quentin Perret
@ 2020-11-19 11:07     ` Will Deacon
  2020-11-19 20:39       ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 11:07 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 09:38:50AM +0000, Quentin Perret wrote:
> On Friday 13 Nov 2020 at 09:37:15 (+0000), Will Deacon wrote:
> > Asymmetric systems may not offer the same level of userspace ISA support
> > across all CPUs, meaning that some applications cannot be executed by
> > some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> > not feature support for 32-bit applications on both clusters.
> > 
> > On such a system, we must take care not to migrate a task to an
> > unsupported CPU when forcefully moving tasks in select_fallback_rq()
> > in response to a CPU hot-unplug operation.
> > 
> > Introduce an arch_cpu_allowed_mask() hook which, given a task argument,
> > allows an architecture to return a cpumask of CPUs that are capable of
> > executing that task. The default implementation returns the
> > cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
> > limitations that affect scheduling. The new mask is used when selecting
> > the fallback runqueue as a last resort before forcing a migration to the
> > first active CPU.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  kernel/sched/core.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 818c8f7bdf2a..8df38ebfe769 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1696,6 +1696,11 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
> >  
> >  #ifdef CONFIG_SMP
> >  
> > +/* Must contain at least one active CPU */
> > +#ifndef arch_cpu_allowed_mask
> > +#define  arch_cpu_allowed_mask(p)	cpu_possible_mask
> > +#endif
> > +
> >  /*
> >   * Per-CPU kthreads are allowed to run on !active && online CPUs, see
> >   * __set_cpus_allowed_ptr() and select_fallback_rq().
> > @@ -1708,7 +1713,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> >  	if (is_per_cpu_kthread(p))
> >  		return cpu_online(cpu);
> >  
> > -	return cpu_active(cpu);
> > +	if (!cpu_active(cpu))
> > +		return false;
> > +
> > +	return cpumask_test_cpu(cpu, arch_cpu_allowed_mask(p));
> >  }
> >  
> >  /*
> > @@ -2361,10 +2369,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
> >  			}
> >  			fallthrough;
> >  		case possible:
> > -			do_set_cpus_allowed(p, cpu_possible_mask);
> > +			do_set_cpus_allowed(p, arch_cpu_allowed_mask(p));
> 
> Nit: I'm wondering if this should be called arch_cpu_possible_mask()
> instead?

I'm open to renaming it, so if nobody else has any better ideas then I'll
go with this.

> In any case:
> 
> Reviewed-by: Quentin Perret <qperret@google.com>

Ta!

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask()
  2020-11-19  9:47   ` Quentin Perret
@ 2020-11-19 11:07     ` Will Deacon
  2020-11-19 14:30       ` Quentin Perret
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 11:07 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 09:47:44AM +0000, Quentin Perret wrote:
> On Friday 13 Nov 2020 at 09:37:16 (+0000), Will Deacon wrote:
> > Reject explicit requests to change the affinity mask of a task via
> > set_cpus_allowed_ptr() if the requested mask is not a subset of the
> > mask returned by arch_cpu_allowed_mask(). This ensures that the
> > 'cpus_mask' for a given task cannot contain CPUs which are incapable of
> > executing it, except in cases where the affinity is forced.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  kernel/sched/core.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 8df38ebfe769..13bdb2ae4d3f 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1877,6 +1877,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
> >  					 struct rq_flags *rf)
> >  {
> >  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
> > +	const struct cpumask *cpu_allowed_mask = arch_cpu_allowed_mask(p);
> >  	unsigned int dest_cpu;
> >  	int ret = 0;
> >  
> > @@ -1887,6 +1888,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
> >  		 * Kernel threads are allowed on online && !active CPUs
> >  		 */
> >  		cpu_valid_mask = cpu_online_mask;
> > +	} else if (!cpumask_subset(new_mask, cpu_allowed_mask)) {
> > +		ret = -EINVAL;
> > +		goto out;
> 
> So, IIUC, this should make the sched_setaffinity() syscall fail and
> return -EINVAL to userspace if it tries to put 64-bit CPUs in the
> affinity mask of a 32-bit task, which I think makes sense.
> 
> But what about affinity change via cpusets? e.g., if a 32-bit task is
> migrated to a cpuset with 64-bit CPUs, then the migration will be
> 'successful' and the task will appear to be in the destination cgroup,
> but the actual affinity of the task will be something completely
> different?

Yeah, the cpuset code ignores the return value of set_cpus_allowed_ptr() in
update_tasks_cpumask() so the failure won't be propagated, but then again I
think that might be the right thing to do. Nothing prevents 32-bit and
64-bit tasks from co-existing in the same cpuset afaict, so forcing the
64-bit tasks onto the 32-bit-capable cores feels much worse than the
approach taken here imo. Nothing says we _have_ to schedule on all of the
cores in the mask.

The interesting case is what happens if the cpuset for a 32-bit task is
changed to contain only the 64-bit-only cores. I think that's a userspace
bug, but the fallback rq selection should avert disaster.

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/14] arm64: Allow mismatched 32-bit EL0 support
  2020-11-13  9:37 ` [PATCH v3 02/14] arm64: Allow mismatched 32-bit EL0 support Will Deacon
@ 2020-11-19 11:27   ` Valentin Schneider
  2020-11-19 13:12     ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Valentin Schneider @ 2020-11-19 11:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team


On 13/11/20 09:37, Will Deacon wrote:
> +const struct cpumask *system_32bit_el0_cpumask(void)
> +{
> +	if (!system_supports_32bit_el0())
> +		return cpu_none_mask;
> +
> +	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
> +		return cpu_32bit_el0_mask;
> +
> +	return cpu_present_mask;
> +}

Nit: this is used in patch 13 to implement arch_cpu_allowed_mask(). Since
that latter defaults to cpu_possible_mask, this probably should too.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 11:05     ` Will Deacon
@ 2020-11-19 11:27       ` Valentin Schneider
  2020-11-19 13:13         ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Valentin Schneider @ 2020-11-19 11:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Peter Zijlstra, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team


On 19/11/20 11:05, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 09:18:20AM +0000, Quentin Perret wrote:
>> > @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>> >             * OK, since we're going to drop the lock immediately
>> >             * afterwards anyway.
>> >             */
>> > -		rq = move_queued_task(rq, &rf, p, dest_cpu);
>> > +		rq = move_queued_task(rq, rf, p, dest_cpu);
>> >    }
>> >  out:
>> > -	task_rq_unlock(rq, p, &rf);
>> > +	task_rq_unlock(rq, p, rf);
>>
>> And that's a little odd to have here no? Can we move it back on the
>> caller's side?
>
> I don't think so, unfortunately. __set_cpus_allowed_ptr_locked() can trigger
> migration, so it can drop the rq lock as part of that and end up relocking a
> new rq, which it also unlocks before returning. Doing the unlock in the
> caller is therefore even weirder, because you'd have to return the lock
> pointer or something horrible like that.
>
> I did add a comment about this right before the function and it's an
> internal function to the scheduler so I think it's ok.
>

An alternative here would be to add a new SCA_RESTRICT flag for
__set_cpus_allowed_ptr() (see migrate_disable() faff in
tip/sched/core). Not fond of either approaches, but the flag thing would
avoid this "quirk".

> Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-13  9:37 ` [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity Will Deacon
  2020-11-19  9:18   ` Quentin Perret
@ 2020-11-19 12:47   ` Valentin Schneider
  2020-11-19 13:13     ` Will Deacon
  1 sibling, 1 reply; 52+ messages in thread
From: Valentin Schneider @ 2020-11-19 12:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team


On 13/11/20 09:37, Will Deacon wrote:
> Asymmetric systems may not offer the same level of userspace ISA support
> across all CPUs, meaning that some applications cannot be executed by
> some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> not feature support for 32-bit applications on both clusters.
>
> Although userspace can carefully manage the affinity masks for such
> tasks, one place where it is particularly problematic is execve()
> because the CPU on which the execve() is occurring may be incompatible
> with the new application image. In such a situation, it is desirable to
> restrict the affinity mask of the task and ensure that the new image is
> entered on a compatible CPU.

> From userspace's point of view, this looks the same as if the
> incompatible CPUs have been hotplugged off in its affinity mask.

{pedantic reading warning}

Hotplugged CPUs *can* be set in a task's affinity mask, though they interact
weirdly with cpusets [1]. Having it be the same as hotplug would mean
keeping incompatible CPUs allowed in the affinity mask, but preventing them
from being picked via e.g. is_cpu_allowed().

I was actually hoping this could be a feasible approach, i.e. have an
extra CPU active mask filter for any task:

  cpu_active_mask & arch_cpu_allowed_mask(p)

rather than fiddle with task affinity. Sadly this would also require fixing
up pretty much any place that uses cpu_active_mask, and probably places
that use p->cpus_ptr as well. RT / DL balancing comes to mind, because that
uses either a task's affinity or a CPU's root domain (which reflects the
cpu_active_mask) to figure out where to push a task.

So while I'm wary of hacking up affinity, I fear it might be the lesser
evil :(

[1]: https://lore.kernel.org/lkml/1251528473.590671.1579196495905.JavaMail.zimbra@efficios.com/

> In preparation for restricting the affinity mask for compat tasks on
> arm64 systems without uniform support for 32-bit applications, introduce
> a restrict_cpus_allowed_ptr(), which allows the current affinity mask
> for a task to be shrunk to the intersection of a parameter mask.
>
> Signed-off-by: Will Deacon <will@kernel.org>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 02/14] arm64: Allow mismatched 32-bit EL0 support
  2020-11-19 11:27   ` Valentin Schneider
@ 2020-11-19 13:12     ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 13:12 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

Hi Valentin,

Thanks for the review.

On Thu, Nov 19, 2020 at 11:27:41AM +0000, Valentin Schneider wrote:
> 
> On 13/11/20 09:37, Will Deacon wrote:
> > +const struct cpumask *system_32bit_el0_cpumask(void)
> > +{
> > +	if (!system_supports_32bit_el0())
> > +		return cpu_none_mask;
> > +
> > +	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
> > +		return cpu_32bit_el0_mask;
> > +
> > +	return cpu_present_mask;
> > +}
> 
> Nit: this is used in patch 13 to implement arch_cpu_allowed_mask(). Since
> that latter defaults to cpu_possible_mask, this probably should too.

My original thinking was that, in a system where 32-bit EL0 support is
detected at boot and we're not handling mismatches, it would be nice
to avoid saying that late CPUs are all 32-bit capable given that they will
fail to be onlined if they're not.

However, the reality is that we don't currently distinguish between the
present and possible masks on arm64 so it doesn't make any difference. It's
also not useful to userspace, because if the cores aren't online then so
what? Your observation above is another nail in the coffin, so I'll change
this to the possible mask as you suggest.

Cheers,

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 11:27       ` Valentin Schneider
@ 2020-11-19 13:13         ` Will Deacon
  2020-11-19 14:54           ` Valentin Schneider
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 13:13 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Peter Zijlstra, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 11:27:55AM +0000, Valentin Schneider wrote:
> 
> On 19/11/20 11:05, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 09:18:20AM +0000, Quentin Perret wrote:
> >> > @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >> >             * OK, since we're going to drop the lock immediately
> >> >             * afterwards anyway.
> >> >             */
> >> > -		rq = move_queued_task(rq, &rf, p, dest_cpu);
> >> > +		rq = move_queued_task(rq, rf, p, dest_cpu);
> >> >    }
> >> >  out:
> >> > -	task_rq_unlock(rq, p, &rf);
> >> > +	task_rq_unlock(rq, p, rf);
> >>
> >> And that's a little odd to have here no? Can we move it back on the
> >> caller's side?
> >
> > I don't think so, unfortunately. __set_cpus_allowed_ptr_locked() can trigger
> > migration, so it can drop the rq lock as part of that and end up relocking a
> > new rq, which it also unlocks before returning. Doing the unlock in the
> > caller is therefore even weirder, because you'd have to return the lock
> > pointer or something horrible like that.
> >
> > I did add a comment about this right before the function and it's an
> > internal function to the scheduler so I think it's ok.
> >
> 
> An alternative here would be to add a new SCA_RESTRICT flag for
> __set_cpus_allowed_ptr() (see migrate_disable() faff in
> tip/sched/core). Not fond of either approach, but the flag thing would
> avoid this "quirk".

I tried this when I read about the migrate_disable() stuff on lwn, but I
didn't really find it any better to work with tbh. It also doesn't help
with the locking that Quentin was mentioning, does it? (i.e. you still
have to allocate).

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 12:47   ` Valentin Schneider
@ 2020-11-19 13:13     ` Will Deacon
  2020-11-19 14:54       ` Valentin Schneider
  2020-11-19 16:09       ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 13:13 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 12:47:34PM +0000, Valentin Schneider wrote:
> 
> On 13/11/20 09:37, Will Deacon wrote:
> > Asymmetric systems may not offer the same level of userspace ISA support
> > across all CPUs, meaning that some applications cannot be executed by
> > some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> > not feature support for 32-bit applications on both clusters.
> >
> > Although userspace can carefully manage the affinity masks for such
> > tasks, one place where it is particularly problematic is execve()
> > because the CPU on which the execve() is occurring may be incompatible
> > with the new application image. In such a situation, it is desirable to
> > restrict the affinity mask of the task and ensure that the new image is
> > entered on a compatible CPU.
> 
> > From userspace's point of view, this looks the same as if the
> > incompatible CPUs have been hotplugged off in its affinity mask.
> 
> {pedantic reading warning}
> 
> Hotplugged CPUs *can* be set in a task's affinity mask, though they interact
> weirdly with cpusets [1]. Having it be the same as hotplug would mean
> keeping incompatible CPUs allowed in the affinity mask, but preventing them
> from being picked via e.g. is_cpu_allowed().

Sure, but I was talking about what userspace sees, and I don't think it ever
sees CPUs that have been hotplugged off, right? That is, sched_getaffinity()
masks its result with the active_mask.

> I was actually hoping this could be a feasible approach, i.e. have an
> extra CPU active mask filter for any task:
> 
>   cpu_active_mask & arch_cpu_allowed_mask(p)
> 
> rather than fiddle with task affinity. Sadly this would also require fixing
> up pretty much any place that uses cpu_active_mask, and probably places
> that use p->cpus_ptr as well. RT / DL balancing comes to mind, because that
> uses either a task's affinity or a CPU's root domain (which reflects the
> cpu_active_mask) to figure out where to push a task.

Yeah, I tried this at one point and you end up playing whack-a-mole trying
to figure out why a task got killed. p->cpus_ptr is used all over the place,
and I think if we took this approach then we couldn't realistically remove
the sanity check on the ret-to-user path.

> So while I'm wary of hacking up affinity, I fear it might be the lesser
> evil :(

It's the best thing I've been able to come up with.

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask()
  2020-11-19 11:07     ` Will Deacon
@ 2020-11-19 14:30       ` Quentin Perret
  2020-11-19 16:44         ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Quentin Perret @ 2020-11-19 14:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thursday 19 Nov 2020 at 11:07:24 (+0000), Will Deacon wrote:
> Yeah, the cpuset code ignores the return value of set_cpus_allowed_ptr() in
> update_tasks_cpumask() so the failure won't be propagated, but then again I
> think that might be the right thing to do. Nothing prevents 32-bit and
> 64-bit tasks from co-existing in the same cpuset afaict, so forcing the
> 64-bit tasks onto the 32-bit-capable cores feels much worse than the
> approach taken here imo.

Ack. And thinking about it more, failing the cgroup operation wouldn't
guarantee that the task's affinity and the cpuset are aligned anyway. We
could still exec into a 32-bit task from within a 64-bit-only cpuset.

> The interesting case is what happens if the cpuset for a 32-bit task is
> changed to contain only the 64-bit-only cores. I think that's a userspace
> bug

Maybe, but I think Android will do exactly that in some cases :/

> but the fallback rq selection should avert disaster.

I thought _this_ patch was 'fixing' this case by making the cpuset
operation pretty much a nop on the task affinity? The fallback rq stuff
is all about hotplug no?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 13:13         ` Will Deacon
@ 2020-11-19 14:54           ` Valentin Schneider
  2020-11-19 16:41             ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Valentin Schneider @ 2020-11-19 14:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Peter Zijlstra, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team


On 19/11/20 13:13, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 11:27:55AM +0000, Valentin Schneider wrote:
>>
>> On 19/11/20 11:05, Will Deacon wrote:
>> > On Thu, Nov 19, 2020 at 09:18:20AM +0000, Quentin Perret wrote:
>> >> > @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>> >> >             * OK, since we're going to drop the lock immediately
>> >> >             * afterwards anyway.
>> >> >             */
>> >> > -		rq = move_queued_task(rq, &rf, p, dest_cpu);
>> >> > +		rq = move_queued_task(rq, rf, p, dest_cpu);
>> >> >    }
>> >> >  out:
>> >> > -	task_rq_unlock(rq, p, &rf);
>> >> > +	task_rq_unlock(rq, p, rf);
>> >>
>> >> And that's a little odd to have here no? Can we move it back on the
>> >> caller's side?
>> >
>> > I don't think so, unfortunately. __set_cpus_allowed_ptr_locked() can trigger
>> > migration, so it can drop the rq lock as part of that and end up relocking a
>> > new rq, which it also unlocks before returning. Doing the unlock in the
>> > caller is therefore even weirder, because you'd have to return the lock
>> > pointer or something horrible like that.
>> >
>> > I did add a comment about this right before the function and it's an
>> > internal function to the scheduler so I think it's ok.
>> >
>>
>> An alternative here would be to add a new SCA_RESTRICT flag for
>> __set_cpus_allowed_ptr() (see migrate_disable() faff in
>> tip/sched/core). Not fond of either approach, but the flag thing would
>> avoid this "quirk".
>
> I tried this when I read about the migrate_disable() stuff on lwn, but I
> didn't really find it any better to work with tbh. It also doesn't help
> with the locking that Quentin was mentioning, does it? (i.e. you still
> have to allocate).
>

You could keep it all bundled within __set_cpus_allowed_ptr() (i.e. not
have a _locked() version) and use the flag as indicator of any extra work.

Also FWIW we have this pattern of pre-allocating pcpu cpumasks
(select_idle_mask, load_balance_mask), but given this is AIUI a
very-not-hot path, this might be overkill (and reusing an existing one
would be on the icky side of things).

> Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 13:13     ` Will Deacon
@ 2020-11-19 14:54       ` Valentin Schneider
  2020-11-19 16:09       ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Valentin Schneider @ 2020-11-19 14:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team


On 19/11/20 13:13, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 12:47:34PM +0000, Valentin Schneider wrote:
>>
>> On 13/11/20 09:37, Will Deacon wrote:
>> > Asymmetric systems may not offer the same level of userspace ISA support
>> > across all CPUs, meaning that some applications cannot be executed by
>> > some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
>> > not feature support for 32-bit applications on both clusters.
>> >
>> > Although userspace can carefully manage the affinity masks for such
>> > tasks, one place where it is particularly problematic is execve()
>> > because the CPU on which the execve() is occurring may be incompatible
>> > with the new application image. In such a situation, it is desirable to
>> > restrict the affinity mask of the task and ensure that the new image is
>> > entered on a compatible CPU.
>>
>> > From userspace's point of view, this looks the same as if the
>> > incompatible CPUs have been hotplugged off in its affinity mask.
>>
>> {pedantic reading warning}
>>
>> Hotplugged CPUs *can* be set in a task's affinity mask, though they interact
>> weirdly with cpusets [1]. Having it be the same as hotplug would mean
>> keeping incompatible CPUs allowed in the affinity mask, but preventing them
>> from being picked via e.g. is_cpu_allowed().
>
> Sure, but I was talking about what userspace sees, and I don't think it ever
> sees CPUs that have been hotplugged off, right? That is, sched_getaffinity()
> masks its result with the active_mask.
>

Right, this wasn't pedantic reading, but reading between the lines!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 13:13     ` Will Deacon
  2020-11-19 14:54       ` Valentin Schneider
@ 2020-11-19 16:09       ` Peter Zijlstra
  2020-11-19 16:57         ` Valentin Schneider
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2020-11-19 16:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Valentin Schneider, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 01:13:20PM +0000, Will Deacon wrote:

> Sure, but I was talking about what userspace sees, and I don't think it ever
> sees CPUs that have been hotplugged off, right? That is, sched_getaffinity()
> masks its result with the active_mask.

# for i in /sys/devices/system/cpu/cpu*/online; do echo -n $i ":"; cat $i; done
/sys/devices/system/cpu/cpu1/online :0
/sys/devices/system/cpu/cpu2/online :1
/sys/devices/system/cpu/cpu3/online :1
/sys/devices/system/cpu/cpu4/online :1
/sys/devices/system/cpu/cpu5/online :1
/sys/devices/system/cpu/cpu6/online :1
/sys/devices/system/cpu/cpu7/online :1

# grep Cpus_allowed /proc/self/status
Cpus_allowed:   ff
Cpus_allowed_list:      0-7


:-)
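
The /proc side prints the raw mask with no active-mask filtering, which
is why the offlined cpu1 still shows up above. Roughly, from
fs/proc/array.c (abridged):

  static inline void task_cpus_allowed(struct seq_file *m,
  				     struct task_struct *task)
  {
  	seq_printf(m, "Cpus_allowed:\t%*pb\n",
  		   cpumask_pr_args(task->cpus_ptr));
  	seq_printf(m, "Cpus_allowed_list:\t%*pbl\n",
  		   cpumask_pr_args(task->cpus_ptr));
  }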


* Re: [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems
  2020-11-13  9:37 [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Will Deacon
                   ` (13 preceding siblings ...)
  2020-11-13  9:37 ` [PATCH v3 14/14] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
@ 2020-11-19 16:11 ` Peter Zijlstra
  2020-11-19 16:39   ` Will Deacon
  14 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2020-11-19 16:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Fri, Nov 13, 2020 at 09:37:05AM +0000, Will Deacon wrote:

> The aim of this series is to allow 32-bit ARM applications to run on
> arm64 SoCs where not all of the CPUs support the 32-bit instruction set.
> 
> There are some major changes in v3:
> 
>   * Add some scheduler hooks for restricting a task's affinity mask
>   * Implement these hooks for arm64 so that we can avoid 32-bit tasks
>     running on 64-bit-only cores
>   * Restrict affinity mask of 32-bit tasks on execve()
>   * Prevent hot-unplug of all 32-bit CPUs if we have a mismatched system
>   * Ensure 32-bit EL0 cpumask is zero-initialised (oops)
> 
> It's worth mentioning that this approach goes directly against my
> initial proposal for punting the affinity management to userspace,
> because it turns out that doesn't really work. There are cases where the
> kernel has to muck with the affinity mask explicitly, such as execve(),
> CPU hotplug and cpuset balancing. Ensuring that these don't lead to
> random SIGKILLs as far as userspace is concerned means avoiding any

Mooo, I thought we were okay with that... User does stupid, user gets
SIGKILL. What changed?

> 64-bit-only CPUs appearing in the affinity mask for a 32-bit task, at
> which point it's easier just to handle everything in the kernel anyway.


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-13  9:37 ` [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
  2020-11-19  9:24   ` Quentin Perret
@ 2020-11-19 16:14   ` Peter Zijlstra
  2020-11-19 16:28     ` Will Deacon
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2020-11-19 16:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Fri, Nov 13, 2020 at 09:37:13AM +0000, Will Deacon wrote:
> When exec'ing a 32-bit task on a system with mismatched support for
> 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> run it.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/process.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 1540ab0fbf23..17b94007fed4 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
>  	return sp & ~0xf;
>  }
>  
> +static void adjust_compat_task_affinity(struct task_struct *p)
> +{
> +	const struct cpumask *mask = system_32bit_el0_cpumask();
> +
> +	if (restrict_cpus_allowed_ptr(p, mask))
> +		set_cpus_allowed_ptr(p, mask);

This silently destroys user state; at the very least that ought to go
with a WARN or something. Ideally SIGKILL, though. What's to stop someone
from doing a sched_setaffinity() right after the execve? Same problem.
So why bother...

> +
> +	set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
> +}
> +
>  /*
>   * Called from setup_new_exec() after (COMPAT_)SET_PERSONALITY.
>   */
> @@ -635,7 +645,7 @@ void arch_setup_new_exec(void)
>  	if (is_compat_task()) {
>  		mmflags = MMCF_AARCH32;
>  		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
> -			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
> +			adjust_compat_task_affinity(current);
>  	}
>  
>  	current->mm->context.flags = mmflags;
> -- 
> 2.29.2.299.gdc1121823c-goog
> 


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 11:06     ` Will Deacon
@ 2020-11-19 16:19       ` Peter Zijlstra
  2020-11-19 16:30         ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2020-11-19 16:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 11:06:04AM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 09:24:07AM +0000, Quentin Perret wrote:
> > On Friday 13 Nov 2020 at 09:37:13 (+0000), Will Deacon wrote:
> > > When exec'ing a 32-bit task on a system with mismatched support for
> > > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > > run it.
> > > 
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kernel/process.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index 1540ab0fbf23..17b94007fed4 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> > >  	return sp & ~0xf;
> > >  }
> > >  
> > > +static void adjust_compat_task_affinity(struct task_struct *p)
> > > +{
> > > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > > +
> > > +	if (restrict_cpus_allowed_ptr(p, mask))
> > > +		set_cpus_allowed_ptr(p, mask);
> > 
> > My understanding of this call to set_cpus_allowed_ptr() is that you're
> > mimicking the hotplug vs affinity case behaviour in some ways. That is,
> > if a task is pinned to a CPU and userspace hotplugs that CPU, then the
> > kernel will reset the affinity of the task to the remaining online CPUs.
> 
> Correct. It looks to the 32-bit application like all the 64-bit-only CPUs
> were hotplugged off at the point of the execve().

This doesn't respect cpusets though :/


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 16:14   ` Peter Zijlstra
@ 2020-11-19 16:28     ` Will Deacon
  2020-11-19 16:42       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Thu, Nov 19, 2020 at 05:14:48PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 13, 2020 at 09:37:13AM +0000, Will Deacon wrote:
> > When exec'ing a 32-bit task on a system with mismatched support for
> > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > run it.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/process.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 1540ab0fbf23..17b94007fed4 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> >  	return sp & ~0xf;
> >  }
> >  
> > +static void adjust_compat_task_affinity(struct task_struct *p)
> > +{
> > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > +
> > +	if (restrict_cpus_allowed_ptr(p, mask))
> > +		set_cpus_allowed_ptr(p, mask);
> 
> This silently destroys user state; at the very least that ought to go
> with a WARN or something. Ideally SIGKILL, though. What's to stop someone
> from doing a sched_setaffinity() right after the execve? Same problem.
> So why bother...

It's no different to CPU hot-unplug though, is it? From the perspective of
the 32-bit task, the 64-bit-only cores were hot-unplugged at the point of
execve(). Calls to sched_setaffinity() for 32-bit tasks will reject attempts
to include 64-bit-only cores.

I initially wanted to punt this all to userspace, but one of the big
problems with that is when a 64-bit task is running on a CPU only capable
of running 64-bit tasks and it execve()s a 32-bit task. At that point, we
have to do something because we can't even run the new task for it to do
a sched_setaffinity() call (and we also can't deliver SIGILL).

Will


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 16:19       ` Peter Zijlstra
@ 2020-11-19 16:30         ` Will Deacon
  2020-11-19 16:44           ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 05:19:56PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 11:06:04AM +0000, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 09:24:07AM +0000, Quentin Perret wrote:
> > > On Friday 13 Nov 2020 at 09:37:13 (+0000), Will Deacon wrote:
> > > > When exec'ing a 32-bit task on a system with mismatched support for
> > > > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > > > run it.
> > > > 
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/process.c | 12 +++++++++++-
> > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > index 1540ab0fbf23..17b94007fed4 100644
> > > > --- a/arch/arm64/kernel/process.c
> > > > +++ b/arch/arm64/kernel/process.c
> > > > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> > > >  	return sp & ~0xf;
> > > >  }
> > > >  
> > > > +static void adjust_compat_task_affinity(struct task_struct *p)
> > > > +{
> > > > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > > > +
> > > > +	if (restrict_cpus_allowed_ptr(p, mask))
> > > > +		set_cpus_allowed_ptr(p, mask);
> > > 
> > > My understanding of this call to set_cpus_allowed_ptr() is that you're
> > > mimicking the hotplug vs affinity case behaviour in some ways. That is,
> > > if a task is pinned to a CPU and userspace hotplugs that CPU, then the
> > > kernel will reset the affinity of the task to the remaining online CPUs.
> > 
> > Correct. It looks to the 32-bit application like all the 64-bit-only CPUs
> > were hotplugged off at the point of the execve().
> 
> This doesn't respect cpusets though :/

How does that differ from select_fallback_rq()?

Will


* Re: [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems
  2020-11-19 16:11 ` [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Peter Zijlstra
@ 2020-11-19 16:39   ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Thu, Nov 19, 2020 at 05:11:27PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 13, 2020 at 09:37:05AM +0000, Will Deacon wrote:
> 
> > The aim of this series is to allow 32-bit ARM applications to run on
> > arm64 SoCs where not all of the CPUs support the 32-bit instruction set.
> > 
> > There are some major changes in v3:
> > 
> >   * Add some scheduler hooks for restricting a task's affinity mask
> >   * Implement these hooks for arm64 so that we can avoid 32-bit tasks
> >     running on 64-bit-only cores
> >   * Restrict affinity mask of 32-bit tasks on execve()
> >   * Prevent hot-unplug of all 32-bit CPUs if we have a mismatched system
> >   * Ensure 32-bit EL0 cpumask is zero-initialised (oops)
> > 
> > It's worth mentioning that this approach goes directly against my
> > initial proposal for punting the affinity management to userspace,
> > because it turns out that doesn't really work. There are cases where the
> > kernel has to muck with the affinity mask explicitly, such as execve(),
> > CPU hotplug and cpuset balancing. Ensuring that these don't lead to
> > random SIGKILLs as far as userspace is concerned means avoiding any
> 
> Mooo, I thought we were okay with that... User does stupid, user gets
> SIGKILL. What changed?

See my other reply, but there are 64-bit apps that execve() a 32-bit
payload. I was hoping this could be handled in the C library, but that
has no idea about what it's exec'ing beyond the path.

Will


* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 14:54           ` Valentin Schneider
@ 2020-11-19 16:41             ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:41 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Peter Zijlstra, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 02:54:32PM +0000, Valentin Schneider wrote:
> 
> On 19/11/20 13:13, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 11:27:55AM +0000, Valentin Schneider wrote:
> >>
> >> On 19/11/20 11:05, Will Deacon wrote:
> >> > On Thu, Nov 19, 2020 at 09:18:20AM +0000, Quentin Perret wrote:
> >> >> > @@ -1937,20 +1931,69 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >> >> >             * OK, since we're going to drop the lock immediately
> >> >> >             * afterwards anyway.
> >> >> >             */
> >> >> > -		rq = move_queued_task(rq, &rf, p, dest_cpu);
> >> >> > +		rq = move_queued_task(rq, rf, p, dest_cpu);
> >> >> >    }
> >> >> >  out:
> >> >> > -	task_rq_unlock(rq, p, &rf);
> >> >> > +	task_rq_unlock(rq, p, rf);
> >> >>
> >> >> And that's a little odd to have here no? Can we move it back on the
> >> >> caller's side?
> >> >
> >> > I don't think so, unfortunately. __set_cpus_allowed_ptr_locked() can trigger
> >> > migration, so it can drop the rq lock as part of that and end up relocking a
> >> > new rq, which it also unlocks before returning. Doing the unlock in the
> >> > caller is therefore even weirder, because you'd have to return the lock
> >> > pointer or something horrible like that.
> >> >
> >> > I did add a comment about this right before the function and it's an
> >> > internal function to the scheduler so I think it's ok.
> >> >
> >>
> >> An alternative here would be to add a new SCA_RESTRICT flag for
> >> __set_cpus_allowed_ptr() (see migrate_disable() faff in
> >> tip/sched/core). Not fond of either approach, but the flag thing would
> >> avoid this "quirk".
> >
> > I tried this when I read about the migrate_disable() stuff on LWN, but I
> > didn't really find it any better to work with tbh. It also doesn't help
> > with the locking that Quentin was mentioning, does it? (i.e. you still
> > have to allocate).
> >
> 
> You could keep it all bundled within __set_cpus_allowed_ptr() (i.e. not
> have a _locked() version) and use the flag as an indicator of any extra work.

Ah, gotcha. Still not convinced it's any better, but I see that it works.

> Also FWIW we have this pattern of pre-allocating pcpu cpumasks
> (select_idle_mask, load_balance_mask), but given this is AIUI a
> very-not-hot path, this might be overkill (and reusing an existing one
> would be on the icky side of things).

I think that makes sense for static masks, but since this is dynamic I was
following the lead of sched_setaffinity().

Will


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 16:28     ` Will Deacon
@ 2020-11-19 16:42       ` Peter Zijlstra
  2020-11-19 16:48         ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2020-11-19 16:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Thu, Nov 19, 2020 at 04:28:23PM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 05:14:48PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 13, 2020 at 09:37:13AM +0000, Will Deacon wrote:
> > > When exec'ing a 32-bit task on a system with mismatched support for
> > > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > > run it.
> > > 
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kernel/process.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index 1540ab0fbf23..17b94007fed4 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> > >  	return sp & ~0xf;
> > >  }
> > >  
> > > +static void adjust_compat_task_affinity(struct task_struct *p)
> > > +{
> > > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > > +
> > > +	if (restrict_cpus_allowed_ptr(p, mask))
> > > +		set_cpus_allowed_ptr(p, mask);
> > 
> > This silently destroys user state; at the very least that ought to go
> > with a WARN or something. Ideally SIGKILL, though. What's to stop someone
> > from doing a sched_setaffinity() right after the execve? Same problem.
> > So why bother...
> 
> It's no different to CPU hot-unplug though, is it? From the perspective of
> the 32-bit task, the 64-bit-only cores were hot-unplugged at the point of
> execve(). Calls to sched_setaffinity() for 32-bit tasks will reject attempts
> to include 64-bit-only cores.

select_fallback_rq() has a printk() in it to at least notify that things
went bad. But I don't particularly like the current hotplug semantics; I've
wanted to disallow the hotplug when it would result in this case, but
computing that is tricky. It's one of those things that's forever on the
todo list ... :/
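
That notification, for reference, sits at the tail of
select_fallback_rq() and looks roughly like this (abridged):

  if (state != cpuset) {
  	/*
  	 * Don't tell them about moving exiting tasks or
  	 * kernel threads (both mm NULL), since they never
  	 * leave kernel.
  	 */
  	if (p->mm && printk_ratelimit()) {
  		printk_deferred("process %d (%s) no longer affine to cpu%d\n",
  				task_pid_nr(p), p->comm, cpu);
  	}
  }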

> I initially wanted to punt this all to userspace, but one of the big
> problems with that is when a 64-bit task is running on a CPU only capable
> > of running 64-bit tasks and it execve()s a 32-bit task. At that point, we
> > have to do something because we can't even run the new task for it to do
> > a sched_setaffinity() call (and we also can't deliver SIGILL).

Userspace can see that one coming though...  I suppose you can simply
make the execve fail before the point of no return.


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 16:30         ` Will Deacon
@ 2020-11-19 16:44           ` Peter Zijlstra
  2020-11-19 16:51             ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2020-11-19 16:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 04:30:36PM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 05:19:56PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 19, 2020 at 11:06:04AM +0000, Will Deacon wrote:
> > > On Thu, Nov 19, 2020 at 09:24:07AM +0000, Quentin Perret wrote:
> > > > On Friday 13 Nov 2020 at 09:37:13 (+0000), Will Deacon wrote:
> > > > > When exec'ing a 32-bit task on a system with mismatched support for
> > > > > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > > > > run it.
> > > > > 
> > > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > > ---
> > > > >  arch/arm64/kernel/process.c | 12 +++++++++++-
> > > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > > index 1540ab0fbf23..17b94007fed4 100644
> > > > > --- a/arch/arm64/kernel/process.c
> > > > > +++ b/arch/arm64/kernel/process.c
> > > > > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> > > > >  	return sp & ~0xf;
> > > > >  }
> > > > >  
> > > > > +static void adjust_compat_task_affinity(struct task_struct *p)
> > > > > +{
> > > > > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > > > > +
> > > > > +	if (restrict_cpus_allowed_ptr(p, mask))
> > > > > +		set_cpus_allowed_ptr(p, mask);
> > > > 
> > > > My understanding of this call to set_cpus_allowed_ptr() is that you're
> > > > mimicking the hotplug vs affinity case behaviour in some ways. That is,
> > > > if a task is pinned to a CPU and userspace hotplugs that CPU, then the
> > > > kernel will reset the affinity of the task to the remaining online CPUs.
> > > 
> > > Correct. It looks to the 32-bit application like all the 64-bit-only CPUs
> > > were hotplugged off at the point of the execve().
> > 
> > This doesn't respect cpusets though :/
> 
> How does that differ from select_fallback_rq()?

That has the cpuset state, which it tries first; only if that fails does
it fall back to the possible mask.
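
Abridged, the fallback order in select_fallback_rq() is:

  for (;;) {
  	/* Any allowed, online CPU? */
  	for_each_cpu(dest_cpu, p->cpus_ptr) {
  		if (!is_cpu_allowed(p, dest_cpu))
  			continue;
  		goto out;
  	}

  	/* No more Mr. Nice Guy. */
  	switch (state) {
  	case cpuset:
  		/* First, retreat to the task's cpuset... */
  		cpuset_cpus_allowed_fallback(p);
  		state = possible;
  		break;
  	case possible:
  		/* ...and only then to the possible mask. */
  		do_set_cpus_allowed(p, cpu_possible_mask);
  		state = fail;
  		break;
  	case fail:
  		BUG();
  		break;
  	}
  }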



* Re: [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask()
  2020-11-19 14:30       ` Quentin Perret
@ 2020-11-19 16:44         ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:44 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 02:30:12PM +0000, Quentin Perret wrote:
> On Thursday 19 Nov 2020 at 11:07:24 (+0000), Will Deacon wrote:
> > Yeah, the cpuset code ignores the return value of set_cpus_allowed_ptr() in
> > update_tasks_cpumask() so the failure won't be propagated, but then again I
> > think that might be the right thing to do. Nothing prevents 32-bit and
> > 64-bit tasks from co-existing in the same cpuset afaict, so forcing the
> > 64-bit tasks onto the 32-bit-capable cores feels much worse than the
> > approach taken here imo.
> 
> Ack. And thinking about it more, failing the cgroup operation wouldn't
> guarantee that the task's affinity and the cpuset are aligned anyway. We
> could still exec into a 32-bit task from within a 64-bit-only cpuset.
> 
> > The interesting case is what happens if the cpuset for a 32-bit task is
> > changed to contain only the 64-bit-only cores. I think that's a userspace
> > bug
> 
> Maybe, but I think Android will do exactly that in some cases :/
> 
> > but the fallback rq selection should avert disaster.
> 
> I thought _this_ patch was 'fixing' this case by making the cpuset
> operation pretty much a nop on the task affinity? The fallback rq stuff
> is all about hotplug, no?

Yeah, sorry, I wasn't clear. This patch postpones disaster until
hot-unplug time, when cpuset_cpus_allowed_fallback() will fail and
select_fallback_rq() will have to step in.
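
The rejection itself is, in sketch form, a subset check in
__set_cpus_allowed_ptr() against the new arch mask (hypothetical shape;
the actual hunk isn't quoted in this sub-thread):

  if (!cpumask_subset(new_mask, arch_cpu_allowed_mask(p))) {
  	ret = -EINVAL;
  	goto out;
  }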

Will


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 16:42       ` Peter Zijlstra
@ 2020-11-19 16:48         ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Li Zefan,
	Johannes Weiner, Ingo Molnar, Juri Lelli, Vincent Guittot,
	kernel-team

On Thu, Nov 19, 2020 at 05:42:03PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 04:28:23PM +0000, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 05:14:48PM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 13, 2020 at 09:37:13AM +0000, Will Deacon wrote:
> > > > When exec'ing a 32-bit task on a system with mismatched support for
> > > > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > > > run it.
> > > > 
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/process.c | 12 +++++++++++-
> > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > index 1540ab0fbf23..17b94007fed4 100644
> > > > --- a/arch/arm64/kernel/process.c
> > > > +++ b/arch/arm64/kernel/process.c
> > > > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> > > >  	return sp & ~0xf;
> > > >  }
> > > >  
> > > > +static void adjust_compat_task_affinity(struct task_struct *p)
> > > > +{
> > > > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > > > +
> > > > +	if (restrict_cpus_allowed_ptr(p, mask))
> > > > +		set_cpus_allowed_ptr(p, mask);
> > > 
> > > This silently destroys user state; at the very least that ought to go
> > > with a WARN or something. Ideally SIGKILL, though. What's to stop someone
> > > from doing a sched_setaffinity() right after the execve? Same problem.
> > > So why bother...
> > 
> > It's no different to CPU hot-unplug though, is it? From the perspective of
> > the 32-bit task, the 64-bit-only cores were hot-unplugged at the point of
> > execve(). Calls to sched_setaffinity() for 32-bit tasks will reject attempts
> > to include 64-bit-only cores.
> 
> select_fallback_rq() has a printk() in it to at least notify that things
> went bad. But I don't particularly like the current hotplug semantics; I've
> wanted to disallow the hotplug when it would result in this case, but
> computing that is tricky. It's one of those things that's forever on the
> todo list ... :/

I know that feeling...

I can add a printk() in the case where we override the mask (I think taking
the subset is ok), since I agree that it would be better if userspace had
had the foresight to avoid the situation in the first place.
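
Something like this, say (a sketch only; the eventual patch may differ):

  static void adjust_compat_task_affinity(struct task_struct *p)
  {
  	const struct cpumask *mask = system_32bit_el0_cpumask();

  	if (restrict_cpus_allowed_ptr(p, mask)) {
  		/* Couldn't take a subset: override the mask and say so. */
  		set_cpus_allowed_ptr(p, mask);
  		pr_warn("Overriding affinity for 32-bit process %d (%s)\n",
  			task_pid_nr(p), p->comm);
  	}

  	set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
  }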

> > I initially wanted to punt this all to userspace, but one of the big
> > problems with that is when a 64-bit task is running on a CPU only capable
> > of running 64-bit tasks and it execve()s a 32-bit task. At that point, we
> > have to do something because we can't even run the new task for it to do
> > a sched_setaffinity() call (and we also can't deliver SIGILL).
> 
> Userspace can see that one coming though...  I suppose you can simply
> make the execve fail before the point of no return.

If we could open up all the 32-bit apps out there and fix them, then I'd be
more sympathetic, but the reality is that we need to run existing binaries
on these stupid systems, and exec'ing 32-bit payloads from 64-bit tasks is
something that we need to continue to support.

If it makes things any better, all of this stuff is off by default and gated
on a cmdline option.

Will


* Re: [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2020-11-19 16:44           ` Peter Zijlstra
@ 2020-11-19 16:51             ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 16:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 05:44:11PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 04:30:36PM +0000, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 05:19:56PM +0100, Peter Zijlstra wrote:
> > > On Thu, Nov 19, 2020 at 11:06:04AM +0000, Will Deacon wrote:
> > > > On Thu, Nov 19, 2020 at 09:24:07AM +0000, Quentin Perret wrote:
> > > > > On Friday 13 Nov 2020 at 09:37:13 (+0000), Will Deacon wrote:
> > > > > > When exec'ing a 32-bit task on a system with mismatched support for
> > > > > > 32-bit EL0, try to ensure that it starts life on a CPU that can actually
> > > > > > run it.
> > > > > > 
> > > > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > > > ---
> > > > > >  arch/arm64/kernel/process.c | 12 +++++++++++-
> > > > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > > > index 1540ab0fbf23..17b94007fed4 100644
> > > > > > --- a/arch/arm64/kernel/process.c
> > > > > > +++ b/arch/arm64/kernel/process.c
> > > > > > @@ -625,6 +625,16 @@ unsigned long arch_align_stack(unsigned long sp)
> > > > > >  	return sp & ~0xf;
> > > > > >  }
> > > > > >  
> > > > > > +static void adjust_compat_task_affinity(struct task_struct *p)
> > > > > > +{
> > > > > > +	const struct cpumask *mask = system_32bit_el0_cpumask();
> > > > > > +
> > > > > > +	if (restrict_cpus_allowed_ptr(p, mask))
> > > > > > +		set_cpus_allowed_ptr(p, mask);
> > > > > 
> > > > > My understanding of this call to set_cpus_allowed_ptr() is that you're
> > > > > mimicking the hotplug vs affinity case behaviour in some ways. That is,
> > > > > if a task is pinned to a CPU and userspace hotplugs that CPU, then the
> > > > > kernel will reset the affinity of the task to the remaining online CPUs.
> > > > 
> > > > Correct. It looks to the 32-bit application like all the 64-bit-only CPUs
> > > > were hotplugged off at the point of the execve().
> > > 
> > > This doesn't respect cpusets though :/
> > 
> > How does that differ from select_fallback_rq()?
> 
> That has that cpuset state it tries first, only if that fails does it do
> the possible mask.

Fair enough. Lemme try something similar here before overriding the mask
entirely...

Will


* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 16:09       ` Peter Zijlstra
@ 2020-11-19 16:57         ` Valentin Schneider
  2020-11-19 19:25           ` Will Deacon
  0 siblings, 1 reply; 52+ messages in thread
From: Valentin Schneider @ 2020-11-19 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team


On 19/11/20 16:09, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 01:13:20PM +0000, Will Deacon wrote:
>
>> Sure, but I was talking about what userspace sees, and I don't think it ever
>> sees CPUs that have been hotplugged off, right? That is, sched_getaffinity()
>> masks its result with the active_mask.
>
> # for i in /sys/devices/system/cpu/cpu*/online; do echo -n $i ":"; cat $i; done
> /sys/devices/system/cpu/cpu1/online :0
> /sys/devices/system/cpu/cpu2/online :1
> /sys/devices/system/cpu/cpu3/online :1
> /sys/devices/system/cpu/cpu4/online :1
> /sys/devices/system/cpu/cpu5/online :1
> /sys/devices/system/cpu/cpu6/online :1
> /sys/devices/system/cpu/cpu7/online :1
>
> # grep Cpus_allowed /proc/self/status
> Cpus_allowed:   ff
> Cpus_allowed_list:      0-7
>
>
> :-)

Harumph, so there is that...

$ while true; do continue; done &
$ PID=$!
$ taskset -pc 0-1 $PID
  pid 849's current affinity list: 0-5
  pid 849's new affinity list: 0,1
$ echo 0 > /sys/devices/system/cpu/cpu1/online
  [12578.545726] CPU1: shutdown
  [12578.548454] psci: CPU1 killed (polled 0 ms)
$ taskset -pc $PID
  pid 849's current affinity list: 0
$ cat /proc/$PID/status | grep Cpus
  Cpus_allowed:   03
  Cpus_allowed_list:      0-1


* Re: [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity
  2020-11-19 16:57         ` Valentin Schneider
@ 2020-11-19 19:25           ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2020-11-19 19:25 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, linux-arm-kernel, linux-arch, linux-kernel,
	Catalin Marinas, Marc Zyngier, Greg Kroah-Hartman,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 04:57:22PM +0000, Valentin Schneider wrote:
> 
> On 19/11/20 16:09, Peter Zijlstra wrote:
> > On Thu, Nov 19, 2020 at 01:13:20PM +0000, Will Deacon wrote:
> >
> >> Sure, but I was talking about what userspace sees, and I don't think it ever
> >> sees CPUs that have been hotplugged off, right? That is, sched_getaffinity()
> >> masks its result with the active_mask.
> >
> > # for i in /sys/devices/system/cpu/cpu*/online; do echo -n $i ":"; cat $i; done
> > /sys/devices/system/cpu/cpu1/online :0
> > /sys/devices/system/cpu/cpu2/online :1
> > /sys/devices/system/cpu/cpu3/online :1
> > /sys/devices/system/cpu/cpu4/online :1
> > /sys/devices/system/cpu/cpu5/online :1
> > /sys/devices/system/cpu/cpu6/online :1
> > /sys/devices/system/cpu/cpu7/online :1
> >
> > # grep Cpus_allowed /proc/self/status
> > Cpus_allowed:   ff
> > Cpus_allowed_list:      0-7
> >
> >
> > :-)
> 
> Harumph, so there is that...
> 
> $ while true; do continue; done &
> $ PID=$!
> $ taskset -pc 0-1 $PID
>   pid 849's current affinity list: 0-5
>   pid 849's new affinity list: 0,1
> $ echo 0 > /sys/devices/system/cpu/cpu1/online
>   [12578.545726] CPU1: shutdown
>   [12578.548454] psci: CPU1 killed (polled 0 ms)
> $ taskset -pc $PID
>   pid 849's current affinity list: 0
> $ cat /proc/$PID/status | grep Cpus
>   Cpus_allowed:   03
>   Cpus_allowed_list:      0-1

Yeah, I'm not sure this is worth tackling tbh. sched_getaffinity() does the
right thing, but poking around in /proc and /sys is always going to defeat
the illusion and I don't see what we gain in reporting CPUs on which the
task is _never_ going to run anyway. But I'll revise my stance on it being
identical to hotplug :) (I would've gotten away with it too, if it wasn't
for those pesky hackers).

Will


* Re: [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection
  2020-11-19 11:07     ` Will Deacon
@ 2020-11-19 20:39       ` Will Deacon
  2020-11-23 14:48         ` Quentin Perret
  0 siblings, 1 reply; 52+ messages in thread
From: Will Deacon @ 2020-11-19 20:39 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thu, Nov 19, 2020 at 11:07:09AM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 09:38:50AM +0000, Quentin Perret wrote:
> > On Friday 13 Nov 2020 at 09:37:15 (+0000), Will Deacon wrote:
> > > Asymmetric systems may not offer the same level of userspace ISA support
> > > across all CPUs, meaning that some applications cannot be executed by
> > > some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
> > > not feature support for 32-bit applications on both clusters.
> > > 
> > > On such a system, we must take care not to migrate a task to an
> > > unsupported CPU when forcefully moving tasks in select_fallback_rq()
> > > in response to a CPU hot-unplug operation.
> > > 
> > > Introduce an arch_cpu_allowed_mask() hook which, given a task argument,
> > > allows an architecture to return a cpumask of CPUs that are capable of
> > > executing that task. The default implementation returns the
> > > cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
> > > limitations that affect scheduling. The new mask is used when selecting
> > > the fallback runqueue as a last resort before forcing a migration to the
> > > first active CPU.
> > > 
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  kernel/sched/core.c | 13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 818c8f7bdf2a..8df38ebfe769 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1696,6 +1696,11 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
> > >  
> > >  #ifdef CONFIG_SMP
> > >  
> > > +/* Must contain at least one active CPU */
> > > +#ifndef arch_cpu_allowed_mask
> > > +#define  arch_cpu_allowed_mask(p)	cpu_possible_mask
> > > +#endif
> > > +
> > >  /*
> > >   * Per-CPU kthreads are allowed to run on !active && online CPUs, see
> > >   * __set_cpus_allowed_ptr() and select_fallback_rq().
> > > @@ -1708,7 +1713,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> > >  	if (is_per_cpu_kthread(p))
> > >  		return cpu_online(cpu);
> > >  
> > > -	return cpu_active(cpu);
> > > +	if (!cpu_active(cpu))
> > > +		return false;
> > > +
> > > +	return cpumask_test_cpu(cpu, arch_cpu_allowed_mask(p));
> > >  }
> > >  
> > >  /*
> > > @@ -2361,10 +2369,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
> > >  			}
> > >  			fallthrough;
> > >  		case possible:
> > > -			do_set_cpus_allowed(p, cpu_possible_mask);
> > > +			do_set_cpus_allowed(p, arch_cpu_allowed_mask(p));
> > 
> > Nit: I'm wondering if this should be called arch_cpu_possible_mask()
> > instead?
> 
> I'm open to renaming it, so if nobody else has any better ideas then I'll
> go with this.

Ah, so in doing this I realised I don't like arch_cpu_possible_mask() so
much because it makes it sound like a back-end to cpu_possible_mask, but
the two are really different things.

arch_task_cpu_possible_mask() might work?
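
Whatever the name, the arm64 back-end in patch 13/14 would presumably
boil down to something like this (hypothetical sketch; the real
definition lives in that patch):

  #define arch_task_cpu_possible_mask(p)				\
  	(is_compat_thread(task_thread_info(p)) ?			\
  	 system_32bit_el0_cpumask() : cpu_possible_mask)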

Will


* Re: [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection
  2020-11-19 20:39       ` Will Deacon
@ 2020-11-23 14:48         ` Quentin Perret
  0 siblings, 0 replies; 52+ messages in thread
From: Quentin Perret @ 2020-11-23 14:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan, Tejun Heo,
	Li Zefan, Johannes Weiner, Ingo Molnar, Juri Lelli,
	Vincent Guittot, kernel-team

On Thursday 19 Nov 2020 at 20:39:07 (+0000), Will Deacon wrote:
> Ah, so in doing this I realised I don't like arch_cpu_possible_mask() so
> much because it makes it sound like a back-end to cpu_possible_mask, but
> the two are really different things.
> 
> arch_task_cpu_possible_mask() might work?

Yes, making it explicit in the name that this is a task-specific thing
doesn't hurt.

Thanks,
Quentin


end of thread

Thread overview: 52+ messages
2020-11-13  9:37 [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Will Deacon
2020-11-13  9:37 ` [PATCH v3 01/14] arm64: cpuinfo: Split AArch32 registers out into a separate struct Will Deacon
2020-11-13  9:37 ` [PATCH v3 02/14] arm64: Allow mismatched 32-bit EL0 support Will Deacon
2020-11-19 11:27   ` Valentin Schneider
2020-11-19 13:12     ` Will Deacon
2020-11-13  9:37 ` [PATCH v3 03/14] KVM: arm64: Kill 32-bit vCPUs on systems with mismatched " Will Deacon
2020-11-13  9:37 ` [PATCH v3 04/14] arm64: Kill 32-bit applications scheduled on 64-bit-only CPUs Will Deacon
2020-11-13  9:37 ` [PATCH v3 05/14] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
2020-11-13  9:37 ` [PATCH v3 06/14] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
2020-11-13  9:37 ` [PATCH v3 07/14] sched: Introduce restrict_cpus_allowed_ptr() to limit task CPU affinity Will Deacon
2020-11-19  9:18   ` Quentin Perret
2020-11-19 11:03     ` Quentin Perret
2020-11-19 11:05     ` Will Deacon
2020-11-19 11:27       ` Valentin Schneider
2020-11-19 13:13         ` Will Deacon
2020-11-19 14:54           ` Valentin Schneider
2020-11-19 16:41             ` Will Deacon
2020-11-19 12:47   ` Valentin Schneider
2020-11-19 13:13     ` Will Deacon
2020-11-19 14:54       ` Valentin Schneider
2020-11-19 16:09       ` Peter Zijlstra
2020-11-19 16:57         ` Valentin Schneider
2020-11-19 19:25           ` Will Deacon
2020-11-13  9:37 ` [PATCH v3 08/14] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
2020-11-19  9:24   ` Quentin Perret
2020-11-19 11:06     ` Will Deacon
2020-11-19 16:19       ` Peter Zijlstra
2020-11-19 16:30         ` Will Deacon
2020-11-19 16:44           ` Peter Zijlstra
2020-11-19 16:51             ` Will Deacon
2020-11-19 16:14   ` Peter Zijlstra
2020-11-19 16:28     ` Will Deacon
2020-11-19 16:42       ` Peter Zijlstra
2020-11-19 16:48         ` Will Deacon
2020-11-13  9:37 ` [PATCH v3 09/14] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
2020-11-19  9:29   ` Quentin Perret
2020-11-19 11:06     ` Will Deacon
2020-11-13  9:37 ` [PATCH v3 10/14] sched: Introduce arch_cpu_allowed_mask() to limit fallback rq selection Will Deacon
2020-11-19  9:38   ` Quentin Perret
2020-11-19 11:07     ` Will Deacon
2020-11-19 20:39       ` Will Deacon
2020-11-23 14:48         ` Quentin Perret
2020-11-13  9:37 ` [PATCH v3 11/14] sched: Reject CPU affinity changes based on arch_cpu_allowed_mask() Will Deacon
2020-11-19  9:47   ` Quentin Perret
2020-11-19 11:07     ` Will Deacon
2020-11-19 14:30       ` Quentin Perret
2020-11-19 16:44         ` Will Deacon
2020-11-13  9:37 ` [PATCH v3 12/14] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
2020-11-13  9:37 ` [PATCH v3 13/14] arm64: Implement arch_cpu_allowed_mask() Will Deacon
2020-11-13  9:37 ` [PATCH v3 14/14] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
2020-11-19 16:11 ` [PATCH v3 00/14] An alternative series for asymmetric AArch32 systems Peter Zijlstra
2020-11-19 16:39   ` Will Deacon
