linux-kernel.vger.kernel.org archive mirror
* [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems
@ 2021-07-30 11:24 Will Deacon
  2021-07-30 11:24 ` [PATCH v11 01/16] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
                   ` (15 more replies)
  0 siblings, 16 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

Hi everyone,

This is version eleven of the patches previously posted here:

  v1: https://lore.kernel.org/r/20201027215118.27003-1-will@kernel.org
  v2: https://lore.kernel.org/r/20201109213023.15092-1-will@kernel.org
  v3: https://lore.kernel.org/r/20201113093720.21106-1-will@kernel.org
  v4: https://lore.kernel.org/r/20201124155039.13804-1-will@kernel.org
  v5: https://lore.kernel.org/r/20201208132835.6151-1-will@kernel.org
  v6: https://lore.kernel.org/r/20210518094725.7701-1-will@kernel.org
  v7: https://lore.kernel.org/r/20210525151432.16875-1-will@kernel.org
  v8: https://lore.kernel.org/r/20210602164719.31777-1-will@kernel.org
  v9: https://lore.kernel.org/r/20210608180313.11502-1-will@kernel.org
 v10: https://lore.kernel.org/r/20210623173848.318-1-will@kernel.org

The main changes since v10 are:

  * Now based on v5.14-rc1

  * Fixed a lockup found during testing where select_fallback_rq()
    could return an invalid CPU when trying to prioritise a destination
    on the same NUMA node

Greg kindly pointed out a potential issue with the new 'aarch32_el0'
file in sysfs, where the contents will be truncated if we need to
display more than a PAGE_SIZE of data. However, I have elected not to
address this for now as (a) I do not think we'll ever have hardware in
that configuration (yeah, I know...) and (b) it is useful for scripts
if the file behaves the same as the other files in the same directory
(e.g. 'online'). Should a solution to this problem emerge for the other
files, I will be happy to adopt it here as well.

This series is now mostly scheduler stuff, but the last few patches are
arm64 and so I would suggest a shared branch in -tip for merging.

Cheers,

Will

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Qais Yousef <qais.yousef@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kernel-team@android.com

--->8

Will Deacon (16):
  sched: Introduce task_cpu_possible_mask() to limit fallback rq
    selection
  cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  cpuset: Cleanup cpuset_cpus_allowed_fallback() use in
    select_fallback_rq()
  sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  sched: Introduce task_struct::user_cpus_ptr to track requested
    affinity
  sched: Split the guts of sched_setaffinity() into a helper function
  sched: Allow task CPU affinity to be restricted on asymmetric systems
  sched: Introduce dl_task_check_affinity() to check proposed affinity
  arm64: Implement task_cpu_possible_mask()
  arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit
    EL0
  arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched
    system
  arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
  Documentation: arm64: describe asymmetric 32-bit support

 .../ABI/testing/sysfs-devices-system-cpu      |   9 +
 .../admin-guide/kernel-parameters.txt         |  11 +
 Documentation/arm64/asymmetric-32bit.rst      | 155 ++++++++
 Documentation/arm64/index.rst                 |   1 +
 arch/arm64/include/asm/elf.h                  |   6 +-
 arch/arm64/include/asm/mmu_context.h          |  13 +
 arch/arm64/kernel/cpufeature.c                |  51 ++-
 arch/arm64/kernel/process.c                   |  47 ++-
 arch/arm64/kernel/signal.c                    |  26 --
 include/linux/cpuset.h                        |   8 +-
 include/linux/mmu_context.h                   |  14 +
 include/linux/sched.h                         |  21 ++
 init/init_task.c                              |   1 +
 kernel/cgroup/cpuset.c                        |  59 ++-
 kernel/fork.c                                 |   2 +
 kernel/sched/core.c                           | 344 ++++++++++++++----
 kernel/sched/sched.h                          |   1 +
 17 files changed, 628 insertions(+), 141 deletions(-)
 create mode 100644 Documentation/arm64/asymmetric-32bit.rst

-- 
2.32.0.402.g57bb445576-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v11 01/16] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 02/16] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team, Valentin Schneider

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs
support 32-bit applications on only some of their clusters.

On such a system, we must take care not to migrate a task to an
unsupported CPU when forcefully moving tasks in select_fallback_rq()
in response to a CPU hot-unplug operation.

Introduce a task_cpu_possible_mask() hook which, given a task argument,
allows an architecture to return a cpumask of CPUs that are capable of
executing that task. The default implementation returns the
cpu_possible_mask, since sane machines do not suffer from per-CPU ISA
limitations that affect scheduling. The new mask is used when selecting
the fallback runqueue as a last resort before forcing a migration to the
first active CPU.

Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
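The hook described in the commit message above can be modelled in
userspace roughly as follows. This is only a sketch under stated
assumptions, not kernel code: cpumask_model_t, struct task_model,
task_possible_mask(), cpu_allowed() and pick_fallback_cpu() are
hypothetical names, masks are plain 64-bit integers, and the platform
(four CPUs, of which only CPUs 0-1 run 32-bit tasks) is invented for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

struct task_model {
	bool is_32bit;			/* stands in for an arch-specific property */
	cpumask_model_t cpus_mask;	/* current affinity of the task */
};

/* Assumed platform: CPUs 0-3 exist, only CPUs 0-1 can run 32-bit tasks. */
static const cpumask_model_t possible_cpus = 0x0f;
static const cpumask_model_t cpus_32bit = 0x03;

/* Analogue of an architecture override of task_cpu_possible_mask(). */
static cpumask_model_t task_possible_mask(const struct task_model *p)
{
	return p->is_32bit ? cpus_32bit : possible_cpus;
}

/* A user task may only run on a CPU that is active and capable of running it. */
static bool cpu_allowed(const struct task_model *p, cpumask_model_t active,
			int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	cpumask_model_t bit = (cpumask_model_t)1 << cpu;

	return (active & bit) && (task_possible_mask(p) & bit);
}

/* Returns the chosen CPU, or -1 if no capable CPU is active. */
static int pick_fallback_cpu(struct task_model *p, cpumask_model_t active)
{
	for (int pass = 0; pass < 2; pass++) {
		for (int cpu = 0; cpu < 64; cpu++) {
			cpumask_model_t bit = (cpumask_model_t)1 << cpu;

			if ((p->cpus_mask & bit) && cpu_allowed(p, active, cpu))
				return cpu;
		}
		/* Last resort: widen the affinity, but only to capable CPUs. */
		p->cpus_mask = task_possible_mask(p);
	}
	return -1;
}
```

In this model, a 32-bit task pinned to CPUs 2-3 ends up on CPU 0 with
its affinity widened to 0x03 rather than to all possible CPUs.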
 include/linux/mmu_context.h | 14 ++++++++++++++
 kernel/sched/core.c         |  9 +++------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index 03dee12d2b61..b9b970f7ab45 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -14,4 +14,18 @@
 static inline void leave_mm(int cpu) { }
 #endif
 
+/*
+ * CPUs that are capable of running user task @p. Must contain at least one
+ * active CPU. It is assumed that the kernel can run on all CPUs, so calling
+ * this for a kernel thread is pointless.
+ *
+ * By default, we assume a sane, homogeneous system.
+ */
+#ifndef task_cpu_possible_mask
+# define task_cpu_possible_mask(p)	cpu_possible_mask
+# define task_cpu_possible(cpu, p)	true
+#else
+# define task_cpu_possible(cpu, p)	cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
+#endif
+
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d9ff40f4661..84b20feb3214 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2163,7 +2163,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 
 	/* Non kernel threads are not allowed during either online or offline. */
 	if (!(p->flags & PF_KTHREAD))
-		return cpu_active(cpu);
+		return cpu_active(cpu) && task_cpu_possible(cpu, p);
 
 	/* KTHREAD_IS_PER_CPU is always allowed. */
 	if (kthread_is_per_cpu(p))
@@ -3114,9 +3114,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 
 		/* Look for allowed, online CPU in same node. */
 		for_each_cpu(dest_cpu, nodemask) {
-			if (!cpu_active(dest_cpu))
-				continue;
-			if (cpumask_test_cpu(dest_cpu, p->cpus_ptr))
+			if (is_cpu_allowed(p, dest_cpu))
 				return dest_cpu;
 		}
 	}
@@ -3146,10 +3144,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, cpu_possible_mask);
+			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
 			state = fail;
 			break;
-
 		case fail:
 			BUG();
 			break;
-- 
2.32.0.402.g57bb445576-goog



* [PATCH v11 02/16] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
  2021-07-30 11:24 ` [PATCH v11 01/16] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 03/16] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

If the scheduler cannot find an allowed CPU for a task,
cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
if cgroup v1 is in use.

In preparation for allowing architectures to provide their own fallback
mask, just return early if we're either using cgroup v1 or we're using
cgroup v2 with a mask that contains invalid CPUs. This will allow
select_fallback_rq() to figure out the mask by itself.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
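The decision described above can be sketched in userspace as follows.
This is a simplified model rather than the kernel implementation:
cpumask_model_t, mask_subset() and cpuset_fallback_model() are
hypothetical names, masks are plain 64-bit integers, and locking and
RCU are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* True if every CPU in @a is also in @b (compare cpumask_subset()). */
static bool mask_subset(cpumask_model_t a, cpumask_model_t b)
{
	return (a & ~b) == 0;
}

/*
 * Adopt the cpuset mask only in v2 mode, and only if it contains no CPU
 * that is unable to run the task. In every other case the affinity is
 * left untouched so that the caller can work out a mask by itself.
 */
static void cpuset_fallback_model(bool v2_mode, cpumask_model_t cs_mask,
				  cpumask_model_t possible,
				  cpumask_model_t *affinity)
{
	assert(affinity != NULL);

	if (v2_mode && mask_subset(cs_mask, possible))
		*affinity = cs_mask;
}
```

With cgroup v1, or with a v2 cpuset mask that includes an incapable
CPU, the affinity passed in comes back unchanged.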
 include/linux/cpuset.h | 1 +
 kernel/cgroup/cpuset.c | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 04c20de66afc..ed6ec677dd6b 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/nodemask.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/jump_label.h>
 
 #ifdef CONFIG_CPUSETS
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index adb5190c4429..6000d7fbf5da 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3322,9 +3322,13 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
+	const struct cpumask *cs_mask;
+	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+
 	rcu_read_lock();
-	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
-		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
+	cs_mask = task_cs(tsk)->cpus_allowed;
+	if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask))
+		do_set_cpus_allowed(tsk, cs_mask);
 	rcu_read_unlock();
 
 	/*
-- 
2.32.0.402.g57bb445576-goog



* [PATCH v11 03/16] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
  2021-07-30 11:24 ` [PATCH v11 01/16] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
  2021-07-30 11:24 ` [PATCH v11 02/16] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 04/16] cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() Will Deacon
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team, Valentin Schneider

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs
support 32-bit applications on only some of their clusters.

Modify guarantee_online_cpus() to take task_cpu_possible_mask() into
account when trying to find a suitable set of online CPUs for a given
task. This will avoid passing an invalid mask to set_cpus_allowed_ptr()
during ->attach() and will subsequently allow the cpuset hierarchy to be
taken into account when forcefully overriding the affinity mask for a
task which requires migration to a compatible CPU.

Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
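The hierarchy walk described above can be sketched as follows. This is
a userspace model under stated assumptions: struct cpuset_model and
online_cpus_for_task() are hypothetical names, masks are plain 64-bit
integers, and the WARN_ON(), locking and RCU of the real function are
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

struct cpuset_model {
	cpumask_model_t effective_cpus;
	const struct cpuset_model *parent;	/* NULL for the root cpuset */
};

/*
 * Start from the CPUs that are both online and capable of running the
 * task, then walk up the hierarchy until a cpuset overlaps that mask.
 */
static cpumask_model_t online_cpus_for_task(const struct cpuset_model *cs,
					    cpumask_model_t possible,
					    cpumask_model_t online)
{
	cpumask_model_t mask = possible & online;

	assert(cs != NULL);

	/* The patch warns in this case and falls back to all online CPUs. */
	if (mask == 0)
		mask = online;

	while ((cs->effective_cpus & mask) == 0) {
		cs = cs->parent;
		if (cs == NULL)
			return mask;	/* no ancestor overlaps: keep the mask */
	}
	return mask & cs->effective_cpus;
}
```

For a child cpuset holding CPUs 2-3 under a root holding CPUs 0-3, a
task that can only run on CPUs 0-1 gets 0x03 back from the root, while
a task that can run anywhere gets 0x0c from the child.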
 include/linux/cpuset.h |  2 +-
 kernel/cgroup/cpuset.c | 43 +++++++++++++++++++++++++-----------------
 2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index ed6ec677dd6b..414a8e694413 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -185,7 +185,7 @@ static inline void cpuset_read_unlock(void) { }
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
-	cpumask_copy(mask, cpu_possible_mask);
+	cpumask_copy(mask, task_cpu_possible_mask(p));
 }
 
 static inline void cpuset_cpus_allowed_fallback(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 6000d7fbf5da..3984284c76bd 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -372,18 +372,29 @@ static inline bool is_in_v2_mode(void)
 }
 
 /*
- * Return in pmask the portion of a cpusets's cpus_allowed that
- * are online.  If none are online, walk up the cpuset hierarchy
- * until we find one that does have some online cpus.
+ * Return in pmask the portion of a task's cpusets's cpus_allowed that
+ * are online and are capable of running the task.  If none are found,
+ * walk up the cpuset hierarchy until we find one that does have some
+ * appropriate cpus.
  *
  * One way or another, we guarantee to return some non-empty subset
  * of cpu_online_mask.
  *
  * Call with callback_lock or cpuset_mutex held.
  */
-static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
+static void guarantee_online_cpus(struct task_struct *tsk,
+				  struct cpumask *pmask)
 {
-	while (!cpumask_intersects(cs->effective_cpus, cpu_online_mask)) {
+	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+	struct cpuset *cs;
+
+	if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_online_mask)))
+		cpumask_copy(pmask, cpu_online_mask);
+
+	rcu_read_lock();
+	cs = task_cs(tsk);
+
+	while (!cpumask_intersects(cs->effective_cpus, pmask)) {
 		cs = parent_cs(cs);
 		if (unlikely(!cs)) {
 			/*
@@ -393,11 +404,13 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
 			 * cpuset's effective_cpus is on its way to be
 			 * identical to cpu_online_mask.
 			 */
-			cpumask_copy(pmask, cpu_online_mask);
-			return;
+			goto out_unlock;
 		}
 	}
-	cpumask_and(pmask, cs->effective_cpus, cpu_online_mask);
+	cpumask_and(pmask, pmask, cs->effective_cpus);
+
+out_unlock:
+	rcu_read_unlock();
 }
 
 /*
@@ -2199,15 +2212,13 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 	percpu_down_write(&cpuset_rwsem);
 
-	/* prepare for attach */
-	if (cs == &top_cpuset)
-		cpumask_copy(cpus_attach, cpu_possible_mask);
-	else
-		guarantee_online_cpus(cs, cpus_attach);
-
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 
 	cgroup_taskset_for_each(task, css, tset) {
+		if (cs != &top_cpuset)
+			guarantee_online_cpus(task, cpus_attach);
+		else
+			cpumask_copy(cpus_attach, task_cpu_possible_mask(task));
 		/*
 		 * can_attach beforehand should guarantee that this doesn't
 		 * fail.  TODO: have a better way to handle failure here
@@ -3302,9 +3313,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 	unsigned long flags;
 
 	spin_lock_irqsave(&callback_lock, flags);
-	rcu_read_lock();
-	guarantee_online_cpus(task_cs(tsk), pmask);
-	rcu_read_unlock();
+	guarantee_online_cpus(tsk, pmask);
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 
-- 
2.32.0.402.g57bb445576-goog



* [PATCH v11 04/16] cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (2 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 03/16] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 05/16] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

select_fallback_rq() only needs to recheck for an allowed CPU if the
affinity mask of the task has changed since the last check.

Return a 'bool' from cpuset_cpus_allowed_fallback() to indicate whether
the affinity mask was updated, and use this to elide the allowed check
when the mask has been left alone.

No functional change.

Suggested-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
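The effect of the boolean return value on the fallback state machine
can be sketched as follows. This is a userspace model, not the
scheduler code: struct fb_task, fb_cpuset_fallback(), fb_scan() and
fb_select() are hypothetical names, the 'cpuset_usable' flag stands in
for the real v2-mode and subset checks, and the final state returns -1
where the kernel would BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

enum fallback_state { FB_CPUSET, FB_POSSIBLE, FB_FAIL };

struct fb_task {
	cpumask_model_t affinity;	/* current affinity mask */
	cpumask_model_t possible;	/* CPUs able to run the task */
	cpumask_model_t cpuset_mask;	/* mask the cpuset would hand back */
	bool cpuset_usable;		/* stands in for the real cpuset checks */
	int scans;			/* how often the CPUs were scanned */
};

/* Returns true only if the affinity was actually updated. */
static bool fb_cpuset_fallback(struct fb_task *t)
{
	if (!t->cpuset_usable)
		return false;
	t->affinity = t->cpuset_mask;
	return true;
}

/* Look for an allowed, active, capable CPU; -1 if there is none. */
static int fb_scan(struct fb_task *t, cpumask_model_t active)
{
	cpumask_model_t ok = t->affinity & t->possible & active;

	t->scans++;
	for (int cpu = 0; cpu < 64; cpu++) {
		if (ok & ((cpumask_model_t)1 << cpu))
			return cpu;
	}
	return -1;
}

static int fb_select(struct fb_task *t, cpumask_model_t active)
{
	enum fallback_state state = FB_CPUSET;

	assert(t != NULL);
	for (;;) {
		int cpu = fb_scan(t, active);

		if (cpu >= 0)
			return cpu;

		switch (state) {
		case FB_CPUSET:
			if (fb_cpuset_fallback(t)) {
				state = FB_POSSIBLE;
				break;	/* mask changed: scan again */
			}
			/* fall through */
		case FB_POSSIBLE:
			t->affinity = t->possible;
			state = FB_FAIL;
			break;
		case FB_FAIL:
			return -1;	/* the kernel would BUG() here */
		}
	}
}
```

When the cpuset step leaves the mask alone, the model goes straight to
the 'possible' step and saves one scan of the CPUs.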
 include/linux/cpuset.h |  5 +++--
 kernel/cgroup/cpuset.c | 10 ++++++++--
 kernel/sched/core.c    |  3 +--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 414a8e694413..d2b9c41c8edf 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -59,7 +59,7 @@ extern void cpuset_wait_for_hotplug(void);
 extern void cpuset_read_lock(void);
 extern void cpuset_read_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
-extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
+extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
@@ -188,8 +188,9 @@ static inline void cpuset_cpus_allowed(struct task_struct *p,
 	cpumask_copy(mask, task_cpu_possible_mask(p));
 }
 
-static inline void cpuset_cpus_allowed_fallback(struct task_struct *p)
+static inline bool cpuset_cpus_allowed_fallback(struct task_struct *p)
 {
+	return false;
 }
 
 static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 3984284c76bd..e78271a1b2fa 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3327,17 +3327,22 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
  * which will not contain a sane cpumask during cases such as cpu hotplugging.
  * This is the absolute last resort for the scheduler and it is only used if
  * _every_ other avenue has been traveled.
+ *
+ * Returns true if the affinity of @tsk was changed, false otherwise.
  **/
 
-void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
+bool cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
 	const struct cpumask *cs_mask;
+	bool changed = false;
 	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
 
 	rcu_read_lock();
 	cs_mask = task_cs(tsk)->cpus_allowed;
-	if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask))
+	if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask)) {
 		do_set_cpus_allowed(tsk, cs_mask);
+		changed = true;
+	}
 	rcu_read_unlock();
 
 	/*
@@ -3357,6 +3362,7 @@ void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 	 * select_fallback_rq() will fix things ups and set cpu_possible_mask
 	 * if required.
 	 */
+	return changed;
 }
 
 void __init cpuset_init_current_mems_allowed(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84b20feb3214..9fd598b8dac5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3131,8 +3131,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 		/* No more Mr. Nice Guy. */
 		switch (state) {
 		case cpuset:
-			if (IS_ENABLED(CONFIG_CPUSETS)) {
-				cpuset_cpus_allowed_fallback(p);
+			if (cpuset_cpus_allowed_fallback(p)) {
 				state = possible;
 				break;
 			}
-- 
2.32.0.402.g57bb445576-goog



* [PATCH v11 05/16] sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (3 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 04/16] cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 06/16] sched: Introduce task_struct::user_cpus_ptr to track requested affinity Will Deacon
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team, Valentin Schneider

Reject explicit requests to change the affinity mask of a task via
set_cpus_allowed_ptr() if the requested mask is not a subset of the
mask returned by task_cpu_possible_mask(). This ensures that the
'cpus_mask' for a given task cannot contain CPUs which are incapable of
executing it, except in cases where the affinity is forced.

Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
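The new check can be sketched in userspace as follows. This is a
minimal model of the subset test only: struct aff_task and
set_affinity_model() are hypothetical names, masks are plain 64-bit
integers, and the other checks made by __set_cpus_allowed_ptr()
(active CPUs, migration state and so on) are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

struct aff_task {
	bool kthread;			/* kernel threads skip the new check */
	cpumask_model_t possible;	/* CPUs able to run the task */
	cpumask_model_t affinity;	/* current affinity mask */
};

/*
 * Reject a requested mask that names any CPU outside the task's possible
 * mask. Returns 0 on success and -EINVAL on rejection, in which case the
 * existing affinity is left untouched.
 */
static int set_affinity_model(struct aff_task *t, cpumask_model_t new_mask)
{
	assert(t != NULL);

	if (!t->kthread && (new_mask & ~t->possible) != 0)
		return -EINVAL;

	t->affinity = new_mask;
	return 0;
}
```

A user task limited to CPUs 0-1 that asks for CPUs 1-2 is refused,
whereas the same request from a kernel thread is accepted.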
 kernel/sched/core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9fd598b8dac5..dbce9cd83a53 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2700,15 +2700,17 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 				  u32 flags)
 {
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
+	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	unsigned int dest_cpu;
 	struct rq_flags rf;
 	struct rq *rq;
 	int ret = 0;
+	bool kthread = p->flags & PF_KTHREAD;
 
 	rq = task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 
-	if (p->flags & PF_KTHREAD || is_migration_disabled(p)) {
+	if (kthread || is_migration_disabled(p)) {
 		/*
 		 * Kernel threads are allowed on online && !active CPUs,
 		 * however, during cpu-hot-unplug, even these might get pushed
@@ -2722,6 +2724,11 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		cpu_valid_mask = cpu_online_mask;
 	}
 
+	if (!kthread && !cpumask_subset(new_mask, cpu_allowed_mask)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	/*
 	 * Must re-check here, to close a race against __kthread_bind(),
 	 * sched_setaffinity() is not guaranteed to observe the flag.
-- 
2.32.0.402.g57bb445576-goog



* [PATCH v11 06/16] sched: Introduce task_struct::user_cpus_ptr to track requested affinity
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (4 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 05/16] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function Will Deacon
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team, Valentin Schneider

In preparation for saving and restoring the user-requested CPU affinity
mask of a task, add a new cpumask_t pointer to 'struct task_struct'.

If the pointer is non-NULL, then the mask is copied across fork() and
freed on task exit.

Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
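The copy-on-fork and free-on-exit lifetime described above can be
sketched in userspace as follows. This is a model, not the kernel
code: struct ucp_task, dup_user_mask() and release_user_mask() are
hypothetical names, and malloc()/free() stand in for kmalloc_node()
and kfree(). One deliberate difference from the patch: the sketch
clears the destination pointer before doing anything else, so that a
child created by copying the parent structure never aliases the
parent's allocation if the copy fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

struct ucp_task {
	cpumask_model_t *user_cpus_ptr;	/* NULL until a mask has been saved */
};

/* Give @dst its own copy of the saved mask of @src, if there is one. */
static int dup_user_mask(struct ucp_task *dst, const struct ucp_task *src)
{
	assert(dst != NULL && src != NULL);

	/* Choice of this sketch: never leave dst pointing at src's mask. */
	dst->user_cpus_ptr = NULL;
	if (src->user_cpus_ptr == NULL)
		return 0;

	dst->user_cpus_ptr = malloc(sizeof(*dst->user_cpus_ptr));
	if (dst->user_cpus_ptr == NULL)
		return -ENOMEM;

	*dst->user_cpus_ptr = *src->user_cpus_ptr;
	return 0;
}

/* Free the saved mask; safe to call when no mask was ever saved. */
static void release_user_mask(struct ucp_task *t)
{
	assert(t != NULL);

	free(t->user_cpus_ptr);
	t->user_cpus_ptr = NULL;
}
```

After a successful copy, parent and child each own a separate
allocation and can be released in either order.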
 include/linux/sched.h | 13 +++++++++++++
 init/init_task.c      |  1 +
 kernel/fork.c         |  2 ++
 kernel/sched/core.c   | 20 ++++++++++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ec8d07d88641..91dab7a62aa1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -748,6 +748,7 @@ struct task_struct {
 	unsigned int			policy;
 	int				nr_cpus_allowed;
 	const cpumask_t			*cpus_ptr;
+	cpumask_t			*user_cpus_ptr;
 	cpumask_t			cpus_mask;
 	void				*migration_pending;
 #ifdef CONFIG_SMP
@@ -1705,6 +1706,8 @@ extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
+extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node);
+extern void release_user_cpus_ptr(struct task_struct *p);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
@@ -1715,6 +1718,16 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p, const struct cpuma
 		return -EINVAL;
 	return 0;
 }
+static inline int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node)
+{
+	if (src->user_cpus_ptr)
+		return -EINVAL;
+	return 0;
+}
+static inline void release_user_cpus_ptr(struct task_struct *p)
+{
+	WARN_ON(p->user_cpus_ptr);
+}
 #endif
 
 extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/init/init_task.c b/init/init_task.c
index 562f2ef8d157..2d024066e27b 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -80,6 +80,7 @@ struct task_struct init_task
 	.normal_prio	= MAX_PRIO - 20,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
+	.user_cpus_ptr	= NULL,
 	.cpus_mask	= CPU_MASK_ALL,
 	.nr_cpus_allowed= NR_CPUS,
 	.mm		= NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index bc94b2cc5995..bd0e165b8397 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -446,6 +446,7 @@ void put_task_stack(struct task_struct *tsk)
 
 void free_task(struct task_struct *tsk)
 {
+	release_user_cpus_ptr(tsk);
 	scs_release(tsk);
 
 #ifndef CONFIG_THREAD_INFO_IN_TASK
@@ -924,6 +925,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 	if (orig->cpus_ptr == &orig->cpus_mask)
 		tsk->cpus_ptr = &tsk->cpus_mask;
+	dup_user_cpus_ptr(tsk, orig, node);
 
 	/*
 	 * One for the user space visible state that goes away when reaped.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dbce9cd83a53..a139ed8be7e3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2470,6 +2470,26 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 	__do_set_cpus_allowed(p, new_mask, 0);
 }
 
+int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
+		      int node)
+{
+	if (!src->user_cpus_ptr)
+		return 0;
+
+	dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+	if (!dst->user_cpus_ptr)
+		return -ENOMEM;
+
+	cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+	return 0;
+}
+
+void release_user_cpus_ptr(struct task_struct *p)
+{
+	kfree(p->user_cpus_ptr);
+	p->user_cpus_ptr = NULL;
+}
+
 /*
  * This function is wildly self concurrent; here be dragons.
  *
-- 
2.32.0.402.g57bb445576-goog



* [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (5 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 06/16] sched: Introduce task_struct::user_cpus_ptr to track requested affinity Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-17 15:40   ` Peter Zijlstra
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems Will Deacon
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team, Valentin Schneider

In preparation for replaying user affinity requests using a saved mask,
split sched_setaffinity() up so that the initial task lookup and
security checks are only performed when the request is coming directly
from userspace.
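
As a rough userspace illustration of the split (not the kernel code), the
sketch below separates a checked entry point from an unchecked helper that
a later "replay" path could call directly. The struct layout, the uid-based
permission test and all model_* names are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few task fields the example needs. */
struct model_task {
	uint64_t cpus_mask;	/* current affinity, one bit per CPU */
	uint64_t cpuset_mask;	/* CPUs permitted by the task's cpuset */
	bool no_setaffinity;	/* models PF_NO_SETAFFINITY */
	int owner_uid;
};

/* Inner helper: applies the mask, performs no lookup or permission checks. */
static int model_setaffinity(struct model_task *p, uint64_t mask)
{
	uint64_t new_mask;

	assert(p != NULL);
	new_mask = mask & p->cpuset_mask;
	if (!new_mask)
		return -EINVAL;

	p->cpus_mask = new_mask;
	return 0;
}

/* Outer entry point: checks that only apply to direct userspace requests. */
static int model_sched_setaffinity(struct model_task *p, int caller_uid,
				   uint64_t mask)
{
	if (!p)
		return -ESRCH;
	if (p->no_setaffinity)
		return -EINVAL;
	/* Simplified permission test; uid 0 stands in for CAP_SYS_NICE. */
	if (caller_uid != 0 && caller_uid != p->owner_uid)
		return -EPERM;

	return model_setaffinity(p, mask);
}
```

A caller replaying a previously saved mask would use model_setaffinity()
directly, skipping the checks that were already passed when the mask was
first requested.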

Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/sched/core.c | 105 ++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 48 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a139ed8be7e3..d4219d366103 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7578,53 +7578,22 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	return retval;
 }
 
-long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
+static int
+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 {
-	cpumask_var_t cpus_allowed, new_mask;
-	struct task_struct *p;
 	int retval;
+	cpumask_var_t cpus_allowed, new_mask;
 
-	rcu_read_lock();
-
-	p = find_process_by_pid(pid);
-	if (!p) {
-		rcu_read_unlock();
-		return -ESRCH;
-	}
-
-	/* Prevent p going away */
-	get_task_struct(p);
-	rcu_read_unlock();
+	if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL))
+		return -ENOMEM;
 
-	if (p->flags & PF_NO_SETAFFINITY) {
-		retval = -EINVAL;
-		goto out_put_task;
-	}
-	if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
-		retval = -ENOMEM;
-		goto out_put_task;
-	}
 	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
 		retval = -ENOMEM;
 		goto out_free_cpus_allowed;
 	}
-	retval = -EPERM;
-	if (!check_same_owner(p)) {
-		rcu_read_lock();
-		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
-			rcu_read_unlock();
-			goto out_free_new_mask;
-		}
-		rcu_read_unlock();
-	}
-
-	retval = security_task_setscheduler(p);
-	if (retval)
-		goto out_free_new_mask;
-
 
 	cpuset_cpus_allowed(p, cpus_allowed);
-	cpumask_and(new_mask, in_mask, cpus_allowed);
+	cpumask_and(new_mask, mask, cpus_allowed);
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
@@ -7645,23 +7614,63 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 #endif
 again:
 	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
+	if (retval)
+		goto out_free_new_mask;
 
-	if (!retval) {
-		cpuset_cpus_allowed(p, cpus_allowed);
-		if (!cpumask_subset(new_mask, cpus_allowed)) {
-			/*
-			 * We must have raced with a concurrent cpuset
-			 * update. Just reset the cpus_allowed to the
-			 * cpuset's cpus_allowed
-			 */
-			cpumask_copy(new_mask, cpus_allowed);
-			goto again;
-		}
+	cpuset_cpus_allowed(p, cpus_allowed);
+	if (!cpumask_subset(new_mask, cpus_allowed)) {
+		/*
+		 * We must have raced with a concurrent cpuset update.
+		 * Just reset the cpumask to the cpuset's cpus_allowed.
+		 */
+		cpumask_copy(new_mask, cpus_allowed);
+		goto again;
 	}
+
 out_free_new_mask:
 	free_cpumask_var(new_mask);
 out_free_cpus_allowed:
 	free_cpumask_var(cpus_allowed);
+	return retval;
+}
+
+long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
+{
+	struct task_struct *p;
+	int retval;
+
+	rcu_read_lock();
+
+	p = find_process_by_pid(pid);
+	if (!p) {
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+
+	/* Prevent p going away */
+	get_task_struct(p);
+	rcu_read_unlock();
+
+	if (p->flags & PF_NO_SETAFFINITY) {
+		retval = -EINVAL;
+		goto out_put_task;
+	}
+
+	if (!check_same_owner(p)) {
+		rcu_read_lock();
+		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
+			rcu_read_unlock();
+			retval = -EPERM;
+			goto out_put_task;
+		}
+		rcu_read_unlock();
+	}
+
+	retval = security_task_setscheduler(p);
+	if (retval)
+		goto out_put_task;
+
+	retval = __sched_setaffinity(p, in_mask);
 out_put_task:
 	put_task_struct(p);
 	return retval;
-- 
2.32.0.402.g57bb445576-goog


* [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (6 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-17 15:10   ` Peter Zijlstra
                     ` (2 more replies)
  2021-07-30 11:24 ` [PATCH v11 09/16] sched: Introduce dl_task_check_affinity() to check proposed affinity Will Deacon
                   ` (7 subsequent siblings)
  15 siblings, 3 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Although userspace can carefully manage the affinity masks for such
tasks, one place where it is particularly problematic is execve()
because the CPU on which the execve() is occurring may be incompatible
with the new application image. In such a situation, it is desirable to
restrict the affinity mask of the task and ensure that the new image is
entered on a compatible CPU. From userspace's point of view, this looks
the same as if the incompatible CPUs have been hotplugged off in the
task's affinity mask. Similarly, if a subsequent execve() reverts to
a compatible image, then the old affinity is restored if it is still
valid.

In preparation for restricting the affinity mask for compat tasks on
arm64 systems without uniform support for 32-bit applications, introduce
{force,relax}_compatible_cpus_allowed_ptr(), which respectively restrict
and restore the affinity mask for a task based on the compatible CPUs.
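
The restrict/restore behaviour can be pictured with a small userspace model
that uses 64-bit bitmasks in place of struct cpumask. This is only a sketch
under that assumption: locking, deadline tasks and the cpuset fallback are
left out, and the model_* names are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_task {
	uint64_t cpus_mask;	/* current affinity, one bit per CPU */
	uint64_t user_mask;	/* saved affinity, valid if has_user_mask */
	bool has_user_mask;	/* models a non-NULL user_cpus_ptr */
};

/*
 * Restrict the affinity to the CPUs in @compatible, remembering the old
 * mask the first time. An empty intersection leaves the task untouched.
 */
static int model_restrict(struct model_task *p, uint64_t compatible)
{
	uint64_t new_mask;

	assert(p != NULL);
	new_mask = p->cpus_mask & compatible;
	if (!new_mask)
		return -EINVAL;

	if (!p->has_user_mask) {
		p->user_mask = p->cpus_mask;
		p->has_user_mask = true;
	}
	p->cpus_mask = new_mask;
	return 0;
}

/*
 * Try to restore the saved affinity, limited to @allowed (standing in for
 * the cpuset). The saved mask is dropped whether or not this succeeds.
 */
static void model_relax(struct model_task *p, uint64_t allowed)
{
	uint64_t restored;

	assert(p != NULL);
	if (!p->has_user_mask)
		return;

	restored = p->user_mask & allowed;
	if (restored)
		p->cpus_mask = restored;
	p->has_user_mask = false;
}
```

With an initial mask of 0xff and compatible CPUs 0x0f, model_restrict()
leaves the task on 0x0f and a later model_relax() with all CPUs allowed
returns it to 0xff.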

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/sched.h |   2 +
 kernel/sched/core.c   | 180 ++++++++++++++++++++++++++++++++++++++----
 kernel/sched/sched.h  |   1 +
 3 files changed, 167 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 91dab7a62aa1..2ebe3d6f8f0c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1708,6 +1708,8 @@ extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
 extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node);
 extern void release_user_cpus_ptr(struct task_struct *p);
+extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
+extern void relax_compatible_cpus_allowed_ptr(struct task_struct *p);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d4219d366103..aec75ec1d257 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2707,27 +2707,22 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 }
 
 /*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
+ * Called with both p->pi_lock and rq->lock held; drops both before returning.
  */
-static int __set_cpus_allowed_ptr(struct task_struct *p,
-				  const struct cpumask *new_mask,
-				  u32 flags)
+static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
+					 const struct cpumask *new_mask,
+					 u32 flags,
+					 struct rq *rq,
+					 struct rq_flags *rf)
+	__releases(rq->lock)
+	__releases(p->pi_lock)
 {
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	unsigned int dest_cpu;
-	struct rq_flags rf;
-	struct rq *rq;
 	int ret = 0;
 	bool kthread = p->flags & PF_KTHREAD;
 
-	rq = task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 
 	if (kthread || is_migration_disabled(p)) {
@@ -2783,20 +2778,173 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 
 	__do_set_cpus_allowed(p, new_mask, flags);
 
-	return affine_move_task(rq, p, &rf, dest_cpu, flags);
+	if (flags & SCA_USER)
+		release_user_cpus_ptr(p);
+
+	return affine_move_task(rq, p, rf, dest_cpu, flags);
 
 out:
-	task_rq_unlock(rq, p, &rf);
+	task_rq_unlock(rq, p, rf);
 
 	return ret;
 }
 
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+static int __set_cpus_allowed_ptr(struct task_struct *p,
+				  const struct cpumask *new_mask, u32 flags)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &rf);
+	return __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, &rf);
+}
+
 int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 {
 	return __set_cpus_allowed_ptr(p, new_mask, 0);
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+/*
+ * Change a given task's CPU affinity to the intersection of its current
+ * affinity mask and @subset_mask, writing the resulting mask to @new_mask
+ * and pointing @p->user_cpus_ptr to a copy of the old mask.
+ * If the resulting mask is empty, leave the affinity unchanged and return
+ * -EINVAL.
+ */
+static int restrict_cpus_allowed_ptr(struct task_struct *p,
+				     struct cpumask *new_mask,
+				     const struct cpumask *subset_mask)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+	int err;
+	struct cpumask *user_mask = NULL;
+
+	if (!p->user_cpus_ptr) {
+		user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
+
+		if (!user_mask)
+			return -ENOMEM;
+	}
+
+	rq = task_rq_lock(p, &rf);
+
+	/*
+	 * Forcefully restricting the affinity of a deadline task is
+	 * likely to cause problems, so fail and noisily override the
+	 * mask entirely.
+	 */
+	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
+		err = -EPERM;
+		goto err_unlock;
+	}
+
+	if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
+		err = -EINVAL;
+		goto err_unlock;
+	}
+
+	/*
+	 * We're about to butcher the task affinity, so keep track of what
+	 * the user asked for in case we're able to restore it later on.
+	 */
+	if (user_mask) {
+		cpumask_copy(user_mask, p->cpus_ptr);
+		p->user_cpus_ptr = user_mask;
+	}
+
+	return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, &rf);
+
+err_unlock:
+	task_rq_unlock(rq, p, &rf);
+	kfree(user_mask);
+	return err;
+}
+
+/*
+ * Restrict the CPU affinity of task @p so that it is a subset of
+ * task_cpu_possible_mask() and point @p->user_cpus_ptr to a copy of the
+ * old affinity mask. If the resulting mask is empty, we warn and walk
+ * up the cpuset hierarchy until we find a suitable mask.
+ */
+void force_compatible_cpus_allowed_ptr(struct task_struct *p)
+{
+	cpumask_var_t new_mask;
+	const struct cpumask *override_mask = task_cpu_possible_mask(p);
+
+	alloc_cpumask_var(&new_mask, GFP_KERNEL);
+
+	/*
+	 * __migrate_task() can fail silently in the face of concurrent
+	 * offlining of the chosen destination CPU, so take the hotplug
+	 * lock to ensure that the migration succeeds.
+	 */
+	cpus_read_lock();
+	if (!cpumask_available(new_mask))
+		goto out_set_mask;
+
+	if (!restrict_cpus_allowed_ptr(p, new_mask, override_mask))
+		goto out_free_mask;
+
+	/*
+	 * We failed to find a valid subset of the affinity mask for the
+	 * task, so override it based on its cpuset hierarchy.
+	 */
+	cpuset_cpus_allowed(p, new_mask);
+	override_mask = new_mask;
+
+out_set_mask:
+	if (printk_ratelimit()) {
+		printk_deferred("Overriding affinity for process %d (%s) to CPUs %*pbl\n",
+				task_pid_nr(p), p->comm,
+				cpumask_pr_args(override_mask));
+	}
+
+	WARN_ON(set_cpus_allowed_ptr(p, override_mask));
+out_free_mask:
+	cpus_read_unlock();
+	free_cpumask_var(new_mask);
+}
+
+static int
+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
+
+/*
+ * Restore the affinity of a task @p which was previously restricted by a
+ * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
+ * @p->user_cpus_ptr.
+ *
+ * It is the caller's responsibility to serialise this with any calls to
+ * force_compatible_cpus_allowed_ptr(@p).
+ */
+void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
+{
+	unsigned long flags;
+	struct cpumask *mask = p->user_cpus_ptr;
+
+	/*
+	 * Try to restore the old affinity mask. If this fails, then
+	 * we free the mask explicitly to avoid it being inherited across
+	 * a subsequent fork().
+	 */
+	if (!mask || !__sched_setaffinity(p, mask))
+		return;
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	release_user_cpus_ptr(p);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+}
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
@@ -7613,7 +7761,7 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	}
 #endif
 again:
-	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
+	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | SCA_USER);
 	if (retval)
 		goto out_free_new_mask;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14a41a243f7b..e88c2d399f0d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2234,6 +2234,7 @@ extern struct task_struct *pick_next_task_idle(struct rq *rq);
 #define SCA_CHECK		0x01
 #define SCA_MIGRATE_DISABLE	0x02
 #define SCA_MIGRATE_ENABLE	0x04
+#define SCA_USER		0x08
 
 #ifdef CONFIG_SMP
 
-- 
2.32.0.402.g57bb445576-goog


* [PATCH v11 09/16] sched: Introduce dl_task_check_affinity() to check proposed affinity
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (7 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2021-07-30 11:24 ` [PATCH v11 10/16] arm64: Implement task_cpu_possible_mask() Will Deacon
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

In preparation for restricting the affinity of a task during execve()
on arm64, introduce a new dl_task_check_affinity() helper function to
give an indication as to whether the restricted mask is admissible for
a deadline task.
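
The admissibility rule amounts to a subset test: every CPU in the task's
root domain must also be present in the proposed mask. A minimal userspace
sketch, assuming 64-bit bitmasks and an invented model_dl_task structure,
could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state the check consults. */
struct model_dl_task {
	bool dl_policy;		/* task uses the deadline policy */
	uint64_t rd_span;	/* CPUs spanned by the task's root domain */
};

/* True if every bit set in @a is also set in @b. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Return 0 if @mask is acceptable for @p, -EBUSY otherwise.
 * @admission_enabled models dl_bandwidth_enabled().
 */
static int model_dl_check_affinity(const struct model_dl_task *p,
				   bool admission_enabled, uint64_t mask)
{
	assert(p != NULL);

	/* Non-deadline tasks, or no admission control: nothing to check. */
	if (!p->dl_policy || !admission_enabled)
		return 0;

	return mask_subset(p->rd_span, mask) ? 0 : -EBUSY;
}
```

Note the direction of the test: the root-domain span must be contained in
the proposed mask, not the other way round.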

Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/sched.h |  6 ++++++
 kernel/sched/core.c   | 46 +++++++++++++++++++++++++++----------------
 2 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2ebe3d6f8f0c..6ecd02e2ca1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1708,6 +1708,7 @@ extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
 extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node);
 extern void release_user_cpus_ptr(struct task_struct *p);
+extern int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask);
 extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
 extern void relax_compatible_cpus_allowed_ptr(struct task_struct *p);
 #else
@@ -1730,6 +1731,11 @@ static inline void release_user_cpus_ptr(struct task_struct *p)
 {
 	WARN_ON(p->user_cpus_ptr);
 }
+
+static inline int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
+{
+	return 0;
+}
 #endif
 
 extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aec75ec1d257..9f576a67bc31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7726,6 +7726,32 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	return retval;
 }
 
+#ifdef CONFIG_SMP
+int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
+{
+	int ret = 0;
+
+	/*
+	 * If the task isn't a deadline task or admission control is
+	 * disabled then we don't care about affinity changes.
+	 */
+	if (!task_has_dl_policy(p) || !dl_bandwidth_enabled())
+		return 0;
+
+	/*
+	 * Since bandwidth control happens on root_domain basis,
+	 * if admission test is enabled, we only admit -deadline
+	 * tasks allowed to run on all the CPUs in the task's
+	 * root_domain.
+	 */
+	rcu_read_lock();
+	if (!cpumask_subset(task_rq(p)->rd->span, mask))
+		ret = -EBUSY;
+	rcu_read_unlock();
+	return ret;
+}
+#endif
+
 static int
 __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 {
@@ -7743,23 +7769,9 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, mask, cpus_allowed);
 
-	/*
-	 * Since bandwidth control happens on root_domain basis,
-	 * if admission test is enabled, we only admit -deadline
-	 * tasks allowed to run on all the CPUs in the task's
-	 * root_domain.
-	 */
-#ifdef CONFIG_SMP
-	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
-		rcu_read_lock();
-		if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
-			retval = -EBUSY;
-			rcu_read_unlock();
-			goto out_free_new_mask;
-		}
-		rcu_read_unlock();
-	}
-#endif
+	retval = dl_task_check_affinity(p, new_mask);
+	if (retval)
+		goto out_free_new_mask;
 again:
 	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | SCA_USER);
 	if (retval)
-- 
2.32.0.402.g57bb445576-goog


* [PATCH v11 10/16] arm64: Implement task_cpu_possible_mask()
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (8 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 09/16] sched: Introduce dl_task_check_affinity() to check proposed affinity Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-07-30 11:24 ` [PATCH v11 11/16] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

Provide an implementation of task_cpu_possible_mask() so that we can
prevent 64-bit-only cores being added to the 'cpus_mask' for compat
tasks on systems with mismatched 32-bit support at EL0.
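
The selection boils down to a two-step decision, which the following
userspace sketch models with plain bitmasks. The function name and its
parameters are assumptions for illustration; the real helper reads a static
key and the task's thread info rather than taking booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the set of CPUs a task may ever run on.
 * @mismatched: the system has CPUs with and without 32-bit EL0 support
 * @compat:     the task is a 32-bit (compat) task
 * @possible:   all possible CPUs
 * @el0_32bit:  CPUs that support 32-bit EL0
 */
static uint64_t model_task_cpu_possible_mask(bool mismatched, bool compat,
					     uint64_t possible,
					     uint64_t el0_32bit)
{
	/* The 32-bit-capable CPUs must be a subset of the possible CPUs. */
	assert((el0_32bit & ~possible) == 0);

	if (!mismatched)
		return possible;

	if (!compat)
		return possible;

	return el0_32bit;
}
```

Only a compat task on a mismatched system sees the narrower mask; every
other combination gets the full set of possible CPUs.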

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index eeb210997149..f4ba93d4ffeb 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -231,6 +231,19 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	update_saved_ttbr0(tsk, next);
 }
 
+static inline const struct cpumask *
+task_cpu_possible_mask(struct task_struct *p)
+{
+	if (!static_branch_unlikely(&arm64_mismatched_32bit_el0))
+		return cpu_possible_mask;
+
+	if (!is_compat_thread(task_thread_info(p)))
+		return cpu_possible_mask;
+
+	return system_32bit_el0_cpumask();
+}
+#define task_cpu_possible_mask	task_cpu_possible_mask
+
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
-- 
2.32.0.402.g57bb445576-goog


* [PATCH v11 11/16] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (9 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 10/16] arm64: Implement task_cpu_possible_mask() Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-07-30 11:24 ` [PATCH v11 12/16] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

When exec'ing a 32-bit task on a system with mismatched support for
32-bit EL0, try to ensure that it starts life on a CPU that can actually
run it.

Similarly, when exec'ing a 64-bit task on such a system, try to restore
the old affinity mask if it was previously restricted.

Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/elf.h |  6 ++----
 arch/arm64/kernel/process.c  | 39 +++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 8d1c8dcb87fd..97932fbf973d 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -213,10 +213,8 @@ typedef compat_elf_greg_t		compat_elf_gregset_t[COMPAT_ELF_NGREG];
 
 /* AArch32 EABI. */
 #define EF_ARM_EABI_MASK		0xff000000
-#define compat_elf_check_arch(x)	(system_supports_32bit_el0() && \
-					 ((x)->e_machine == EM_ARM) && \
-					 ((x)->e_flags & EF_ARM_EABI_MASK))
-
+int compat_elf_check_arch(const struct elf32_hdr *);
+#define compat_elf_check_arch		compat_elf_check_arch
 #define compat_start_thread		compat_start_thread
 /*
  * Unlike the native SET_PERSONALITY macro, the compat version maintains
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c8989b999250..583ee58f8c9c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -21,6 +21,7 @@
 #include <linux/mman.h>
 #include <linux/mm.h>
 #include <linux/nospec.h>
+#include <linux/sched.h>
 #include <linux/stddef.h>
 #include <linux/sysctl.h>
 #include <linux/unistd.h>
@@ -579,6 +580,28 @@ unsigned long arch_align_stack(unsigned long sp)
 	return sp & ~0xf;
 }
 
+#ifdef CONFIG_COMPAT
+int compat_elf_check_arch(const struct elf32_hdr *hdr)
+{
+	if (!system_supports_32bit_el0())
+		return false;
+
+	if ((hdr)->e_machine != EM_ARM)
+		return false;
+
+	if (!((hdr)->e_flags & EF_ARM_EABI_MASK))
+		return false;
+
+	/*
+	 * Prevent execve() of a 32-bit program from a deadline task
+	 * if the restricted affinity mask would be inadmissible on an
+	 * asymmetric system.
+	 */
+	return !static_branch_unlikely(&arm64_mismatched_32bit_el0) ||
+	       !dl_task_check_affinity(current, system_32bit_el0_cpumask());
+}
+#endif
+
 /*
  * Called from setup_new_exec() after (COMPAT_)SET_PERSONALITY.
  */
@@ -588,8 +611,22 @@ void arch_setup_new_exec(void)
 
 	if (is_compat_task()) {
 		mmflags = MMCF_AARCH32;
-		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
+
+		/*
+		 * Restrict the CPU affinity mask for a 32-bit task so that
+		 * it contains only 32-bit-capable CPUs.
+		 *
+		 * From the perspective of the task, this looks similar to
+		 * what would happen if the 64-bit-only CPUs were hot-unplugged
+		 * at the point of execve(), although we try a bit harder to
+		 * honour the cpuset hierarchy.
+		 */
+		if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+			force_compatible_cpus_allowed_ptr(current);
 			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+		}
+	} else if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+		relax_compatible_cpus_allowed_ptr(current);
 	}
 
 	current->mm->context.flags = mmflags;
-- 
2.32.0.402.g57bb445576-goog


* [PATCH v11 12/16] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (10 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 11/16] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-07-30 11:24 ` [PATCH v11 13/16] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

If we want to support 32-bit applications, then when we identify a CPU
with mismatched 32-bit EL0 support we must ensure that we will always
have an active 32-bit CPU available to us from then on. This is important
for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
CPUs for compat tasks and forced migration due to a hotplug event will
hang if no 32-bit CPUs are available.

On detecting a mismatch, prevent offlining of the mismatching CPU if it
is 32-bit capable; otherwise, prevent offlining of the first active
32-bit capable CPU.
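
The choice of the pinned CPU can be modelled in userspace as below. This is
a sketch only: masks are 64-bit integers, the state lives in an invented
struct instead of a function-local static, and the actual pinning
(offline_disabled) is not represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mismatch_state {
	uint64_t mask_32bit;	/* CPUs seen so far with 32-bit EL0 */
	int lucky_winner;	/* pinned CPU, -1 until a mismatch is seen */
};

/* Lowest CPU set in @mask, or -1 if the mask is empty (model convention). */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Called as each CPU comes online; @active_mask holds the active CPUs. */
static void model_cpu_online(struct mismatch_state *st, unsigned int cpu,
			     bool cpu_32bit, uint64_t active_mask)
{
	assert(st != NULL && cpu < 64);

	if (cpu_32bit)
		st->mask_32bit |= UINT64_C(1) << cpu;

	/* No mismatch if this CPU agrees with CPU 0. */
	if (((st->mask_32bit & 1) != 0) == cpu_32bit)
		return;

	/* A CPU has already been pinned; nothing more to do. */
	if (st->lucky_winner >= 0)
		return;

	/* Pin this CPU if it is 32-bit capable, else an active one that is. */
	st->lucky_winner = cpu_32bit ? (int)cpu
				     : first_cpu(st->mask_32bit & active_mask);
}
```

If CPU 0 is 64-bit-only, the first 32-bit-capable CPU to come online is
pinned; if CPU 0 is 32-bit capable, the first 64-bit-only CPU to come
online triggers pinning of an already-active 32-bit CPU.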

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 125d5c9471ac..d99a29f52aa1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2900,15 +2900,38 @@ void __init setup_cpu_features(void)
 
 static int enable_mismatched_32bit_el0(unsigned int cpu)
 {
+	/*
+	 * The first 32-bit-capable CPU we detected and so can no longer
+	 * be offlined by userspace. -1 indicates we haven't yet onlined
+	 * a 32-bit-capable CPU.
+	 */
+	static int lucky_winner = -1;
+
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
 	if (cpu_32bit) {
 		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
-		setup_elf_hwcaps(compat_elf_hwcaps);
 	}
 
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+		return 0;
+
+	if (lucky_winner >= 0)
+		return 0;
+
+	/*
+	 * We've detected a mismatch. We need to keep one of our CPUs with
+	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
+	 * every CPU in the system for a 32-bit task.
+	 */
+	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
+							 cpu_active_mask);
+	get_cpu_device(lucky_winner)->offline_disabled = true;
+	setup_elf_hwcaps(compat_elf_hwcaps);
+	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
+		cpu, lucky_winner);
 	return 0;
 }
 
-- 
2.32.0.402.g57bb445576-goog


* [PATCH v11 13/16] arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (11 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 12/16] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-07-30 11:24 ` [PATCH v11 14/16] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

Since 32-bit applications will be killed if they are caught trying to
execute on a 64-bit-only CPU in a mismatched system, advertise the set
of 32-bit capable CPUs to userspace in sysfs.
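
For userspace consumers, the file contents are a CPU list such as "0-3,6"
followed by a newline. A minimal parser for that format might look like the
sketch below; parse_cpulist() is a hypothetical helper, not an existing
library call, and it assumes at most 64 CPUs so the result fits in one
integer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,6\n" into a bitmask.
 * Returns 0 on success, or -EINVAL for malformed input or CPUs >= 64.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;

	while (*s && *s != '\n') {
		unsigned long lo, hi;
		char *end;

		if (*s < '0' || *s > '9')
			return -EINVAL;
		lo = strtoul(s, &end, 10);
		hi = lo;

		/* Optional "-hi" part of a range. */
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -EINVAL;
			hi = strtoul(s, &end, 10);
		}

		if (lo > hi || hi >= 64)
			return -EINVAL;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;

		s = end;
		if (*s == ',') {
			/* A comma must be followed by another entry. */
			s++;
			if (*s < '0' || *s > '9')
				return -EINVAL;
		} else if (*s && *s != '\n') {
			return -EINVAL;
		}
	}
	return 0;
}
```

A program would read /sys/devices/system/cpu/aarch32_el0 into a buffer,
pass it to parse_cpulist(), and treat a missing file as "all or none of the
CPUs support AArch32", as described in the ABI entry.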

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 .../ABI/testing/sysfs-devices-system-cpu      |  9 +++++++++
 arch/arm64/kernel/cpufeature.c                | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 160b10c029c0..69edbd99e0b7 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -494,6 +494,15 @@ Description:	AArch64 CPU registers
 		'identification' directory exposes the CPU ID registers for
 		identifying model and revision of the CPU.
 
+What:		/sys/devices/system/cpu/aarch32_el0
+Date:		May 2021
+Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
+Description:	Identifies the subset of CPUs in the system that can execute
+		AArch32 (32-bit ARM) applications. If present, the same format as
+		/sys/devices/system/cpu/{offline,online,possible,present} is used.
+		If absent, then all or none of the CPUs can execute AArch32
+		applications and execve() will behave accordingly.
+
 What:		/sys/devices/system/cpu/cpu#/cpu_capacity
 Date:		December 2016
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d99a29f52aa1..7ee1095a4585 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -67,6 +67,7 @@
 #include <linux/crash_dump.h>
 #include <linux/sort.h>
 #include <linux/stop_machine.h>
+#include <linux/sysfs.h>
 #include <linux/types.h>
 #include <linux/minmax.h>
 #include <linux/mm.h>
@@ -1320,6 +1321,24 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+static ssize_t aarch32_el0_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	const struct cpumask *mask = system_32bit_el0_cpumask();
+
+	return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(mask));
+}
+static const DEVICE_ATTR_RO(aarch32_el0);
+
+static int __init aarch32_el0_sysfs_init(void)
+{
+	if (!allow_mismatched_32bit_el0)
+		return 0;
+
+	return device_create_file(cpu_subsys.dev_root, &dev_attr_aarch32_el0);
+}
+device_initcall(aarch32_el0_sysfs_init);
+
 static bool has_32bit_el0(const struct arm64_cpu_capabilities *entry, int scope)
 {
 	if (!has_cpuid_feature(entry, scope))
-- 
2.32.0.402.g57bb445576-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread
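
Editorial sketch (not part of the patch): the ABI text above says the new
file uses the same format as the online/possible/present files, i.e. a
cpulist such as "0-3,6". A minimal userspace parser for that format might
look like the following. The function name parse_cpulist() and the 64-CPU
limit are assumptions made for this illustration only.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist string such as "0-3,6\n" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64
 * (the 64-CPU limit is a simplification of this sketch).
 * An empty list (just "\n") yields an empty mask.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		/* Optional "-hi" part of a range. */
		if (*s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}

		if (hi < lo || hi >= 64)
			return -1;

		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;

		/* Entries are separated by commas. */
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}

	return 0;
}
```

A caller would read one line from /sys/devices/system/cpu/aarch32_el0 and
pass it to the parser. If the file cannot be opened, the ABI text says that
all or none of the CPUs can run AArch32 applications, so the caller needs a
separate fallback for that case.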

* [PATCH v11 14/16] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (12 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 13/16] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-07-30 11:24 ` [PATCH v11 15/16] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
  2021-07-30 11:24 ` [PATCH v11 16/16] Documentation: arm64: describe asymmetric 32-bit support Will Deacon
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

Allow systems with mismatched 32-bit support at EL0 to run 32-bit
applications based on a new kernel parameter.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
 arch/arm64/kernel/cpufeature.c                  | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f713..6ab625dea8c0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -287,6 +287,14 @@
 			do not want to use tracing_snapshot_alloc() as it needs
 			to be done where GFP_KERNEL allocations are allowed.
 
+	allow_mismatched_32bit_el0 [ARM64]
+			Allow execve() of 32-bit applications and setting of the
+			PER_LINUX32 personality on systems where only a strict
+			subset of the CPUs support 32-bit EL0. When this
+			parameter is present, the set of CPUs supporting 32-bit
+			EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
+			and hot-unplug operations may be restricted.
+
 	amd_iommu=	[HW,X86-64]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7ee1095a4585..a306ed5a6549 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1321,6 +1321,13 @@ const struct cpumask *system_32bit_el0_cpumask(void)
 	return cpu_possible_mask;
 }
 
+static int __init parse_32bit_el0_param(char *str)
+{
+	allow_mismatched_32bit_el0 = true;
+	return 0;
+}
+early_param("allow_mismatched_32bit_el0", parse_32bit_el0_param);
+
 static ssize_t aarch32_el0_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-- 
2.32.0.402.g57bb445576-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread
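
Editorial sketch (not part of the patch): since the behaviour is opt-in,
userspace may want to know whether the parameter was passed at boot. One
way, assuming the command line has already been read from /proc/cmdline
into a string, is to look for the parameter as a whole token. The helper
name cmdline_has_flag() is hypothetical, and the matching is deliberately
simpler than the kernel's own parser (no quoting, bare token only).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the bare parameter 'name' appears as a whole
 * space-separated token in 'cmdline'. A trailing newline, as found
 * in /proc/cmdline, is tolerated.
 */
bool cmdline_has_flag(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		bool starts = (p == cmdline) || p[-1] == ' ';
		char next = p[len];
		bool ends = next == '\0' || next == ' ' || next == '\n';

		if (starts && ends)
			return true;

		/* Partial match inside a longer token: keep searching. */
		p += len;
	}

	return false;
}
```

On a system where the parameter is in effect, checking for the presence of
/sys/devices/system/cpu/aarch32_el0 is likely the more direct test, since
the earlier patch creates that file only when the option is enabled.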

* [PATCH v11 15/16] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (13 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 14/16] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  2021-07-30 11:24 ` [PATCH v11 16/16] Documentation: arm64: describe asymmetric 32-bit support Will Deacon
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

The scheduler now knows enough about these braindead systems to place
32-bit tasks accordingly, so throw out the safety checks and allow the
ret-to-user path to avoid do_notify_resume() if there is nothing to do.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 14 +-------------
 arch/arm64/kernel/signal.c  | 26 --------------------------
 2 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 583ee58f8c9c..e0e7f4e9b607 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -469,15 +469,6 @@ static void erratum_1418040_thread_switch(struct task_struct *prev,
 	write_sysreg(val, cntkctl_el1);
 }
 
-static void compat_thread_switch(struct task_struct *next)
-{
-	if (!is_compat_thread(task_thread_info(next)))
-		return;
-
-	if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
-		set_tsk_thread_flag(next, TIF_NOTIFY_RESUME);
-}
-
 static void update_sctlr_el1(u64 sctlr)
 {
 	/*
@@ -519,7 +510,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(prev, next);
 	ptrauth_thread_switch_user(next);
-	compat_thread_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case
@@ -621,10 +611,8 @@ void arch_setup_new_exec(void)
 		 * at the point of execve(), although we try a bit harder to
 		 * honour the cpuset hierarchy.
 		 */
-		if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
+		if (static_branch_unlikely(&arm64_mismatched_32bit_el0))
 			force_compatible_cpus_allowed_ptr(current);
-			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
-		}
 	} else if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
 		relax_compatible_cpus_allowed_ptr(current);
 	}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index f8192f4ae0b8..6237486ff6bb 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -911,19 +911,6 @@ static void do_signal(struct pt_regs *regs)
 	restore_saved_sigmask();
 }
 
-static bool cpu_affinity_invalid(struct pt_regs *regs)
-{
-	if (!compat_user_mode(regs))
-		return false;
-
-	/*
-	 * We're preemptible, but a reschedule will cause us to check the
-	 * affinity again.
-	 */
-	return !cpumask_test_cpu(raw_smp_processor_id(),
-				 system_32bit_el0_cpumask());
-}
-
 asmlinkage void do_notify_resume(struct pt_regs *regs,
 				 unsigned long thread_flags)
 {
@@ -951,19 +938,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
 				rseq_handle_notify_resume(NULL, regs);
-
-				/*
-				 * If we reschedule after checking the affinity
-				 * then we must ensure that TIF_NOTIFY_RESUME
-				 * is set so that we check the affinity again.
-				 * Since tracehook_notify_resume() clears the
-				 * flag, ensure that the compiler doesn't move
-				 * it after the affinity check.
-				 */
-				barrier();
-
-				if (cpu_affinity_invalid(regs))
-					force_sig(SIGKILL);
 			}
 
 			if (thread_flags & _TIF_FOREIGN_FPSTATE)
-- 
2.32.0.402.g57bb445576-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v11 16/16] Documentation: arm64: describe asymmetric 32-bit support
  2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
                   ` (14 preceding siblings ...)
  2021-07-30 11:24 ` [PATCH v11 15/16] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
@ 2021-07-30 11:24 ` Will Deacon
  15 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-07-30 11:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, linux-kernel, Will Deacon, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Peter Zijlstra,
	Morten Rasmussen, Qais Yousef, Suren Baghdasaryan,
	Quentin Perret, Tejun Heo, Johannes Weiner, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Rafael J. Wysocki, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Valentin Schneider, Mark Rutland,
	kernel-team

Document support for running 32-bit tasks on asymmetric 32-bit systems
and its impact on the user ABI when enabled.

Signed-off-by: Will Deacon <will@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |   3 +
 Documentation/arm64/asymmetric-32bit.rst      | 155 ++++++++++++++++++
 Documentation/arm64/index.rst                 |   1 +
 3 files changed, 159 insertions(+)
 create mode 100644 Documentation/arm64/asymmetric-32bit.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6ab625dea8c0..b2f5dd4ea805 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -295,6 +295,9 @@
 			EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
 			and hot-unplug operations may be restricted.
 
+			See Documentation/arm64/asymmetric-32bit.rst for more
+			information.
+
 	amd_iommu=	[HW,X86-64]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/Documentation/arm64/asymmetric-32bit.rst b/Documentation/arm64/asymmetric-32bit.rst
new file mode 100644
index 000000000000..64a0b505da7d
--- /dev/null
+++ b/Documentation/arm64/asymmetric-32bit.rst
@@ -0,0 +1,155 @@
+======================
+Asymmetric 32-bit SoCs
+======================
+
+Author: Will Deacon <will@kernel.org>
+
+This document describes the impact of asymmetric 32-bit SoCs on the
+execution of 32-bit (``AArch32``) applications.
+
+Date: 2021-05-17
+
+Introduction
+============
+
+Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset
+of the CPUs are capable of executing 32-bit user applications. On such
+a system, Linux by default treats the asymmetry as a "mismatch" and
+disables support for both the ``PER_LINUX32`` personality and
+``execve(2)`` of 32-bit ELF binaries, with the latter returning
+``-ENOEXEC``. If the mismatch is detected during late onlining of a
+64-bit-only CPU, then the onlining operation fails and the new CPU is
+unavailable for scheduling.
+
+Surprisingly, these SoCs have been produced with the intention of
+running legacy 32-bit binaries. Unsurprisingly, that doesn't work very
+well with the default behaviour of Linux.
+
+It seems inevitable that future SoCs will drop 32-bit support
+altogether, so if you're stuck in the unenviable position of needing to
+run 32-bit code on one of these transitionary platforms then you would
+be wise to consider alternatives such as recompilation, emulation or
+retirement. If none of those options is practical, then read on.
+
+Enabling kernel support
+=======================
+
+Since the kernel support is not completely transparent to userspace,
+allowing 32-bit tasks to run on an asymmetric 32-bit system requires an
+explicit "opt-in" and can be enabled by passing the
+``allow_mismatched_32bit_el0`` parameter on the kernel command-line.
+
+For the remainder of this document we will refer to an *asymmetric
+system* to mean an asymmetric 32-bit SoC running Linux with this kernel
+command-line option enabled.
+
+Userspace impact
+================
+
+32-bit tasks running on an asymmetric system behave in mostly the same
+way as on a homogeneous system, with a few key differences relating to
+CPU affinity.
+
+sysfs
+-----
+
+The subset of CPUs capable of running 32-bit tasks is described in
+``/sys/devices/system/cpu/aarch32_el0`` and is documented further in
+``Documentation/ABI/testing/sysfs-devices-system-cpu``.
+
+**Note:** CPUs are advertised by this file as they are detected and so
+late-onlining of 32-bit-capable CPUs can result in the file contents
+being modified by the kernel at runtime. Once advertised, CPUs are never
+removed from the file.
+
+``execve(2)``
+-------------
+
+On a homogeneous system, the CPU affinity of a task is preserved across
+``execve(2)``. This is not always possible on an asymmetric system,
+specifically when the new program being executed is 32-bit yet the
+affinity mask contains 64-bit-only CPUs. In this situation, the kernel
+determines the new affinity mask as follows:
+
+  1. If the 32-bit-capable subset of the affinity mask is not empty,
+     then the affinity is restricted to that subset and the old affinity
+     mask is saved. This saved mask is inherited over ``fork(2)`` and
+     preserved across ``execve(2)`` of 32-bit programs.
+
+     **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks.
+     See `SCHED_DEADLINE`_.
+
+  2. Otherwise, the cpuset hierarchy of the task is walked until an
+     ancestor is found containing at least one 32-bit-capable CPU. The
+     affinity of the task is then changed to match the 32-bit-capable
+     subset of the cpuset determined by the walk.
+
+  3. On failure (i.e. out of memory), the affinity is changed to the set
+     of all 32-bit-capable CPUs of which the kernel is aware.
+
+A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will
+invalidate the affinity mask saved in (1) and attempt to restore the CPU
+affinity of the task using the saved mask if it was previously valid.
+This restoration may fail due to intervening changes to the deadline
+policy or cpuset hierarchy, in which case the ``execve(2)`` continues
+with the affinity unchanged.
+
+Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only
+the 32-bit-capable CPUs of the requested affinity mask. On success, the
+affinity for the task is updated and any saved mask from a prior
+``execve(2)`` is invalidated.
+
+``SCHED_DEADLINE``
+------------------
+
+Explicit admission of a 32-bit deadline task to the default root domain
+(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric
+32-bit system unless admission control is disabled by writing -1 to
+``/proc/sys/kernel/sched_rt_runtime_us``.
+
+``execve(2)`` of a 32-bit program from a 64-bit deadline task will
+return ``-ENOEXEC`` if the root domain for the task contains any
+64-bit-only CPUs and admission control is enabled. Concurrent offlining
+of 32-bit-capable CPUs may still necessitate the procedure described in
+`execve(2)`_, in which case step (1) is skipped and a warning is
+emitted on the console.
+
+**Note:** It is recommended that a set of 32-bit-capable CPUs are placed
+into a separate root domain if ``SCHED_DEADLINE`` is to be used with
+32-bit tasks on an asymmetric system. Failure to do so is likely to
+result in missed deadlines.
+
+Cpusets
+-------
+
+The affinity of a 32-bit task on an asymmetric system may include CPUs
+that are not explicitly allowed by the cpuset to which it is attached.
+This can occur as a result of the following two situations:
+
+  - A 64-bit task attached to a cpuset which allows only 64-bit CPUs
+    executes a 32-bit program.
+
+  - All of the 32-bit-capable CPUs allowed by a cpuset containing a
+    32-bit task are offlined.
+
+In both of these cases, the new affinity is calculated according to step
+(2) of the process described in `execve(2)`_ and the cpuset hierarchy is
+unchanged irrespective of the cgroup version.
+
+CPU hotplug
+-----------
+
+On an asymmetric system, the first detected 32-bit-capable CPU is
+prevented from being offlined by userspace and any such attempt will
+return ``-EPERM``. Note that suspend is still permitted even if the
+primary CPU (i.e. CPU 0) is 64-bit-only.
+
+KVM
+---
+
+Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
+asymmetric system, a broken guest at EL1 could still attempt to execute
+32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
+mode will return to host userspace with an ``exit_reason`` of
+``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully
+re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation.
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 97d65ba12a35..4f840bac083e 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -10,6 +10,7 @@ ARM64 Architecture
     acpi_object_usage
     amu
     arm-acpi
+    asymmetric-32bit
     booting
     cpu-feature-registers
     elf_hwcaps
-- 
2.32.0.402.g57bb445576-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread
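
Editorial sketch (not part of the patch): the three-step affinity
selection described in the execve(2) section of the new document can be
modelled with plain bitmasks. This is a toy model only. The function name
pick_32bit_affinity(), the array-of-ancestors representation of the cpuset
hierarchy, and the use of 64-bit masks are all assumptions of the sketch,
and it ignores the saved user mask and the SCHED_DEADLINE special case.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the documented fallback order, with CPU sets as bitmasks.
 * 'cpusets' lists the task's own cpuset first, then each ancestor up to
 * the root. The documented step (3) applies on failure (e.g. out of
 * memory); here it is simply the final fallback.
 */
uint64_t pick_32bit_affinity(uint64_t task_mask, uint64_t capable,
			     const uint64_t *cpusets, size_t nr_cpusets)
{
	/* Step 1: keep the 32-bit-capable part of the current affinity. */
	if (task_mask & capable)
		return task_mask & capable;

	/* Step 2: walk up the cpuset hierarchy to the first usable ancestor. */
	for (size_t i = 0; i < nr_cpusets; i++) {
		if (cpusets[i] & capable)
			return cpusets[i] & capable;
	}

	/* Step 3: every 32-bit-capable CPU known to the model. */
	return capable;
}
```

For example, with CPUs 4-7 as the 32-bit-capable set (mask 0xf0), a task
affine to CPUs 2-5 ends up on CPUs 4-5, while a task affine only to CPUs
0-1 takes its new mask from the nearest ancestor cpuset that contains a
32-bit-capable CPU.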

* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-07-30 11:24 ` [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems Will Deacon
@ 2021-08-17 15:10   ` Peter Zijlstra
  2021-08-18 10:42     ` Will Deacon
  2021-08-17 15:41   ` Peter Zijlstra
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-17 15:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Fri, Jul 30, 2021 at 12:24:35PM +0100, Will Deacon wrote:
> @@ -2783,20 +2778,173 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>  
>  	__do_set_cpus_allowed(p, new_mask, flags);
>  
> -	return affine_move_task(rq, p, &rf, dest_cpu, flags);
> +	if (flags & SCA_USER)
> +		release_user_cpus_ptr(p);
> +
> +	return affine_move_task(rq, p, rf, dest_cpu, flags);
>  
>  out:
> -	task_rq_unlock(rq, p, &rf);
> +	task_rq_unlock(rq, p, rf);
>  
>  	return ret;
>  }

> +void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
> +{
> +	unsigned long flags;
> +	struct cpumask *mask = p->user_cpus_ptr;
> +
> +	/*
> +	 * Try to restore the old affinity mask. If this fails, then
> +	 * we free the mask explicitly to avoid it being inherited across
> +	 * a subsequent fork().
> +	 */
> +	if (!mask || !__sched_setaffinity(p, mask))
> +		return;
> +
> +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	release_user_cpus_ptr(p);
> +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +}

Both these are a problem on RT.

The easiest recourse is simply never freeing the CPU mask (except on
exit). The alternative is something like the below I suppose..

I'm leaning towards the former option, wdyt?

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2733,6 +2733,7 @@ static int __set_cpus_allowed_ptr_locked
 	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	bool kthread = p->flags & PF_KTHREAD;
+	struct cpumask *user_mask = NULL;
 	unsigned int dest_cpu;
 	int ret = 0;
 
@@ -2792,9 +2793,13 @@ static int __set_cpus_allowed_ptr_locked
 	__do_set_cpus_allowed(p, new_mask, flags);
 
 	if (flags & SCA_USER)
-		release_user_cpus_ptr(p);
+		swap(user_mask, p->user_cpus_ptr);
+
+	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+
+	kfree(user_mask);
 
-	return affine_move_task(rq, p, rf, dest_cpu, flags);
+	return ret;
 
 out:
 	task_rq_unlock(rq, p, rf);
@@ -2954,8 +2959,10 @@ void relax_compatible_cpus_allowed_ptr(s
 		return;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	release_user_cpus_ptr(p);
+	p->user_cpus_ptr = NULL;
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	kfree(mask);
 }
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function
  2021-07-30 11:24 ` [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function Will Deacon
@ 2021-08-17 15:40   ` Peter Zijlstra
  2021-08-18 10:50     ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-17 15:40 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Fri, Jul 30, 2021 at 12:24:34PM +0100, Will Deacon wrote:
> In preparation for replaying user affinity requests using a saved mask,
> split sched_setaffinity() up so that the initial task lookup and
> security checks are only performed when the request is coming directly
> from userspace.
> 
> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>

Should not sched_setaffinity() update user_cpus_ptr when it isn't NULL,
such that the upcoming relax_compatible_cpus_allowed_ptr() preserves the
full user mask?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-07-30 11:24 ` [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems Will Deacon
  2021-08-17 15:10   ` Peter Zijlstra
@ 2021-08-17 15:41   ` Peter Zijlstra
  2021-08-18 10:43     ` Will Deacon
  2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-17 15:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Fri, Jul 30, 2021 at 12:24:35PM +0100, Will Deacon wrote:
> +	struct rq_flags rf;
> +	struct rq *rq;
> +	int err;
> +	struct cpumask *user_mask = NULL;

> +	cpumask_var_t new_mask;
> +	const struct cpumask *override_mask = task_cpu_possible_mask(p);

> +	unsigned long flags;
> +	struct cpumask *mask = p->user_cpus_ptr;

I've fixed all that up to be proper reverse x-mas trees; similar for
other patches.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-08-17 15:10   ` Peter Zijlstra
@ 2021-08-18 10:42     ` Will Deacon
  2021-08-18 10:56       ` Peter Zijlstra
  2021-08-18 11:06       ` Peter Zijlstra
  0 siblings, 2 replies; 38+ messages in thread
From: Will Deacon @ 2021-08-18 10:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

Hi Peter,

On Tue, Aug 17, 2021 at 05:10:53PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 30, 2021 at 12:24:35PM +0100, Will Deacon wrote:
> > @@ -2783,20 +2778,173 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >  
> >  	__do_set_cpus_allowed(p, new_mask, flags);
> >  
> > -	return affine_move_task(rq, p, &rf, dest_cpu, flags);
> > +	if (flags & SCA_USER)
> > +		release_user_cpus_ptr(p);
> > +
> > +	return affine_move_task(rq, p, rf, dest_cpu, flags);
> >  
> >  out:
> > -	task_rq_unlock(rq, p, &rf);
> > +	task_rq_unlock(rq, p, rf);
> >  
> >  	return ret;
> >  }
> 
> > +void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
> > +{
> > +	unsigned long flags;
> > +	struct cpumask *mask = p->user_cpus_ptr;
> > +
> > +	/*
> > +	 * Try to restore the old affinity mask. If this fails, then
> > +	 * we free the mask explicitly to avoid it being inherited across
> > +	 * a subsequent fork().
> > +	 */
> > +	if (!mask || !__sched_setaffinity(p, mask))
> > +		return;
> > +
> > +	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > +	release_user_cpus_ptr(p);
> > +	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> > +}
> 
> Both these are a problem on RT.

Ah, sorry. I didn't realise you couldn't _free_ with a raw lock held in RT.
Is there somewhere I can read up on that?

> The easiest recourse is simply never freeing the CPU mask (except on
> exit). The alternative is something like the below I suppose..
> 
> I'm leaning towards the former option, wdyt?

Deferring the freeing until exit feels a little fiddly, as we still
want to clear ->user_cpus_ptr on affinity changes when SCA_USER is set
so we'd have to keep track of the mask somewhere and reuse it instead
of allocating a new one if we need it later on. Do-able, but feels a bit
nasty, particularly across fork().

As for your other suggestion:

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2733,6 +2733,7 @@ static int __set_cpus_allowed_ptr_locked
>  	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
>  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
>  	bool kthread = p->flags & PF_KTHREAD;
> +	struct cpumask *user_mask = NULL;
>  	unsigned int dest_cpu;
>  	int ret = 0;
>  
> @@ -2792,9 +2793,13 @@ static int __set_cpus_allowed_ptr_locked
>  	__do_set_cpus_allowed(p, new_mask, flags);
>  
>  	if (flags & SCA_USER)
> -		release_user_cpus_ptr(p);
> +		swap(user_mask, p->user_cpus_ptr);
> +
> +	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
> +
> +	kfree(user_mask);
>  
> -	return affine_move_task(rq, p, rf, dest_cpu, flags);
> +	return ret;
>  
>  out:
>  	task_rq_unlock(rq, p, rf);
> @@ -2954,8 +2959,10 @@ void relax_compatible_cpus_allowed_ptr(s
>  		return;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	release_user_cpus_ptr(p);
> +	p->user_cpus_ptr = NULL;
>  	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> +
> +	kfree(mask);

I think the idea looks good, but perhaps we could wrap things up a bit:

/* Comment about why this is useful with RT */
static cpumask_t *clear_user_cpus_ptr(struct task_struct *p)
{
	struct cpumask *user_mask = NULL;

	swap(user_mask, p->user_cpus_ptr);
	return user_mask;
}

void release_user_cpus_ptr(struct task_struct *p)
{
	kfree(clear_user_cpus_ptr(p));
}

Then just use clear_user_cpus_ptr() in sched/core.c where we know what
we're doing (well, at least one of us does!).

Will

^ permalink raw reply	[flat|nested] 38+ messages in thread
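
Editorial sketch (not part of the thread): the pattern discussed above is
"detach the pointer while the lock is held, free it only after the lock is
dropped", which avoids calling the allocator under a raw spinlock (the RT
problem raised earlier in the thread). A userspace analogue is shown below.
Everything here is a stand-in chosen for illustration: struct task, the
atomic_flag spinlock and the helper names are not kernel APIs, and free()
plays the role of kfree().

```c
#include <stdatomic.h>
#include <stdlib.h>

struct task {
	atomic_flag lock;		/* stand-in for the raw pi_lock */
	unsigned long *user_mask;	/* stand-in for ->user_cpus_ptr */
};

void task_init(struct task *t, unsigned long *mask)
{
	atomic_flag_clear(&t->lock);
	t->user_mask = mask;
}

static void task_lock(struct task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void task_unlock(struct task *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Detach the saved mask and hand ownership to the caller. No freeing. */
unsigned long *clear_user_mask(struct task *t)
{
	unsigned long *old = t->user_mask;

	t->user_mask = NULL;
	return old;
}

/* Detach under the lock, then free once the lock has been dropped. */
void drop_user_mask(struct task *t)
{
	unsigned long *old;

	task_lock(t);
	old = clear_user_mask(t);
	task_unlock(t);

	free(old);	/* free(NULL) is a no-op */
}
```

The split mirrors the shape of the proposal in the message above: a small
helper that only swaps the pointer out, plus callers that decide when it is
safe to free the returned mask.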

* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-08-17 15:41   ` Peter Zijlstra
@ 2021-08-18 10:43     ` Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-08-18 10:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Tue, Aug 17, 2021 at 05:41:42PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 30, 2021 at 12:24:35PM +0100, Will Deacon wrote:
> > +	struct rq_flags rf;
> > +	struct rq *rq;
> > +	int err;
> > +	struct cpumask *user_mask = NULL;
> 
> > +	cpumask_var_t new_mask;
> > +	const struct cpumask *override_mask = task_cpu_possible_mask(p);
> 
> > +	unsigned long flags;
> > +	struct cpumask *mask = p->user_cpus_ptr;
> 
> I've fixed all that up to be proper reverse x-mas trees; similar for
> other patches.

Thanks.

Will

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function
  2021-08-17 15:40   ` Peter Zijlstra
@ 2021-08-18 10:50     ` Will Deacon
  2021-08-18 10:56       ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Will Deacon @ 2021-08-18 10:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Tue, Aug 17, 2021 at 05:40:24PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 30, 2021 at 12:24:34PM +0100, Will Deacon wrote:
> > In preparation for replaying user affinity requests using a saved mask,
> > split sched_setaffinity() up so that the initial task lookup and
> > security checks are only performed when the request is coming directly
> > from userspace.
> > 
> > Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> Should not sched_setaffinity() update user_cpus_ptr when it isn't NULL,
> such that the upcoming relax_compatible_cpus_allowed_ptr() preserves the
> full user mask?

The idea is that force_compatible_cpus_allowed_ptr() and
relax_compatible_cpus_allowed_ptr() are used as a pair, with the former
setting ->user_cpus_ptr and the latter restoring it. An intervening call
to sched_setaffinity() must _clear_ the saved mask, as we discussed
before at:

https://lore.kernel.org/r/YK53kDtczHIYumDC@hirez.programming.kicks-ass.net

Will

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function
  2021-08-18 10:50     ` Will Deacon
@ 2021-08-18 10:56       ` Peter Zijlstra
  2021-08-18 11:11         ` Will Deacon
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-18 10:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Wed, Aug 18, 2021 at 11:50:30AM +0100, Will Deacon wrote:
> On Tue, Aug 17, 2021 at 05:40:24PM +0200, Peter Zijlstra wrote:
> > On Fri, Jul 30, 2021 at 12:24:34PM +0100, Will Deacon wrote:
> > > In preparation for replaying user affinity requests using a saved mask,
> > > split sched_setaffinity() up so that the initial task lookup and
> > > security checks are only performed when the request is coming directly
> > > from userspace.
> > > 
> > > Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > 
> > Should not sched_setaffinity() update user_cpus_ptr when it isn't NULL,
> > such that the upcoming relax_compatible_cpus_allowed_ptr() preserve the
> > full user mask?
> 
> The idea is that force_compatible_cpus_allowed_ptr() and
> relax_compatible_cpus_allowed_ptr() are used as a pair, with the former
> setting ->user_cpus_ptr and the latter restoring it. An intervening call
> to sched_setaffinity() must _clear_ the saved mask, as we discussed
> before at:
> 
> https://lore.kernel.org/r/YK53kDtczHIYumDC@hirez.programming.kicks-ass.net

Clearly that deserves a comment somewhere, because I keep trying to make
it more consistent than it can be :/ I'll see if I can find a spot.


* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-08-18 10:42     ` Will Deacon
@ 2021-08-18 10:56       ` Peter Zijlstra
  2021-08-18 11:53         ` Peter Zijlstra
  2021-08-18 11:06       ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-18 10:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Wed, Aug 18, 2021 at 11:42:28AM +0100, Will Deacon wrote:
> As for your other suggestion:
> 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2733,6 +2733,7 @@ static int __set_cpus_allowed_ptr_locked
> >  	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
> >  	const struct cpumask *cpu_valid_mask = cpu_active_mask;
> >  	bool kthread = p->flags & PF_KTHREAD;
> > +	struct cpumask *user_mask = NULL;
> >  	unsigned int dest_cpu;
> >  	int ret = 0;
> >  
> > @@ -2792,9 +2793,13 @@ static int __set_cpus_allowed_ptr_locked
> >  	__do_set_cpus_allowed(p, new_mask, flags);
> >  
> >  	if (flags & SCA_USER)
> > -		release_user_cpus_ptr(p);
> > +		swap(user_mask, p->user_cpus_ptr);
> > +
> > +	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
> > +
> > +	kfree(user_mask);
> >  
> > -	return affine_move_task(rq, p, rf, dest_cpu, flags);
> > +	return ret;
> >  
> >  out:
> >  	task_rq_unlock(rq, p, rf);
> > @@ -2954,8 +2959,10 @@ void relax_compatible_cpus_allowed_ptr(s
> >  		return;
> >  
> >  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > -	release_user_cpus_ptr(p);
> > +	p->user_cpus_ptr = NULL;
> >  	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> > +
> > +	kfree(mask);
> 
> I think the idea looks good, but perhaps we could wrap things up a bit:
> 
> /* Comment about why this is useful with RT */
> static cpumask_t *clear_user_cpus_ptr(struct task_struct *p)
> {
> 	struct cpumask *user_mask = NULL;
> 
> 	swap(user_mask, p->user_cpus_ptr);
> 	return user_mask;
> }
> 
> void release_user_cpus_ptr(struct task_struct *p)
> {
> 	kfree(clear_user_cpus_ptr(p));
> }
> 
> Then just use clear_user_cpus_ptr() in sched/core.c where we know what
> we're doing (well, at least one of us does!).

OK, I'll go make it like that.


* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-08-18 10:42     ` Will Deacon
  2021-08-18 10:56       ` Peter Zijlstra
@ 2021-08-18 11:06       ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-18 11:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Wed, Aug 18, 2021 at 11:42:28AM +0100, Will Deacon wrote:

> Ah, sorry. I didn't realise you couldn't _free_ with a raw lock held in RT.
> Is there somewhere I can read up on that?

It's because the allocators use spinlock_t, which cannot nest inside
raw_spinlock_t.
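
A rough userspace sketch of the resulting pattern (detach the pointer
while the raw lock is held, free it only after unlocking) might look like
the following. Everything here is a hypothetical stand-in: the toy_* names
are invented, a plain flag models the held raw_spinlock_t, and toy_kfree()
merely counts calls made while that flag is set. The counter illustrates
the rule that, on PREEMPT_RT, freeing may take a sleeping spinlock_t and so
must not happen under a raw lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for "a raw_spinlock_t is currently held". */
static bool raw_lock_held;
/* Number of frees attempted while the toy raw lock was held. */
static int frees_under_lock;

static void toy_raw_lock(void)   { raw_lock_held = true; }
static void toy_raw_unlock(void) { raw_lock_held = false; }

/* Stand-in for kfree(): records a violation if called under the raw lock. */
static void toy_kfree(void *ptr)
{
	if (raw_lock_held)
		frees_under_lock++;
	free(ptr);
}

struct toy_task {
	unsigned long *user_mask;
};

/* Same idea as clear_user_cpus_ptr(): detach the pointer, do not free it. */
static unsigned long *toy_clear_user_mask(struct toy_task *t)
{
	unsigned long *old = t->user_mask;

	t->user_mask = NULL;
	return old;
}

/* Detach under the lock, free once the lock has been dropped. */
static void toy_release_deferred(struct toy_task *t)
{
	unsigned long *old;

	toy_raw_lock();
	old = toy_clear_user_mask(t);
	toy_raw_unlock();

	toy_kfree(old);
}
```

The point of returning the old pointer instead of freeing it in place is
that the caller decides where the free happens, so it can be moved outside
the locked region without touching the helper.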


* Re: [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function
  2021-08-18 10:56       ` Peter Zijlstra
@ 2021-08-18 11:11         ` Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-08-18 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Wed, Aug 18, 2021 at 12:56:24PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 18, 2021 at 11:50:30AM +0100, Will Deacon wrote:
> > On Tue, Aug 17, 2021 at 05:40:24PM +0200, Peter Zijlstra wrote:
> > > On Fri, Jul 30, 2021 at 12:24:34PM +0100, Will Deacon wrote:
> > > > In preparation for replaying user affinity requests using a saved mask,
> > > > split sched_setaffinity() up so that the initial task lookup and
> > > > security checks are only performed when the request is coming directly
> > > > from userspace.
> > > > 
> > > > Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > 
> > > Should not sched_setaffinity() update user_cpus_ptr when it isn't NULL,
> > > such that the upcoming relax_compatible_cpus_allowed_ptr() preserve the
> > > full user mask?
> > 
> > The idea is that force_compatible_cpus_allowed_ptr() and
> > relax_compatible_cpus_allowed_ptr() are used as a pair, with the former
> > setting ->user_cpus_ptr and the latter restoring it. An intervening call
> > to sched_setaffinity() must _clear_ the saved mask, as we discussed
> > before at:
> > 
> > https://lore.kernel.org/r/YK53kDtczHIYumDC@hirez.programming.kicks-ass.net
> 
> Clearly that deserves a comment somewhere, because I keep trying to make
> it more consistent than it can be :/ I'll see if I can find a spot.

Agreed. The relax/force functions are already commented, so maybe alongside
SCA_USER?

Will


* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-08-18 10:56       ` Peter Zijlstra
@ 2021-08-18 11:53         ` Peter Zijlstra
  2021-08-18 12:19           ` Will Deacon
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2021-08-18 11:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Wed, Aug 18, 2021 at 12:56:41PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 18, 2021 at 11:42:28AM +0100, Will Deacon wrote:

> > I think the idea looks good, but perhaps we could wrap things up a bit:
> > 
> > /* Comment about why this is useful with RT */
> > static cpumask_t *clear_user_cpus_ptr(struct task_struct *p)
> > {
> > 	struct cpumask *user_mask = NULL;
> > 
> > 	swap(user_mask, p->user_cpus_ptr);
> > 	return user_mask;
> > }
> > 
> > void release_user_cpus_ptr(struct task_struct *p)
> > {
> > 	kfree(clear_user_cpus_ptr(p));
> > }
> > 
> > Then just use clear_user_cpus_ptr() in sched/core.c where we know what
> > we're doing (well, at least one of us does!).
> 
> OK, I'll go make it like that.

Something like so then?

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2497,10 +2497,18 @@ int dup_user_cpus_ptr(struct task_struct
 	return 0;
 }
 
+static inline struct cpumask *clear_user_cpus_ptr(struct task_struct *p)
+{
+	struct cpumask *user_mask = NULL;
+
+	swap(p->user_cpus_ptr, user_mask);
+
+	return user_mask;
+}
+
 void release_user_cpus_ptr(struct task_struct *p)
 {
-	kfree(p->user_cpus_ptr);
-	p->user_cpus_ptr = NULL;
+	kfree(clear_user_cpus_ptr(p));
 }
 
 /*
@@ -2733,6 +2741,7 @@ static int __set_cpus_allowed_ptr_locked
 	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	bool kthread = p->flags & PF_KTHREAD;
+	struct cpumask *user_mask = NULL;
 	unsigned int dest_cpu;
 	int ret = 0;
 
@@ -2792,9 +2801,13 @@ static int __set_cpus_allowed_ptr_locked
 	__do_set_cpus_allowed(p, new_mask, flags);
 
 	if (flags & SCA_USER)
-		release_user_cpus_ptr(p);
+		user_mask = clear_user_cpus_ptr(p);
 
-	return affine_move_task(rq, p, rf, dest_cpu, flags);
+	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+
+	kfree(user_mask);
+
+	return ret;
 
 out:
 	task_rq_unlock(rq, p, rf);
@@ -2941,20 +2954,22 @@ __sched_setaffinity(struct task_struct *
  */
 void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 {
+	struct cpumask *user_mask = p->user_cpus_ptr;
 	unsigned long flags;
-	struct cpumask *mask = p->user_cpus_ptr;
 
 	/*
 	 * Try to restore the old affinity mask. If this fails, then
 	 * we free the mask explicitly to avoid it being inherited across
 	 * a subsequent fork().
 	 */
-	if (!mask || !__sched_setaffinity(p, mask))
+	if (!user_mask || !__sched_setaffinity(p, user_mask))
 		return;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	release_user_cpus_ptr(p);
+	user_mask = clear_user_cpus_ptr(p);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	kfree(user_mask);
 }
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)



* Re: [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-08-18 11:53         ` Peter Zijlstra
@ 2021-08-18 12:19           ` Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2021-08-18 12:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arm-kernel, linux-arch, linux-kernel, Catalin Marinas,
	Marc Zyngier, Greg Kroah-Hartman, Morten Rasmussen, Qais Yousef,
	Suren Baghdasaryan, Quentin Perret, Tejun Heo, Johannes Weiner,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Rafael J. Wysocki,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Valentin Schneider,
	Mark Rutland, kernel-team

On Wed, Aug 18, 2021 at 01:53:28PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 18, 2021 at 12:56:41PM +0200, Peter Zijlstra wrote:
> > On Wed, Aug 18, 2021 at 11:42:28AM +0100, Will Deacon wrote:
> 
> > > I think the idea looks good, but perhaps we could wrap things up a bit:
> > > 
> > > /* Comment about why this is useful with RT */
> > > static cpumask_t *clear_user_cpus_ptr(struct task_struct *p)
> > > {
> > > 	struct cpumask *user_mask = NULL;
> > > 
> > > 	swap(user_mask, p->user_cpus_ptr);
> > > 	return user_mask;
> > > }
> > > 
> > > void release_user_cpus_ptr(struct task_struct *p)
> > > {
> > > 	kfree(clear_user_cpus_ptr(p));
> > > }
> > > 
> > > Then just use clear_user_cpus_ptr() in sched/core.c where we know what
> > > we're doing (well, at least one of us does!).
> > 
> > OK, I'll go make it like that.
> 
> Something like so then?

Looks good to me, thanks!

Will


* [tip: sched/core] sched: Introduce dl_task_check_affinity() to check proposed affinity
  2021-07-30 11:24 ` [PATCH v11 09/16] sched: Introduce dl_task_check_affinity() to check proposed affinity Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Daniel Bristot de Oliveira, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     234b8ab6476c5edd5262e2ff563de9498d60044a
Gitweb:        https://git.kernel.org/tip/234b8ab6476c5edd5262e2ff563de9498d60044a
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:36 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:33:00 +02:00

sched: Introduce dl_task_check_affinity() to check proposed affinity

In preparation for restricting the affinity of a task during execve()
on arm64, introduce a new dl_task_check_affinity() helper function to
give an indication as to whether the restricted mask is admissible for
a deadline task.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lore.kernel.org/r/20210730112443.23245-10-will@kernel.org
---
 include/linux/sched.h |  6 +++++-
 kernel/sched/core.c   | 46 ++++++++++++++++++++++++++----------------
 2 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ce2d5cf..3bb9fec 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1709,6 +1709,7 @@ extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
 extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node);
 extern void release_user_cpus_ptr(struct task_struct *p);
+extern int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask);
 extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
 extern void relax_compatible_cpus_allowed_ptr(struct task_struct *p);
 #else
@@ -1731,6 +1732,11 @@ static inline void release_user_cpus_ptr(struct task_struct *p)
 {
 	WARN_ON(p->user_cpus_ptr);
 }
+
+static inline int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
+{
+	return 0;
+}
 #endif
 
 extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6ee1970..a22cc3c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7756,6 +7756,32 @@ out_unlock:
 	return retval;
 }
 
+#ifdef CONFIG_SMP
+int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
+{
+	int ret = 0;
+
+	/*
+	 * If the task isn't a deadline task or admission control is
+	 * disabled then we don't care about affinity changes.
+	 */
+	if (!task_has_dl_policy(p) || !dl_bandwidth_enabled())
+		return 0;
+
+	/*
+	 * Since bandwidth control happens on root_domain basis,
+	 * if admission test is enabled, we only admit -deadline
+	 * tasks allowed to run on all the CPUs in the task's
+	 * root_domain.
+	 */
+	rcu_read_lock();
+	if (!cpumask_subset(task_rq(p)->rd->span, mask))
+		ret = -EBUSY;
+	rcu_read_unlock();
+	return ret;
+}
+#endif
+
 static int
 __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 {
@@ -7773,23 +7799,9 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, mask, cpus_allowed);
 
-	/*
-	 * Since bandwidth control happens on root_domain basis,
-	 * if admission test is enabled, we only admit -deadline
-	 * tasks allowed to run on all the CPUs in the task's
-	 * root_domain.
-	 */
-#ifdef CONFIG_SMP
-	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
-		rcu_read_lock();
-		if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
-			retval = -EBUSY;
-			rcu_read_unlock();
-			goto out_free_new_mask;
-		}
-		rcu_read_unlock();
-	}
-#endif
+	retval = dl_task_check_affinity(p, new_mask);
+	if (retval)
+		goto out_free_new_mask;
 again:
 	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | SCA_USER);
 	if (retval)


* [tip: sched/core] sched: Allow task CPU affinity to be restricted on asymmetric systems
  2021-07-30 11:24 ` [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems Will Deacon
  2021-08-17 15:10   ` Peter Zijlstra
  2021-08-17 15:41   ` Peter Zijlstra
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  2 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, Quentin Perret, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     07ec77a1d4e82526e1588979fff2f024f8e96df2
Gitweb:        https://git.kernel.org/tip/07ec77a1d4e82526e1588979fff2f024f8e96df2
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:35 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:33:00 +02:00

sched: Allow task CPU affinity to be restricted on asymmetric systems

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Although userspace can carefully manage the affinity masks for such
tasks, one place where it is particularly problematic is execve()
because the CPU on which the execve() is occurring may be incompatible
with the new application image. In such a situation, it is desirable to
restrict the affinity mask of the task and ensure that the new image is
entered on a compatible CPU. From userspace's point of view, this looks
the same as if the incompatible CPUs have been hotplugged off in the
task's affinity mask. Similarly, if a subsequent execve() reverts to
a compatible image, then the old affinity is restored if it is still
valid.

In preparation for restricting the affinity mask for compat tasks on
arm64 systems without uniform support for 32-bit applications, introduce
{force,relax}_compatible_cpus_allowed_ptr(), which respectively restrict
and restore the affinity mask for a task based on the compatible CPUs.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210730112443.23245-9-will@kernel.org
---
 include/linux/sched.h |   2 +-
 kernel/sched/core.c   | 198 +++++++++++++++++++++++++++++++++++++----
 kernel/sched/sched.h  |   1 +-
 3 files changed, 183 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c5d638..ce2d5cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1709,6 +1709,8 @@ extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
 extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node);
 extern void release_user_cpus_ptr(struct task_struct *p);
+extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
+extern void relax_compatible_cpus_allowed_ptr(struct task_struct *p);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 672d0fc..6ee1970 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2494,10 +2494,18 @@ int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
 	return 0;
 }
 
+static inline struct cpumask *clear_user_cpus_ptr(struct task_struct *p)
+{
+	struct cpumask *user_mask = NULL;
+
+	swap(p->user_cpus_ptr, user_mask);
+
+	return user_mask;
+}
+
 void release_user_cpus_ptr(struct task_struct *p)
 {
-	kfree(p->user_cpus_ptr);
-	p->user_cpus_ptr = NULL;
+	kfree(clear_user_cpus_ptr(p));
 }
 
 /*
@@ -2717,27 +2725,23 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 }
 
 /*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
+ * Called with both p->pi_lock and rq->lock held; drops both before returning.
  */
-static int __set_cpus_allowed_ptr(struct task_struct *p,
-				  const struct cpumask *new_mask,
-				  u32 flags)
+static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
+					 const struct cpumask *new_mask,
+					 u32 flags,
+					 struct rq *rq,
+					 struct rq_flags *rf)
+	__releases(rq->lock)
+	__releases(p->pi_lock)
 {
 	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
 	bool kthread = p->flags & PF_KTHREAD;
+	struct cpumask *user_mask = NULL;
 	unsigned int dest_cpu;
-	struct rq_flags rf;
-	struct rq *rq;
 	int ret = 0;
 
-	rq = task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 
 	if (kthread || is_migration_disabled(p)) {
@@ -2793,20 +2797,178 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 
 	__do_set_cpus_allowed(p, new_mask, flags);
 
-	return affine_move_task(rq, p, &rf, dest_cpu, flags);
+	if (flags & SCA_USER)
+		user_mask = clear_user_cpus_ptr(p);
+
+	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+
+	kfree(user_mask);
+
+	return ret;
 
 out:
-	task_rq_unlock(rq, p, &rf);
+	task_rq_unlock(rq, p, rf);
 
 	return ret;
 }
 
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+static int __set_cpus_allowed_ptr(struct task_struct *p,
+				  const struct cpumask *new_mask, u32 flags)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &rf);
+	return __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, &rf);
+}
+
 int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 {
 	return __set_cpus_allowed_ptr(p, new_mask, 0);
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+/*
+ * Change a given task's CPU affinity to the intersection of its current
+ * affinity mask and @subset_mask, writing the resulting mask to @new_mask
+ * and pointing @p->user_cpus_ptr to a copy of the old mask.
+ * If the resulting mask is empty, leave the affinity unchanged and return
+ * -EINVAL.
+ */
+static int restrict_cpus_allowed_ptr(struct task_struct *p,
+				     struct cpumask *new_mask,
+				     const struct cpumask *subset_mask)
+{
+	struct cpumask *user_mask = NULL;
+	struct rq_flags rf;
+	struct rq *rq;
+	int err;
+
+	if (!p->user_cpus_ptr) {
+		user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
+		if (!user_mask)
+			return -ENOMEM;
+	}
+
+	rq = task_rq_lock(p, &rf);
+
+	/*
+	 * Forcefully restricting the affinity of a deadline task is
+	 * likely to cause problems, so fail and noisily override the
+	 * mask entirely.
+	 */
+	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
+		err = -EPERM;
+		goto err_unlock;
+	}
+
+	if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
+		err = -EINVAL;
+		goto err_unlock;
+	}
+
+	/*
+	 * We're about to butcher the task affinity, so keep track of what
+	 * the user asked for in case we're able to restore it later on.
+	 */
+	if (user_mask) {
+		cpumask_copy(user_mask, p->cpus_ptr);
+		p->user_cpus_ptr = user_mask;
+	}
+
+	return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, &rf);
+
+err_unlock:
+	task_rq_unlock(rq, p, &rf);
+	kfree(user_mask);
+	return err;
+}
+
+/*
+ * Restrict the CPU affinity of task @p so that it is a subset of
+ * task_cpu_possible_mask() and point @p->user_cpus_ptr to a copy of the
+ * old affinity mask. If the resulting mask is empty, we warn and walk
+ * up the cpuset hierarchy until we find a suitable mask.
+ */
+void force_compatible_cpus_allowed_ptr(struct task_struct *p)
+{
+	cpumask_var_t new_mask;
+	const struct cpumask *override_mask = task_cpu_possible_mask(p);
+
+	alloc_cpumask_var(&new_mask, GFP_KERNEL);
+
+	/*
+	 * __migrate_task() can fail silently in the face of concurrent
+	 * offlining of the chosen destination CPU, so take the hotplug
+	 * lock to ensure that the migration succeeds.
+	 */
+	cpus_read_lock();
+	if (!cpumask_available(new_mask))
+		goto out_set_mask;
+
+	if (!restrict_cpus_allowed_ptr(p, new_mask, override_mask))
+		goto out_free_mask;
+
+	/*
+	 * We failed to find a valid subset of the affinity mask for the
+	 * task, so override it based on its cpuset hierarchy.
+	 */
+	cpuset_cpus_allowed(p, new_mask);
+	override_mask = new_mask;
+
+out_set_mask:
+	if (printk_ratelimit()) {
+		printk_deferred("Overriding affinity for process %d (%s) to CPUs %*pbl\n",
+				task_pid_nr(p), p->comm,
+				cpumask_pr_args(override_mask));
+	}
+
+	WARN_ON(set_cpus_allowed_ptr(p, override_mask));
+out_free_mask:
+	cpus_read_unlock();
+	free_cpumask_var(new_mask);
+}
+
+static int
+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
+
+/*
+ * Restore the affinity of a task @p which was previously restricted by a
+ * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
+ * @p->user_cpus_ptr.
+ *
+ * It is the caller's responsibility to serialise this with any calls to
+ * force_compatible_cpus_allowed_ptr(@p).
+ */
+void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
+{
+	struct cpumask *user_mask = p->user_cpus_ptr;
+	unsigned long flags;
+
+	/*
+	 * Try to restore the old affinity mask. If this fails, then
+	 * we free the mask explicitly to avoid it being inherited across
+	 * a subsequent fork().
+	 */
+	if (!user_mask || !__sched_setaffinity(p, user_mask))
+		return;
+
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	user_mask = clear_user_cpus_ptr(p);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	kfree(user_mask);
+}
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG
@@ -7629,7 +7791,7 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	}
 #endif
 again:
-	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
+	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK | SCA_USER);
 	if (retval)
 		goto out_free_new_mask;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5fa0290..e7e2bba 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2244,6 +2244,7 @@ extern struct task_struct *pick_next_task_idle(struct rq *rq);
 #define SCA_CHECK		0x01
 #define SCA_MIGRATE_DISABLE	0x02
 #define SCA_MIGRATE_ENABLE	0x04
+#define SCA_USER		0x08
 
 #ifdef CONFIG_SMP
 


* [tip: sched/core] sched: Split the guts of sched_setaffinity() into a helper function
  2021-07-30 11:24 ` [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function Will Deacon
  2021-08-17 15:40   ` Peter Zijlstra
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     db3b02ae896e88b6bb7a95c1373602e87e0de84c
Gitweb:        https://git.kernel.org/tip/db3b02ae896e88b6bb7a95c1373602e87e0de84c
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:34 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:33:00 +02:00

sched: Split the guts of sched_setaffinity() into a helper function

In preparation for replaying user affinity requests using a saved mask,
split sched_setaffinity() up so that the initial task lookup and
security checks are only performed when the request is coming directly
from userspace.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Link: https://lore.kernel.org/r/20210730112443.23245-8-will@kernel.org
---
 kernel/sched/core.c | 105 +++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 48 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 360a3ec..672d0fc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7594,53 +7594,22 @@ out_unlock:
 	return retval;
 }
 
-long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
+static int
+__sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 {
-	cpumask_var_t cpus_allowed, new_mask;
-	struct task_struct *p;
 	int retval;
+	cpumask_var_t cpus_allowed, new_mask;
 
-	rcu_read_lock();
-
-	p = find_process_by_pid(pid);
-	if (!p) {
-		rcu_read_unlock();
-		return -ESRCH;
-	}
-
-	/* Prevent p going away */
-	get_task_struct(p);
-	rcu_read_unlock();
+	if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL))
+		return -ENOMEM;
 
-	if (p->flags & PF_NO_SETAFFINITY) {
-		retval = -EINVAL;
-		goto out_put_task;
-	}
-	if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
-		retval = -ENOMEM;
-		goto out_put_task;
-	}
 	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
 		retval = -ENOMEM;
 		goto out_free_cpus_allowed;
 	}
-	retval = -EPERM;
-	if (!check_same_owner(p)) {
-		rcu_read_lock();
-		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
-			rcu_read_unlock();
-			goto out_free_new_mask;
-		}
-		rcu_read_unlock();
-	}
-
-	retval = security_task_setscheduler(p);
-	if (retval)
-		goto out_free_new_mask;
-
 
 	cpuset_cpus_allowed(p, cpus_allowed);
-	cpumask_and(new_mask, in_mask, cpus_allowed);
+	cpumask_and(new_mask, mask, cpus_allowed);
 
 	/*
 	 * Since bandwidth control happens on root_domain basis,
@@ -7661,23 +7630,63 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 #endif
 again:
 	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
+	if (retval)
+		goto out_free_new_mask;
 
-	if (!retval) {
-		cpuset_cpus_allowed(p, cpus_allowed);
-		if (!cpumask_subset(new_mask, cpus_allowed)) {
-			/*
-			 * We must have raced with a concurrent cpuset
-			 * update. Just reset the cpus_allowed to the
-			 * cpuset's cpus_allowed
-			 */
-			cpumask_copy(new_mask, cpus_allowed);
-			goto again;
-		}
+	cpuset_cpus_allowed(p, cpus_allowed);
+	if (!cpumask_subset(new_mask, cpus_allowed)) {
+		/*
+		 * We must have raced with a concurrent cpuset update.
+		 * Just reset the cpumask to the cpuset's cpus_allowed.
+		 */
+		cpumask_copy(new_mask, cpus_allowed);
+		goto again;
 	}
+
 out_free_new_mask:
 	free_cpumask_var(new_mask);
 out_free_cpus_allowed:
 	free_cpumask_var(cpus_allowed);
+	return retval;
+}
+
+long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
+{
+	struct task_struct *p;
+	int retval;
+
+	rcu_read_lock();
+
+	p = find_process_by_pid(pid);
+	if (!p) {
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+
+	/* Prevent p going away */
+	get_task_struct(p);
+	rcu_read_unlock();
+
+	if (p->flags & PF_NO_SETAFFINITY) {
+		retval = -EINVAL;
+		goto out_put_task;
+	}
+
+	if (!check_same_owner(p)) {
+		rcu_read_lock();
+		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
+			rcu_read_unlock();
+			retval = -EPERM;
+			goto out_put_task;
+		}
+		rcu_read_unlock();
+	}
+
+	retval = security_task_setscheduler(p);
+	if (retval)
+		goto out_put_task;
+
+	retval = __sched_setaffinity(p, in_mask);
 out_put_task:
 	put_task_struct(p);
 	return retval;


* [tip: sched/core] sched: Introduce task_struct::user_cpus_ptr to track requested affinity
  2021-07-30 11:24 ` [PATCH v11 06/16] sched: Introduce task_struct::user_cpus_ptr to track requested affinity Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b90ca8badbd11488e5f762346b028666808164e7
Gitweb:        https://git.kernel.org/tip/b90ca8badbd11488e5f762346b028666808164e7
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:33 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:33:00 +02:00

sched: Introduce task_struct::user_cpus_ptr to track requested affinity

In preparation for saving and restoring the user-requested CPU affinity
mask of a task, add a new cpumask_t pointer to 'struct task_struct'.

If the pointer is non-NULL, then the mask is copied across fork() and
freed on task exit.
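
The copy-on-fork and free-on-exit lifecycle described above can be sketched in userspace C. This is only an illustration, not the kernel code: `struct task`, `cpumask_bits`, `dup_user_cpus()` and `release_user_cpus()` are hypothetical stand-ins, and malloc()/free() replace kmalloc_node()/kfree(). Unlike the patch, the sketch clears the destination pointer up front so the helper does not depend on how the child structure was initialised.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_bits;

/* Hypothetical, heavily reduced task structure. */
struct task {
	cpumask_bits *user_cpus_ptr;	/* NULL until a requested mask is saved */
};

/* Give the child its own copy of the parent's saved mask, if there is one. */
static int dup_user_cpus(struct task *dst, const struct task *src)
{
	dst->user_cpus_ptr = NULL;
	if (!src->user_cpus_ptr)
		return 0;

	dst->user_cpus_ptr = malloc(sizeof(*dst->user_cpus_ptr));
	if (!dst->user_cpus_ptr)
		return -ENOMEM;

	*dst->user_cpus_ptr = *src->user_cpus_ptr;
	return 0;
}

/* Free the saved mask on exit; free(NULL) is a no-op. */
static void release_user_cpus(struct task *p)
{
	free(p->user_cpus_ptr);
	p->user_cpus_ptr = NULL;
}
```

The point of the deep copy is that parent and child can exit in either order without one freeing the other's mask.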

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Link: https://lore.kernel.org/r/20210730112443.23245-7-will@kernel.org
---
 include/linux/sched.h | 13 +++++++++++++
 init/init_task.c      |  1 +
 kernel/fork.c         |  2 ++
 kernel/sched/core.c   | 20 ++++++++++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 50db949..2c5d638 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -748,6 +748,7 @@ struct task_struct {
 	unsigned int			policy;
 	int				nr_cpus_allowed;
 	const cpumask_t			*cpus_ptr;
+	cpumask_t			*user_cpus_ptr;
 	cpumask_t			cpus_mask;
 	void				*migration_pending;
 #ifdef CONFIG_SMP
@@ -1706,6 +1707,8 @@ extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
+extern int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node);
+extern void release_user_cpus_ptr(struct task_struct *p);
 #else
 static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
@@ -1716,6 +1719,16 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p, const struct cpuma
 		return -EINVAL;
 	return 0;
 }
+static inline int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node)
+{
+	if (src->user_cpus_ptr)
+		return -EINVAL;
+	return 0;
+}
+static inline void release_user_cpus_ptr(struct task_struct *p)
+{
+	WARN_ON(p->user_cpus_ptr);
+}
 #endif
 
 extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/init/init_task.c b/init/init_task.c
index 562f2ef..2d02406 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -80,6 +80,7 @@ struct task_struct init_task
 	.normal_prio	= MAX_PRIO - 20,
 	.policy		= SCHED_NORMAL,
 	.cpus_ptr	= &init_task.cpus_mask,
+	.user_cpus_ptr	= NULL,
 	.cpus_mask	= CPU_MASK_ALL,
 	.nr_cpus_allowed= NR_CPUS,
 	.mm		= NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index 1a9af73..5d7addf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -446,6 +446,7 @@ void put_task_stack(struct task_struct *tsk)
 
 void free_task(struct task_struct *tsk)
 {
+	release_user_cpus_ptr(tsk);
 	scs_release(tsk);
 
 #ifndef CONFIG_THREAD_INFO_IN_TASK
@@ -919,6 +920,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 	if (orig->cpus_ptr == &orig->cpus_mask)
 		tsk->cpus_ptr = &tsk->cpus_mask;
+	dup_user_cpus_ptr(tsk, orig, node);
 
 	/*
 	 * One for the user space visible state that goes away when reaped.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8cec0d2..360a3ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2480,6 +2480,26 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 	__do_set_cpus_allowed(p, new_mask, 0);
 }
 
+int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
+		      int node)
+{
+	if (!src->user_cpus_ptr)
+		return 0;
+
+	dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+	if (!dst->user_cpus_ptr)
+		return -ENOMEM;
+
+	cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+	return 0;
+}
+
+void release_user_cpus_ptr(struct task_struct *p)
+{
+	kfree(p->user_cpus_ptr);
+	p->user_cpus_ptr = NULL;
+}
+
 /*
  * This function is wildly self concurrent; here be dragons.
  *


* [tip: sched/core] sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  2021-07-30 11:24 ` [PATCH v11 05/16] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, Quentin Perret, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     234a503e670be01f72841be9fcf68dfb89a1fa8b
Gitweb:        https://git.kernel.org/tip/234a503e670be01f72841be9fcf68dfb89a1fa8b
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:32 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:32:59 +02:00

sched: Reject CPU affinity changes based on task_cpu_possible_mask()

Reject explicit requests to change the affinity mask of a task via
set_cpus_allowed_ptr() if the requested mask is not a subset of the
mask returned by task_cpu_possible_mask(). This ensures that the
'cpus_mask' for a given task cannot contain CPUs which are incapable of
executing it, except in cases where the affinity is forced.
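
As a rough userspace illustration of the check described above (not the kernel implementation), the subset test can be modelled with plain 64-bit masks. The names `cpumask_bits`, `mask_subset()` and `check_new_affinity()` are hypothetical; the kernel-thread bypass mirrors the `kthread` condition in the diff below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_bits;

/* True if every CPU set in @a is also set in @b. */
static bool mask_subset(cpumask_bits a, cpumask_bits b)
{
	return (a & ~b) == 0;
}

/*
 * Reject a requested affinity that names any CPU outside the set the
 * task can run on. Kernel threads are exempt, since the kernel is
 * assumed to run on all CPUs.
 */
static int check_new_affinity(cpumask_bits new_mask, cpumask_bits possible,
			      bool kthread)
{
	if (!kthread && !mask_subset(new_mask, possible))
		return -EINVAL;
	return 0;
}
```

Note that a partial overlap is rejected as well: the request must be a subset of the possible mask, not merely intersect it.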

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210730112443.23245-6-will@kernel.org
---
 kernel/sched/core.c |  9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b9d4bae..8cec0d2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2709,7 +2709,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 				  const struct cpumask *new_mask,
 				  u32 flags)
 {
+	const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
 	const struct cpumask *cpu_valid_mask = cpu_active_mask;
+	bool kthread = p->flags & PF_KTHREAD;
 	unsigned int dest_cpu;
 	struct rq_flags rf;
 	struct rq *rq;
@@ -2718,7 +2720,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	rq = task_rq_lock(p, &rf);
 	update_rq_clock(rq);
 
-	if (p->flags & PF_KTHREAD || is_migration_disabled(p)) {
+	if (kthread || is_migration_disabled(p)) {
 		/*
 		 * Kernel threads are allowed on online && !active CPUs,
 		 * however, during cpu-hot-unplug, even these might get pushed
@@ -2732,6 +2734,11 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		cpu_valid_mask = cpu_online_mask;
 	}
 
+	if (!kthread && !cpumask_subset(new_mask, cpu_allowed_mask)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	/*
 	 * Must re-check here, to close a race against __kthread_bind(),
 	 * sched_setaffinity() is not guaranteed to observe the flag.


* [tip: sched/core] cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
  2021-07-30 11:24 ` [PATCH v11 04/16] cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Valentin Schneider, Will Deacon, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     97c0054dbe2c3c59d1156fd233f2d44e91981c8e
Gitweb:        https://git.kernel.org/tip/97c0054dbe2c3c59d1156fd233f2d44e91981c8e
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:31 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:32:59 +02:00

cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()

select_fallback_rq() only needs to recheck for an allowed CPU if the
affinity mask of the task has changed since the last check.

Return a 'bool' from cpuset_cpus_allowed_fallback() to indicate whether
the affinity mask was updated, and use this to elide the allowed check
when the mask has been left alone.

No functional change.
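
To make the control flow concrete, here is a simplified userspace model of the fallback state machine, assuming 64-bit masks in place of cpumasks. All names (`struct task` and its fields, `cpuset_fallback()`, `pick_fallback_cpu()`) are hypothetical, NUMA handling is omitted, and the model returns -1 where the kernel would BUG(). When the cpuset step reports that it left the mask alone, the loop skips the pointless rescan and widens straight to the possible mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_bits;

enum fallback_state { STATE_CPUSET, STATE_POSSIBLE, STATE_FAIL };

/* Hypothetical, heavily reduced task structure. */
struct task {
	cpumask_bits cpus;		/* current affinity */
	cpumask_bits cpuset_cpus;	/* mask the task's cpuset would grant */
	cpumask_bits possible;		/* CPUs able to execute the task */
	bool cgroup_v2;
};

/* Returns true only if the affinity of @p was actually changed. */
static bool cpuset_fallback(struct task *p)
{
	if (p->cgroup_v2 && (p->cpuset_cpus & ~p->possible) == 0) {
		p->cpus = p->cpuset_cpus;
		return true;
	}
	return false;
}

/* Returns the lowest usable CPU number, or -1 if none can be found. */
static int pick_fallback_cpu(struct task *p, cpumask_bits active)
{
	enum fallback_state state = STATE_CPUSET;

	for (;;) {
		cpumask_bits usable = p->cpus & active & p->possible;

		for (int cpu = 0; cpu < 64; cpu++)
			if ((usable >> cpu) & 1u)
				return cpu;

		switch (state) {
		case STATE_CPUSET:
			if (cpuset_fallback(p)) {
				state = STATE_POSSIBLE;
				break;	/* mask changed: rescan */
			}
			/* fall through: mask untouched, rescanning is pointless */
		case STATE_POSSIBLE:
			p->cpus = p->possible;
			state = STATE_FAIL;
			break;
		case STATE_FAIL:
			return -1;
		}
	}
}
```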

Suggested-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20210730112443.23245-5-will@kernel.org
---
 include/linux/cpuset.h |  5 +++--
 kernel/cgroup/cpuset.c | 10 ++++++++--
 kernel/sched/core.c    |  3 +--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 414a8e6..d2b9c41 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -59,7 +59,7 @@ extern void cpuset_wait_for_hotplug(void);
 extern void cpuset_read_lock(void);
 extern void cpuset_read_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
-extern void cpuset_cpus_allowed_fallback(struct task_struct *p);
+extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
@@ -188,8 +188,9 @@ static inline void cpuset_cpus_allowed(struct task_struct *p,
 	cpumask_copy(mask, task_cpu_possible_mask(p));
 }
 
-static inline void cpuset_cpus_allowed_fallback(struct task_struct *p)
+static inline bool cpuset_cpus_allowed_fallback(struct task_struct *p)
 {
+	return false;
 }
 
 static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 3918132..6500cbe 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3327,17 +3327,22 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
  * which will not contain a sane cpumask during cases such as cpu hotplugging.
  * This is the absolute last resort for the scheduler and it is only used if
  * _every_ other avenue has been traveled.
+ *
+ * Returns true if the affinity of @tsk was changed, false otherwise.
  **/
 
-void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
+bool cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
 	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
 	const struct cpumask *cs_mask;
+	bool changed = false;
 
 	rcu_read_lock();
 	cs_mask = task_cs(tsk)->cpus_allowed;
-	if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask))
+	if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask)) {
 		do_set_cpus_allowed(tsk, cs_mask);
+		changed = true;
+	}
 	rcu_read_unlock();
 
 	/*
@@ -3357,6 +3362,7 @@ void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 	 * select_fallback_rq() will fix things ups and set cpu_possible_mask
 	 * if required.
 	 */
+	return changed;
 }
 
 void __init cpuset_init_current_mems_allowed(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6f31267..b9d4bae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3141,8 +3141,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 		/* No more Mr. Nice Guy. */
 		switch (state) {
 		case cpuset:
-			if (IS_ENABLED(CONFIG_CPUSETS)) {
-				cpuset_cpus_allowed_fallback(p);
+			if (cpuset_cpus_allowed_fallback(p)) {
 				state = possible;
 				break;
 			}


* [tip: sched/core] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  2021-07-30 11:24 ` [PATCH v11 03/16] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     431c69fac05baa7477d61a44f2708e069f2bed6c
Gitweb:        https://git.kernel.org/tip/431c69fac05baa7477d61a44f2708e069f2bed6c
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:30 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:32:59 +02:00

cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Modify guarantee_online_cpus() to take task_cpu_possible_mask() into
account when trying to find a suitable set of online CPUs for a given
task. This will avoid passing an invalid mask to set_cpus_allowed_ptr()
during ->attach() and will subsequently allow the cpuset hierarchy to be
taken into account when forcefully overriding the affinity mask for a
task which requires migration to a compatible CPU.
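
The hierarchy walk can be sketched in userspace as follows. This is a simplified model rather than the kernel code: `struct cpuset_node`, `cpumask_bits` and `pick_online_cpus()` are hypothetical names, locking and RCU are left out, and the WARN_ON() in the patch is reduced to a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_bits;

/* Hypothetical, heavily reduced cpuset: a mask and a parent link. */
struct cpuset_node {
	cpumask_bits effective_cpus;
	const struct cpuset_node *parent;	/* NULL at the root */
};

/*
 * Start from the CPUs that are both online and able to run the task,
 * then walk up from the task's cpuset until one intersects that set.
 */
static cpumask_bits pick_online_cpus(const struct cpuset_node *cs,
				     cpumask_bits possible,
				     cpumask_bits online)
{
	cpumask_bits candidates = possible & online;

	/* The patch warns in this case and falls back to the online mask. */
	if (!candidates)
		candidates = online;

	while (cs && !(cs->effective_cpus & candidates))
		cs = cs->parent;

	/* Ran off the top of the hierarchy: keep the unrestricted candidates. */
	if (!cs)
		return candidates;

	return candidates & cs->effective_cpus;
}
```

For example, a task that can only run on CPUs 0-3 but sits in a cpuset covering CPUs 4-7 ends up with the parent's intersection (CPUs 0-3 if the parent spans all eight), rather than a mask it cannot execute on.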

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Link: https://lkml.kernel.org/r/20210730112443.23245-4-will@kernel.org
---
 include/linux/cpuset.h |  2 +-
 kernel/cgroup/cpuset.c | 43 ++++++++++++++++++++++++-----------------
 2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index ed6ec67..414a8e6 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -185,7 +185,7 @@ static inline void cpuset_read_unlock(void) { }
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
 {
-	cpumask_copy(mask, cpu_possible_mask);
+	cpumask_copy(mask, task_cpu_possible_mask(p));
 }
 
 static inline void cpuset_cpus_allowed_fallback(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a869378..3918132 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -372,18 +372,29 @@ static inline bool is_in_v2_mode(void)
 }
 
 /*
- * Return in pmask the portion of a cpusets's cpus_allowed that
- * are online.  If none are online, walk up the cpuset hierarchy
- * until we find one that does have some online cpus.
+ * Return in pmask the portion of a task's cpusets's cpus_allowed that
+ * are online and are capable of running the task.  If none are found,
+ * walk up the cpuset hierarchy until we find one that does have some
+ * appropriate cpus.
  *
  * One way or another, we guarantee to return some non-empty subset
  * of cpu_online_mask.
  *
  * Call with callback_lock or cpuset_mutex held.
  */
-static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
+static void guarantee_online_cpus(struct task_struct *tsk,
+				  struct cpumask *pmask)
 {
-	while (!cpumask_intersects(cs->effective_cpus, cpu_online_mask)) {
+	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+	struct cpuset *cs;
+
+	if (WARN_ON(!cpumask_and(pmask, possible_mask, cpu_online_mask)))
+		cpumask_copy(pmask, cpu_online_mask);
+
+	rcu_read_lock();
+	cs = task_cs(tsk);
+
+	while (!cpumask_intersects(cs->effective_cpus, pmask)) {
 		cs = parent_cs(cs);
 		if (unlikely(!cs)) {
 			/*
@@ -393,11 +404,13 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
 			 * cpuset's effective_cpus is on its way to be
 			 * identical to cpu_online_mask.
 			 */
-			cpumask_copy(pmask, cpu_online_mask);
-			return;
+			goto out_unlock;
 		}
 	}
-	cpumask_and(pmask, cs->effective_cpus, cpu_online_mask);
+	cpumask_and(pmask, pmask, cs->effective_cpus);
+
+out_unlock:
+	rcu_read_unlock();
 }
 
 /*
@@ -2199,15 +2212,13 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 	percpu_down_write(&cpuset_rwsem);
 
-	/* prepare for attach */
-	if (cs == &top_cpuset)
-		cpumask_copy(cpus_attach, cpu_possible_mask);
-	else
-		guarantee_online_cpus(cs, cpus_attach);
-
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 
 	cgroup_taskset_for_each(task, css, tset) {
+		if (cs != &top_cpuset)
+			guarantee_online_cpus(task, cpus_attach);
+		else
+			cpumask_copy(cpus_attach, task_cpu_possible_mask(task));
 		/*
 		 * can_attach beforehand should guarantee that this doesn't
 		 * fail.  TODO: have a better way to handle failure here
@@ -3302,9 +3313,7 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 	unsigned long flags;
 
 	spin_lock_irqsave(&callback_lock, flags);
-	rcu_read_lock();
-	guarantee_online_cpus(task_cs(tsk), pmask);
-	rcu_read_unlock();
+	guarantee_online_cpus(tsk, pmask);
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 


* [tip: sched/core] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  2021-07-30 11:24 ` [PATCH v11 02/16] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, Quentin Perret, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     d4b96fb92ae7fe7533e11e662504d96161928575
Gitweb:        https://git.kernel.org/tip/d4b96fb92ae7fe7533e11e662504d96161928575
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:29 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:32:58 +02:00

cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1

If the scheduler cannot find an allowed CPU for a task,
cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask
if cgroup v1 is in use.

In preparation for allowing architectures to provide their own fallback
mask, just return early if we're either using cgroup v1 or we're using
cgroup v2 with a mask that contains invalid CPUs. This will allow
select_fallback_rq() to figure out the mask by itself.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lkml.kernel.org/r/20210730112443.23245-3-will@kernel.org
---
 include/linux/cpuset.h | 1 +
 kernel/cgroup/cpuset.c | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 04c20de..ed6ec67 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/nodemask.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/jump_label.h>
 
 #ifdef CONFIG_CPUSETS
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index adb5190..a869378 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3322,9 +3322,13 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
+	const struct cpumask *possible_mask = task_cpu_possible_mask(tsk);
+	const struct cpumask *cs_mask;
+
 	rcu_read_lock();
-	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
-		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
+	cs_mask = task_cs(tsk)->cpus_allowed;
+	if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask))
+		do_set_cpus_allowed(tsk, cs_mask);
 	rcu_read_unlock();
 
 	/*


* [tip: sched/core] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
  2021-07-30 11:24 ` [PATCH v11 01/16] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
@ 2021-08-23  9:26   ` tip-bot2 for Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Will Deacon @ 2021-08-23  9:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Will Deacon, Peter Zijlstra (Intel),
	Valentin Schneider, Quentin Perret, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9ae606bc74dd0e58d4de894e3c5cbb9d45599267
Gitweb:        https://git.kernel.org/tip/9ae606bc74dd0e58d4de894e3c5cbb9d45599267
Author:        Will Deacon <will@kernel.org>
AuthorDate:    Fri, 30 Jul 2021 12:24:28 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Aug 2021 12:32:58 +02:00

sched: Introduce task_cpu_possible_mask() to limit fallback rq selection

Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

On such a system, we must take care not to migrate a task to an
unsupported CPU when forcefully moving tasks in select_fallback_rq()
in response to a CPU hot-unplug operation.

Introduce a task_cpu_possible_mask() hook which, given a task argument,
allows an architecture to return a cpumask of CPUs that are capable of
executing that task. The default implementation returns the
cpu_possible_mask, since sane machines do not suffer from per-cpu ISA
limitations that affect scheduling. The new mask is used when selecting
the fallback runqueue as a last resort before forcing a migration to the
first active CPU.
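
The override pattern can be tried out in userspace. The following sketch assumes 64-bit masks in place of cpumasks; `struct task`, `is_32bit`, `cpu_possible_bits`, `cpu_32bit_bits` and `arch_task_cpu_possible_mask()` are hypothetical and do not reflect the arm64 implementation later in the series. An "architecture" defines task_cpu_possible_mask() first, so the generic block picks the cpumask-testing variant of task_cpu_possible() instead of the constant `true`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_bits;

/* Hypothetical, heavily reduced task structure. */
struct task {
	bool is_32bit;	/* task needs a CPU with 32-bit userspace support */
};

static const cpumask_bits cpu_possible_bits = 0xffu;	/* CPUs 0-7 exist */
static const cpumask_bits cpu_32bit_bits = 0x0fu;	/* only CPUs 0-3 run 32-bit tasks */

/* "Architecture" override, defined before the generic fallback below. */
static inline cpumask_bits arch_task_cpu_possible_mask(const struct task *p)
{
	return p->is_32bit ? cpu_32bit_bits : cpu_possible_bits;
}
#define task_cpu_possible_mask(p)	arch_task_cpu_possible_mask(p)

/* Generic part, following the shape of the hook added by the patch. */
#ifndef task_cpu_possible_mask
# define task_cpu_possible_mask(p)	cpu_possible_bits
# define task_cpu_possible(cpu, p)	true
#else
# define task_cpu_possible(cpu, p)	(((task_cpu_possible_mask(p)) >> (cpu)) & 1u)
#endif
```

On a homogeneous system the override is simply absent, so task_cpu_possible() compiles down to a constant and the extra check in is_cpu_allowed() costs nothing.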

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210730112443.23245-2-will@kernel.org
---
 include/linux/mmu_context.h | 14 ++++++++++++++
 kernel/sched/core.c         |  9 +++------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index 03dee12..b9b970f 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -14,4 +14,18 @@
 static inline void leave_mm(int cpu) { }
 #endif
 
+/*
+ * CPUs that are capable of running user task @p. Must contain at least one
+ * active CPU. It is assumed that the kernel can run on all CPUs, so calling
+ * this for a kernel thread is pointless.
+ *
+ * By default, we assume a sane, homogeneous system.
+ */
+#ifndef task_cpu_possible_mask
+# define task_cpu_possible_mask(p)	cpu_possible_mask
+# define task_cpu_possible(cpu, p)	true
+#else
+# define task_cpu_possible(cpu, p)	cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
+#endif
+
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7fa6ce7..6f31267 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2173,7 +2173,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 
 	/* Non kernel threads are not allowed during either online or offline. */
 	if (!(p->flags & PF_KTHREAD))
-		return cpu_active(cpu);
+		return cpu_active(cpu) && task_cpu_possible(cpu, p);
 
 	/* KTHREAD_IS_PER_CPU is always allowed. */
 	if (kthread_is_per_cpu(p))
@@ -3124,9 +3124,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 
 		/* Look for allowed, online CPU in same node. */
 		for_each_cpu(dest_cpu, nodemask) {
-			if (!cpu_active(dest_cpu))
-				continue;
-			if (cpumask_test_cpu(dest_cpu, p->cpus_ptr))
+			if (is_cpu_allowed(p, dest_cpu))
 				return dest_cpu;
 		}
 	}
@@ -3156,10 +3154,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 			 *
 			 * More yuck to audit.
 			 */
-			do_set_cpus_allowed(p, cpu_possible_mask);
+			do_set_cpus_allowed(p, task_cpu_possible_mask(p));
 			state = fail;
 			break;
-
 		case fail:
 			BUG();
 			break;


Thread overview: 38+ messages
2021-07-30 11:24 [PATCH v11 00/16] Add support for 32-bit tasks on asymmetric AArch32 systems Will Deacon
2021-07-30 11:24 ` [PATCH v11 01/16] sched: Introduce task_cpu_possible_mask() to limit fallback rq selection Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 02/16] cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 03/16] cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 04/16] cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 05/16] sched: Reject CPU affinity changes based on task_cpu_possible_mask() Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 06/16] sched: Introduce task_struct::user_cpus_ptr to track requested affinity Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 07/16] sched: Split the guts of sched_setaffinity() into a helper function Will Deacon
2021-08-17 15:40   ` Peter Zijlstra
2021-08-18 10:50     ` Will Deacon
2021-08-18 10:56       ` Peter Zijlstra
2021-08-18 11:11         ` Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 08/16] sched: Allow task CPU affinity to be restricted on asymmetric systems Will Deacon
2021-08-17 15:10   ` Peter Zijlstra
2021-08-18 10:42     ` Will Deacon
2021-08-18 10:56       ` Peter Zijlstra
2021-08-18 11:53         ` Peter Zijlstra
2021-08-18 12:19           ` Will Deacon
2021-08-18 11:06       ` Peter Zijlstra
2021-08-17 15:41   ` Peter Zijlstra
2021-08-18 10:43     ` Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 09/16] sched: Introduce dl_task_check_affinity() to check proposed affinity Will Deacon
2021-08-23  9:26   ` [tip: sched/core] " tip-bot2 for Will Deacon
2021-07-30 11:24 ` [PATCH v11 10/16] arm64: Implement task_cpu_possible_mask() Will Deacon
2021-07-30 11:24 ` [PATCH v11 11/16] arm64: exec: Adjust affinity for compat tasks with mismatched 32-bit EL0 Will Deacon
2021-07-30 11:24 ` [PATCH v11 12/16] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system Will Deacon
2021-07-30 11:24 ` [PATCH v11 13/16] arm64: Advertise CPUs capable of running 32-bit applications in sysfs Will Deacon
2021-07-30 11:24 ` [PATCH v11 14/16] arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Will Deacon
2021-07-30 11:24 ` [PATCH v11 15/16] arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores Will Deacon
2021-07-30 11:24 ` [PATCH v11 16/16] Documentation: arm64: describe asymmetric 32-bit support Will Deacon
