All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems
@ 2023-01-31 22:17 ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Will Deacon, Peter Zijlstra, Waiman Long, Zefan Li,
	Tejun Heo, Johannes Weiner, cgroups

Hi folks,

These two patches fix a couple of CPU affinity issues involving cpusets
on heterogeneous systems. A concrete example of this is running 32-bit
tasks on recent arm64 SoCs, where some of the cores are only capable of
64-bit execution.

The first patch (from Peter) fixes a regression introduced during the
recent merge window which is causing test failures in Android where the
problematic patches have been backported. The second patch fixes a
longer-standing issue, which I noticed while testing fixes for the
initial regression.

Ideally, both of these would land together, but fixing the regression
for 6.2 is my main concern.

Anyway, I don't think either Peter or I would call ourselves cpuset
experts (far from it!), so please have a look.

Cheers,

Will

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org

--->8

Peter Zijlstra (1):
  cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs

Will Deacon (1):
  cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task

 kernel/cgroup/cpuset.c | 57 +++++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 12 deletions(-)

-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems
@ 2023-01-31 22:17 ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Will Deacon, Peter Zijlstra,
	Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hi folks,

These two patches fix a couple of CPU affinity issues involving cpusets
on heterogeneous systems. A concrete example of this is running 32-bit
tasks on recent arm64 SoCs, where some of the cores are only capable of
64-bit execution.

The first patch (from Peter) fixes a regression introduced during the
recent merge window which is causing test failures in Android where the
problematic patches have been backported. The second patch fixes a
longer-standing issue, which I noticed while testing fixes for the
initial regression.

Ideally, both of these would land together, but fixing the regression
for 6.2 is my main concern.

Anyway, I don't think either Peter or I would call ourselves cpuset
experts (far from it!), so please have a look.

Cheers,

Will

Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Cc: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

--->8

Peter Zijlstra (1):
  cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs

Will Deacon (1):
  cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task

 kernel/cgroup/cpuset.c | 57 +++++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 12 deletions(-)

-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-01-31 22:17   ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Will Deacon, Peter Zijlstra, Waiman Long, Zefan Li,
	Tejun Heo, Johannes Weiner, cgroups

From: Peter Zijlstra <peterz@infradead.org>

There is a difference in behaviour between CPUSET={y,n} that is now
wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().

Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
calling __sched_setaffinity() unconditionally.

But the underlying problem goes back a lot further, possibly to
commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
switched cpuset_cpus_allowed() from cs->cpus_allowed to
cs->effective_cpus.

The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
all offline CPUs. For tasks that are part of a (!root) cpuset this is
then later fixed up by the cpuset hotplug notifiers that re-evaluate
and re-apply cs->effective_cpus, but for (normal) tasks in the root
cpuset this does not happen and they will forever after be excluded
from CPUs onlined later.

As such, rewrite cpuset_cpus_allowed() to return a wider mask,
including the offline CPUs.

Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/cgroup/cpuset.c | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a29c0b13706b..8552cc2c586a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3683,23 +3683,52 @@ void __init cpuset_init_smp(void)
 	BUG_ON(!cpuset_migrate_mm_wq);
 }
 
+static const struct cpumask *__cs_cpus_allowed(struct cpuset *cs)
+{
+	const struct cpumask *cs_mask = cs->cpus_allowed;
+	if (!parent_cs(cs))
+		cs_mask = cpu_possible_mask;
+	return cs_mask;
+}
+
+static void cs_cpus_allowed(struct cpuset *cs, struct cpumask *pmask)
+{
+	do {
+		cpumask_and(pmask, pmask, __cs_cpus_allowed(cs));
+		cs = parent_cs(cs);
+	} while (cs);
+}
+
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
  * @pmask: pointer to struct cpumask variable to receive cpus_allowed set.
  *
- * Description: Returns the cpumask_var_t cpus_allowed of the cpuset
- * attached to the specified @tsk.  Guaranteed to return some non-empty
- * subset of cpu_online_mask, even if this means going outside the
- * tasks cpuset.
+ * Description: Returns the cpumask_var_t cpus_allowed of the cpuset attached
+ * to the specified @tsk.  Guaranteed to return some non-empty intersection
+ * with cpu_online_mask, even if this means going outside the tasks cpuset.
  **/
 
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
 	unsigned long flags;
+	struct cpuset *cs;
 
 	spin_lock_irqsave(&callback_lock, flags);
-	guarantee_online_cpus(tsk, pmask);
+	rcu_read_lock();
+
+	cs = task_cs(tsk);
+	do {
+		cpumask_copy(pmask, task_cpu_possible_mask(tsk));
+		cs_cpus_allowed(cs, pmask);
+
+		if (cpumask_intersects(pmask, cpu_online_mask))
+			break;
+
+		cs = parent_cs(cs);
+	} while (cs);
+
+	rcu_read_unlock();
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 
-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-01-31 22:17   ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Will Deacon, Peter Zijlstra,
	Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>

There is a difference in behaviour between CPUSET={y,n} that is now
wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().

Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
calling __sched_setaffinity() unconditionally.

But the underlying problem goes back a lot further, possibly to
commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
switched cpuset_cpus_allowed() from cs->cpus_allowed to
cs->effective_cpus.

The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
all offline CPUs. For tasks that are part of a (!root) cpuset this is
then later fixed up by the cpuset hotplug notifiers that re-evaluate
and re-apply cs->effective_cpus, but for (normal) tasks in the root
cpuset this does not happen and they will forever after be excluded
from CPUs onlined later.

As such, rewrite cpuset_cpus_allowed() to return a wider mask,
including the offline CPUs.

Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
Reported-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup/cpuset.c | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a29c0b13706b..8552cc2c586a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3683,23 +3683,52 @@ void __init cpuset_init_smp(void)
 	BUG_ON(!cpuset_migrate_mm_wq);
 }
 
+static const struct cpumask *__cs_cpus_allowed(struct cpuset *cs)
+{
+	const struct cpumask *cs_mask = cs->cpus_allowed;
+	if (!parent_cs(cs))
+		cs_mask = cpu_possible_mask;
+	return cs_mask;
+}
+
+static void cs_cpus_allowed(struct cpuset *cs, struct cpumask *pmask)
+{
+	do {
+		cpumask_and(pmask, pmask, __cs_cpus_allowed(cs));
+		cs = parent_cs(cs);
+	} while (cs);
+}
+
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
  * @pmask: pointer to struct cpumask variable to receive cpus_allowed set.
  *
- * Description: Returns the cpumask_var_t cpus_allowed of the cpuset
- * attached to the specified @tsk.  Guaranteed to return some non-empty
- * subset of cpu_online_mask, even if this means going outside the
- * tasks cpuset.
+ * Description: Returns the cpumask_var_t cpus_allowed of the cpuset attached
+ * to the specified @tsk.  Guaranteed to return some non-empty intersection
+ * with cpu_online_mask, even if this means going outside the tasks cpuset.
  **/
 
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
 	unsigned long flags;
+	struct cpuset *cs;
 
 	spin_lock_irqsave(&callback_lock, flags);
-	guarantee_online_cpus(tsk, pmask);
+	rcu_read_lock();
+
+	cs = task_cs(tsk);
+	do {
+		cpumask_copy(pmask, task_cpu_possible_mask(tsk));
+		cs_cpus_allowed(cs, pmask);
+
+		if (cpumask_intersects(pmask, cpu_online_mask))
+			break;
+
+		cs = parent_cs(cs);
+	} while (cs);
+
+	rcu_read_unlock();
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 
-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-01-31 22:17   ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Will Deacon, Peter Zijlstra, Waiman Long, Zefan Li,
	Tejun Heo, Johannes Weiner, cgroups

set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.

For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().

Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.

Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon <will@kernel.org>
---

Note: We wondered whether it was worth calling guarantee_online_cpus()
if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
this path is only called when the effective mask changes, it didn't
seem appropriate. Ultimately, if you have 32-bit tasks attached to a
cpuset containing only 64-bit cpus, then the affinity is going to be
forced.

 kernel/cgroup/cpuset.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8552cc2c586a..f15fb0426707 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
+ * @new_cpus: the temp variable for the new effective_cpus mask
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
  * effective cpuset's.  As this function is called with cpuset_rwsem held,
  * cpuset membership stays stable.
  */
-static void update_tasks_cpumask(struct cpuset *cs)
+static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
 		if (top_cs && (task->flags & PF_KTHREAD) &&
 		    kthread_is_per_cpu(task))
 			continue;
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+
+		cpumask_and(new_cpus, cs->effective_cpus,
+			    task_cpu_possible_mask(task));
+		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);
 }
@@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 	spin_unlock_irq(&callback_lock);
 
 	if (adding || deleting)
-		update_tasks_cpumask(parent);
+		update_tasks_cpumask(parent, tmp->new_cpus);
 
 	/*
 	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
@@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		WARN_ON(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
 
-		update_tasks_cpumask(cp);
+		update_tasks_cpumask(cp, tmp->new_cpus);
 
 		/*
 		 * On legacy hierarchy, if the effective cpumask of any non-
@@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		}
 	}
 
-	update_tasks_cpumask(parent);
+	update_tasks_cpumask(parent, tmpmask.new_cpus);
 
 	if (parent->child_ecpus_count)
 		update_sibling_cpumasks(parent, cs, &tmpmask);
@@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	 * as the tasks will be migrated to an ancestor.
 	 */
 	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated && !nodes_empty(cs->mems_allowed))
 		update_tasks_nodemask(cs);
 
@@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
 	spin_unlock_irq(&callback_lock);
 
 	if (cpus_updated)
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated)
 		update_tasks_nodemask(cs);
 }
-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-01-31 22:17   ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Will Deacon, Peter Zijlstra,
	Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.

For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().

Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.

Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---

Note: We wondered whether it was worth calling guarantee_online_cpus()
if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
this path is only called when the effective mask changes, it didn't
seem appropriate. Ultimately, if you have 32-bit tasks attached to a
cpuset containing only 64-bit cpus, then the affinity is going to be
forced.

 kernel/cgroup/cpuset.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8552cc2c586a..f15fb0426707 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
+ * @new_cpus: the temp variable for the new effective_cpus mask
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
  * effective cpuset's.  As this function is called with cpuset_rwsem held,
  * cpuset membership stays stable.
  */
-static void update_tasks_cpumask(struct cpuset *cs)
+static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
 		if (top_cs && (task->flags & PF_KTHREAD) &&
 		    kthread_is_per_cpu(task))
 			continue;
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+
+		cpumask_and(new_cpus, cs->effective_cpus,
+			    task_cpu_possible_mask(task));
+		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);
 }
@@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 	spin_unlock_irq(&callback_lock);
 
 	if (adding || deleting)
-		update_tasks_cpumask(parent);
+		update_tasks_cpumask(parent, tmp->new_cpus);
 
 	/*
 	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
@@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		WARN_ON(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
 
-		update_tasks_cpumask(cp);
+		update_tasks_cpumask(cp, tmp->new_cpus);
 
 		/*
 		 * On legacy hierarchy, if the effective cpumask of any non-
@@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		}
 	}
 
-	update_tasks_cpumask(parent);
+	update_tasks_cpumask(parent, tmpmask.new_cpus);
 
 	if (parent->child_ecpus_count)
 		update_sibling_cpumasks(parent, cs, &tmpmask);
@@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	 * as the tasks will be migrated to an ancestor.
 	 */
 	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated && !nodes_empty(cs->mems_allowed))
 		update_tasks_nodemask(cs);
 
@@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
 	spin_unlock_irq(&callback_lock);
 
 	if (cpus_updated)
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated)
 		update_tasks_nodemask(cs);
 }
-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
  2023-01-31 22:17   ` Will Deacon
  (?)
@ 2023-02-01  2:22   ` Waiman Long
  2023-02-01  9:15       ` Peter Zijlstra
  -1 siblings, 1 reply; 57+ messages in thread
From: Waiman Long @ 2023-02-01  2:22 UTC (permalink / raw)
  To: Will Deacon, linux-kernel
  Cc: kernel-team, Peter Zijlstra, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.

Now I see how the sched_setaffinity() change is impacting arm64. Instead 
of putting in the bandage in cpuset. I would suggest doing another cpu 
masking in __set_cpus_allowed_ptr() similar to what is now done for 
user_cpus_ptr.

Another suggestion that I have is to add a helper like 
has_task_cpu_possible_mask() that returns a true/false value. In this 
way, we only need to do an additional masking if we have some mismatched 
32-bit only cpus available in the system running 32-bit binaries so that 
we can skip this step in most cases compiling them out in non-arm64 systems.

By doing that, we may be able to remove some of the existing usages of 
task_cpu_possible_mask().

Thought?

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01  4:14     ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01  4:14 UTC (permalink / raw)
  To: Will Deacon, linux-kernel
  Cc: kernel-team, Peter Zijlstra, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 1/31/23 17:17, Will Deacon wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> There is a difference in behaviour between CPUSET={y,n} that is now
> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>
> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
> calling __sched_setaffinity() unconditionally.
>
> But the underlying problem goes back a lot further, possibly to
> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> switched cpuset_cpus_allowed() from cs->cpus_allowed to
> cs->effective_cpus.
>
> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> all offline CPUs. For tasks that are part of a (!root) cpuset this is
> then later fixed up by the cpuset hotplug notifiers that re-evaluate
> and re-apply cs->effective_cpus, but for (normal) tasks in the root
> cpuset this does not happen and they will forever after be excluded
> from CPUs onlined later.
>
> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> including the offline CPUs.
>
> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> Reported-by: Will Deacon <will@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> Signed-off-by: Will Deacon <will@kernel.org>

Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only 
tracked online cpus and ignored the offline ones. It behaves more like 
effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed 
and effective_cpus. When cpuset v1 is mounted, cpus_allowed and 
effective_cpus are effectively the same and track online cpus. With 
cpuset v2, cpus_allowed contains what the user has written into and it 
won't be changed until another write happen. However, what the user 
written may not be what the system can give it and effective_cpus is 
what the system decides a cpuset can use.

Cpuset v2 is able to handle hotplug correctly and update the task's 
cpumask accordingly. So missing previously offline cpus won't happen 
with v2.

Since v1 keeps the old behavior, previously offlined cpus are lost in 
the cpuset's cpus_allowed. However tasks in the root cpuset will still 
be fine with cpu hotplug as its cpus_allowed should track 
cpu_online_mask. IOW, only tasks in a non-root cpuset suffer this problem.

It was a known issue in v1 and I believe is one of the major reasons of 
the cpuset v2 redesign.

A major concern I have is the overhead of creating a poor man version of 
v2 cpus_allowed. This issue can be worked around even for cpuset v1 if 
it is mounted with the cpuset_v2_mode option to behave more like v2 in 
its cpumask handling. Alternatively we may be able to provide a config 
option to make this the default for v1 without the special mount option, 
if necessary.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01  4:14     ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01  4:14 UTC (permalink / raw)
  To: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Peter Zijlstra, Zefan Li,
	Tejun Heo, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 1/31/23 17:17, Will Deacon wrote:
> From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>
> There is a difference in behaviour between CPUSET={y,n} that is now
> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>
> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
> calling __sched_setaffinity() unconditionally.
>
> But the underlying problem goes back a lot further, possibly to
> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> switched cpuset_cpus_allowed() from cs->cpus_allowed to
> cs->effective_cpus.
>
> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> all offline CPUs. For tasks that are part of a (!root) cpuset this is
> then later fixed up by the cpuset hotplug notifiers that re-evaluate
> and re-apply cs->effective_cpus, but for (normal) tasks in the root
> cpuset this does not happen and they will forever after be excluded
> from CPUs onlined later.
>
> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> including the offline CPUs.
>
> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> Reported-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only 
tracked online cpus and ignored the offline ones. It behaves more like 
effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed 
and effective_cpus. When cpuset v1 is mounted, cpus_allowed and 
effective_cpus are effectively the same and track online cpus. With 
cpuset v2, cpus_allowed contains what the user has written into and it 
won't be changed until another write happen. However, what the user 
written may not be what the system can give it and effective_cpus is 
what the system decides a cpuset can use.

Cpuset v2 is able to handle hotplug correctly and update the task's 
cpumask accordingly. So missing previously offline cpus won't happen 
with v2.

Since v1 keeps the old behavior, previously offlined cpus are lost in 
the cpuset's cpus_allowed. However tasks in the root cpuset will still 
be fine with cpu hotplug as its cpus_allowed should track 
cpu_online_mask. IOW, only tasks in a non-root cpuset suffer this problem.

It was a known issue in v1 and I believe is one of the major reasons of 
the cpuset v2 redesign.

A major concern I have is the overhead of creating a poor man version of 
v2 cpus_allowed. This issue can be worked around even for cpuset v1 if 
it is mounted with the cpuset_v2_mode option to behave more like v2 in 
its cpumask handling. Alternatively we may be able to provide a config 
option to make this the default for v1 without the special mount option, 
if necessary.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01  9:14       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01  9:14 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > There is a difference in behaviour between CPUSET={y,n} that is now
> > wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
> > 
> > Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> > user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
> > calling __sched_setaffinity() unconditionally.
> > 
> > But the underlying problem goes back a lot further, possibly to
> > commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> > switched cpuset_cpus_allowed() from cs->cpus_allowed to
> > cs->effective_cpus.
> > 
> > The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> > all offline CPUs. For tasks that are part of a (!root) cpuset this is
> > then later fixed up by the cpuset hotplug notifiers that re-evaluate
> > and re-apply cs->effective_cpus, but for (normal) tasks in the root
> > cpuset this does not happen and they will forever after be excluded
> > from CPUs onlined later.
> > 
> > As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> > including the offline CPUs.
> > 
> > Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> > Reported-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
> tracked online cpus and ignored the offline ones. It behaves more like
> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
> contains what the user has written into and it won't be changed until
> another write happen. However, what the user written may not be what the
> system can give it and effective_cpus is what the system decides a cpuset
> can use.
> 
> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
> accordingly. So missing previously offline cpus won't happen with v2.
> 
> Since v1 keeps the old behavior, previously offlined cpus are lost in the
> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
> tasks in a non-root cpuset suffer this problem.
> 
> It was a known issue in v1 and I believe is one of the major reasons of the
> cpuset v2 redesign.
> 
> A major concern I have is the overhead of creating a poor man version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.

You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
mask offline cpus for root cgroup tasks, ever. (And the only reason it
gets away with masking offline for !root is that it re-applies the mask
every time it changes.)

Yes it did that for a fair while -- but it is wrong and broken and a
very big behavioural difference between CONFIG_CPUSET={y,n}. This must
not be.

Arguably cpuset-v2 is still wrong for masking offline cpus in it's
effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
something that needs to go into /urgent *now*.

Hence this minimal patch that at least lets sched_setaffinity() work as
intended.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01  9:14       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01  9:14 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> > 
> > There is a difference in behaviour between CPUSET={y,n} that is now
> > wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
> > 
> > Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> > user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
> > calling __sched_setaffinity() unconditionally.
> > 
> > But the underlying problem goes back a lot further, possibly to
> > commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> > switched cpuset_cpus_allowed() from cs->cpus_allowed to
> > cs->effective_cpus.
> > 
> > The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> > all offline CPUs. For tasks that are part of a (!root) cpuset this is
> > then later fixed up by the cpuset hotplug notifiers that re-evaluate
> > and re-apply cs->effective_cpus, but for (normal) tasks in the root
> > cpuset this does not happen and they will forever after be excluded
> > from CPUs onlined later.
> > 
> > As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> > including the offline CPUs.
> > 
> > Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> > Reported-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> > Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> > Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> 
> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
> tracked online cpus and ignored the offline ones. It behaves more like
> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
> contains what the user has written into and it won't be changed until
> another write happen. However, what the user written may not be what the
> system can give it and effective_cpus is what the system decides a cpuset
> can use.
> 
> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
> accordingly. So missing previously offline cpus won't happen with v2.
> 
> Since v1 keeps the old behavior, previously offlined cpus are lost in the
> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
> tasks in a non-root cpuset suffer this problem.
> 
> It was a known issue in v1 and I believe is one of the major reasons of the
> cpuset v2 redesign.
> 
> A major concern I have is the overhead of creating a poor man version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.

You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
mask offline cpus for root cgroup tasks, ever. (And the only reason it
gets away with masking offline for !root is that it re-applies the mask
every time it changes.)

Yes it did that for a fair while -- but it is wrong and broken and a
very big behavioural difference between CONFIG_CPUSET={y,n}. This must
not be.

Arguably cpuset-v2 is still wrong for masking offline cpus in it's
effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
something that needs to go into /urgent *now*.

Hence this minimal patch that at least lets sched_setaffinity() work as
intended.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-01  9:15       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01  9:15 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> > affinity mask is not a subset of the task_cpu_possible_mask() for the
> > task being updated. Consequently, on a heterogeneous system with cpusets
> > spanning the different CPU types, updates to the cgroup hierarchy can
> > silently fail to update task affinities when the effective affinity
> > mask for the cpuset is expanded.
> > 
> > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> > the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> > task to a cpuset containing CPUs 0-2 will correctly affine the task to
> > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> > the affinity mask of the 32-bit task because update_tasks_cpumask() will
> > pass the full 0-3 mask to set_cpus_allowed_ptr().
> > 
> > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> > and use it to mask the 'effective_cpus' mask with the possible mask for
> > each task being updated.
> > 
> > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> > 
> > Note: We wondered whether it was worth calling guarantee_online_cpus()
> > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> > this path is only called when the effective mask changes, it didn't
> > seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> > cpuset containing only 64-bit cpus, then the affinity is going to be
> > forced.
> 
> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
> putting in the bandage in cpuset. I would suggest doing another cpu masking
> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.

NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.

Masking the offline CPUs is *WRONG*.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-01  9:15       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01  9:15 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> > affinity mask is not a subset of the task_cpu_possible_mask() for the
> > task being updated. Consequently, on a heterogeneous system with cpusets
> > spanning the different CPU types, updates to the cgroup hierarchy can
> > silently fail to update task affinities when the effective affinity
> > mask for the cpuset is expanded.
> > 
> > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> > the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> > task to a cpuset containing CPUs 0-2 will correctly affine the task to
> > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> > the affinity mask of the 32-bit task because update_tasks_cpumask() will
> > pass the full 0-3 mask to set_cpus_allowed_ptr().
> > 
> > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> > and use it to mask the 'effective_cpus' mask with the possible mask for
> > each task being updated.
> > 
> > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> > Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > ---
> > 
> > Note: We wondered whether it was worth calling guarantee_online_cpus()
> > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> > this path is only called when the effective mask changes, it didn't
> > seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> > cpuset containing only 64-bit cpus, then the affinity is going to be
> > forced.
> 
> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
> putting in the bandage in cpuset. I would suggest doing another cpu masking
> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.

NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.

Masking the offline CPUs is *WRONG*.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-01  9:27     ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01  9:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, kernel-team, Waiman Long, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
> 
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
> 
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
> 
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
> 
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.

Right, so the case above where the cpuset is shrunk to 0-1, then the
intersection between the cpuset and task_cpu_possible_mask() is empty
and it currently simply fails to update mask.

I argued it was probably desired to walk up the tree to find a wider
parent until the intersection of {cpuset, task_cpu_possible_mask,
online} becomes non-empty.

But I suppose that can wait until we have more time.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-01  9:27     ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01  9:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Waiman Long, Zefan Li,
	Tejun Heo, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
> 
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
> 
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
> 
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> 
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.

Right, so the case above where the cpuset is shrunk to 0-1, then the
intersection between the cpuset and task_cpu_possible_mask() is empty
and it currently simply fails to update mask.

I argued it was probably desired to walk up the tree to find a wider
parent until the intersection of {cpuset, task_cpu_possible_mask,
online} becomes non-empty.

But I suppose that can wait until we have more time.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
  2023-01-31 22:17   ` Will Deacon
  (?)
  (?)
@ 2023-02-01 10:23   ` Hillf Danton
  -1 siblings, 0 replies; 57+ messages in thread
From: Hillf Danton @ 2023-02-01 10:23 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-kernel, linux-mm, Peter Zijlstra, Waiman Long

On Tue, 31 Jan 2023 22:17:18 +0000 Will Deacon <will@kernel.org>
>  
> +static const struct cpumask *__cs_cpus_allowed(struct cpuset *cs)
> +{
> +	const struct cpumask *cs_mask = cs->cpus_allowed;
> +	if (!parent_cs(cs))
> +		cs_mask = cpu_possible_mask;
> +	return cs_mask;
> +}
> +
> +static void cs_cpus_allowed(struct cpuset *cs, struct cpumask *pmask)
> +{
> +	do {
> +		cpumask_and(pmask, pmask, __cs_cpus_allowed(cs));
> +		cs = parent_cs(cs);
> +	} while (cs);
> +}

If it is not typo, why is it needed here to spiral the cs hierarchy up,
given the same spiralup below?

>  void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
>  {
>  	unsigned long flags;
> +	struct cpuset *cs;
>  
>  	spin_lock_irqsave(&callback_lock, flags);
> -	guarantee_online_cpus(tsk, pmask);
> +	rcu_read_lock();
> +
> +	cs = task_cs(tsk);
> +	do {
> +		cpumask_copy(pmask, task_cpu_possible_mask(tsk));
> +		cs_cpus_allowed(cs, pmask);
> +
> +		if (cpumask_intersects(pmask, cpu_online_mask))
> +			break;
> +
> +		cs = parent_cs(cs);
> +	} while (cs);
> +
> +	rcu_read_unlock();
>  	spin_unlock_irqrestore(&callback_lock, flags);
>  }
>  
> -- 
> 2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
  2023-02-01  9:15       ` Peter Zijlstra
  (?)
@ 2023-02-01 15:03       ` Waiman Long
  -1 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 15:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/1/23 04:15, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
>> On 1/31/23 17:17, Will Deacon wrote:
>>> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
>>> affinity mask is not a subset of the task_cpu_possible_mask() for the
>>> task being updated. Consequently, on a heterogeneous system with cpusets
>>> spanning the different CPU types, updates to the cgroup hierarchy can
>>> silently fail to update task affinities when the effective affinity
>>> mask for the cpuset is expanded.
>>>
>>> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
>>> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
>>> task to a cpuset containing CPUs 0-2 will correctly affine the task to
>>> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
>>> the affinity mask of the 32-bit task because update_tasks_cpumask() will
>>> pass the full 0-3 mask to set_cpus_allowed_ptr().
>>>
>>> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
>>> and use it to mask the 'effective_cpus' mask with the possible mask for
>>> each task being updated.
>>>
>>> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>
>>> Note: We wondered whether it was worth calling guarantee_online_cpus()
>>> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
>>> this path is only called when the effective mask changes, it didn't
>>> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
>>> cpuset containing only 64-bit cpus, then the affinity is going to be
>>> forced.
>> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
>> putting in the bandage in cpuset. I would suggest doing another cpu masking
>> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.
> NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.
>
> Masking the offline CPUs is *WRONG*.
>
This patch is not related to offline cpus at all. It is all about the 
32-bit misfit cpus in some arm64 system.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01 15:16         ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 15:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/1/23 04:14, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>> On 1/31/23 17:17, Will Deacon wrote:
>>> From: Peter Zijlstra <peterz@infradead.org>
>>>
>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>
>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>> user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
>>> calling __sched_setaffinity() unconditionally.
>>>
>>> But the underlying problem goes back a lot further, possibly to
>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>> cs->effective_cpus.
>>>
>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>> cpuset this does not happen and they will forever after be excluded
>>> from CPUs onlined later.
>>>
>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>> including the offline CPUs.
>>>
>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
>>> Reported-by: Will Deacon <will@kernel.org>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>> Signed-off-by: Will Deacon <will@kernel.org>
>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>> tracked online cpus and ignored the offline ones. It behaves more like
>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
>> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
>> contains what the user has written into and it won't be changed until
>> another write happen. However, what the user written may not be what the
>> system can give it and effective_cpus is what the system decides a cpuset
>> can use.
>>
>> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
>> accordingly. So missing previously offline cpus won't happen with v2.
>>
>> Since v1 keeps the old behavior, previously offlined cpus are lost in the
>> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
>> tasks in a non-root cpuset suffer this problem.
>>
>> It was a known issue in v1 and I believe is one of the major reasons of the
>> cpuset v2 redesign.
>>
>> A major concern I have is the overhead of creating a poor man version of v2
>> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
>> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
>> handling. Alternatively we may be able to provide a config option to make
>> this the default for v1 without the special mount option, if necessary.
> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
> mask offline cpus for root cgroup tasks, ever. (And the only reason it
> gets away with masking offline for !root is that it re-applies the mask
> every time it changes.)
>
> Yes it did that for a fair while -- but it is wrong and broken and a
> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
> not be.
>
> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
> something that needs to go into /urgent *now*.
>
> Hence this minimal patch that at least lets sched_setaffinity() work as
> intended.

I don't object to the general idea of keeping offline cpus in a task's 
cpu affinity. In the case of cpu offline event, we can skip removing 
that offline cpu from the task's cpu affinity. That will partially solve 
the problem here and is also simpler.

I believe a main reason why effective_cpus holds only online cpus is 
because of the need to detect when there is no online cpus available in 
a given cpuset. In this case, it will fall back to the nearest ancestors 
with online cpus.

This offline cpu problem with cpuset v1 is a known problem for a long 
time. It is not a recent regression.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01 15:16         ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 15:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/1/23 04:14, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>> On 1/31/23 17:17, Will Deacon wrote:
>>> From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>>>
>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>
>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>> user requested cpumask")  relax_compatible_cpus_allowed_ptr() is
>>> calling __sched_setaffinity() unconditionally.
>>>
>>> But the underlying problem goes back a lot further, possibly to
>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>> cs->effective_cpus.
>>>
>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>> cpuset this does not happen and they will forever after be excluded
>>> from CPUs onlined later.
>>>
>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>> including the offline CPUs.
>>>
>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
>>> Reported-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>>> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>> Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>> tracked online cpus and ignored the offline ones. It behaves more like
>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
>> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
>> contains what the user has written into and it won't be changed until
>> another write happen. However, what the user written may not be what the
>> system can give it and effective_cpus is what the system decides a cpuset
>> can use.
>>
>> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
>> accordingly. So missing previously offline cpus won't happen with v2.
>>
>> Since v1 keeps the old behavior, previously offlined cpus are lost in the
>> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
>> tasks in a non-root cpuset suffer this problem.
>>
>> It was a known issue in v1 and I believe is one of the major reasons of the
>> cpuset v2 redesign.
>>
>> A major concern I have is the overhead of creating a poor man version of v2
>> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
>> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
>> handling. Alternatively we may be able to provide a config option to make
>> this the default for v1 without the special mount option, if necessary.
> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
> mask offline cpus for root cgroup tasks, ever. (And the only reason it
> gets away with masking offline for !root is that it re-applies the mask
> every time it changes.)
>
> Yes it did that for a fair while -- but it is wrong and broken and a
> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
> not be.
>
> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
> something that needs to go into /urgent *now*.
>
> Hence this minimal patch that at least lets sched_setaffinity() work as
> intended.

I don't object to the general idea of keeping offline cpus in a task's 
cpu affinity. In the case of cpu offline event, we can skip removing 
that offline cpu from the task's cpu affinity. That will partially solve 
the problem here and is also simpler.

I believe a main reason why effective_cpus holds only online cpus is 
because of the need to detect when there is no online cpus available in 
a given cpuset. In this case, it will fall back to the nearest ancestors 
with online cpus.

This offline cpu problem with cpuset v1 is a known problem for a long 
time. It is not a recent regression.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
  2023-02-01 15:16         ` Waiman Long
  (?)
@ 2023-02-01 18:46         ` Waiman Long
  2023-02-01 19:14             ` Waiman Long
  2023-02-01 21:10           ` Peter Zijlstra
  -1 siblings, 2 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 18:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/1/23 10:16, Waiman Long wrote:
> On 2/1/23 04:14, Peter Zijlstra wrote:
>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>>> On 1/31/23 17:17, Will Deacon wrote:
>>>> From: Peter Zijlstra <peterz@infradead.org>
>>>>
>>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>>
>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>>> calling __sched_setaffinity() unconditionally.
>>>>
>>>> But the underlying problem goes back a lot further, possibly to
>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>>> cs->effective_cpus.
>>>>
>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>>> cpuset this does not happen and they will forever after be excluded
>>>> from CPUs onlined later.
>>>>
>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>>> including the offline CPUs.
>>>>
>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested 
>>>> cpumask")
>>>> Reported-by: Will Deacon <will@kernel.org>
>>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>> Link: 
>>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>>> tracked online cpus and ignored the offline ones. It behaves more like
>>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - 
>>> cpus_allowed and
>>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and 
>>> effective_cpus
>>> are effectively the same and track online cpus. With cpuset v2, 
>>> cpus_allowed
>>> contains what the user has written into and it won't be changed until
>>> another write happen. However, what the user written may not be what 
>>> the
>>> system can give it and effective_cpus is what the system decides a 
>>> cpuset
>>> can use.
>>>
>>> Cpuset v2 is able to handle hotplug correctly and update the task's 
>>> cpumask
>>> accordingly. So missing previously offline cpus won't happen with v2.
>>>
>>> Since v1 keeps the old behavior, previously offlined cpus are lost 
>>> in the
>>> cpuset's cpus_allowed. However tasks in the root cpuset will still 
>>> be fine
>>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. 
>>> IOW, only
>>> tasks in a non-root cpuset suffer this problem.
>>>
>>> It was a known issue in v1 and I believe is one of the major reasons 
>>> of the
>>> cpuset v2 redesign.
>>>
>>> A major concern I have is the overhead of creating a poor man 
>>> version of v2
>>> cpus_allowed. This issue can be worked around even for cpuset v1 if 
>>> it is
>>> mounted with the cpuset_v2_mode option to behave more like v2 in its 
>>> cpumask
>>> handling. Alternatively we may be able to provide a config option to 
>>> make
>>> this the default for v1 without the special mount option, if necessary.
>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
>> mask offline cpus for root cgroup tasks, ever. (And the only reason it
>> gets away with masking offline for !root is that it re-applies the mask
>> every time it changes.)
>>
>> Yes it did that for a fair while -- but it is wrong and broken and a
>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
>> not be.
>>
>> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
>> something that needs to go into /urgent *now*.
>>
>> Hence this minimal patch that at least lets sched_setaffinity() work as
>> intended.
>
> I don't object to the general idea of keeping offline cpus in a task's 
> cpu affinity. In the case of cpu offline event, we can skip removing 
> that offline cpu from the task's cpu affinity. That will partially 
> solve the problem here and is also simpler.
>
> I believe a main reason why effective_cpus holds only online cpus is 
> because of the need to detect when there is no online cpus available 
> in a given cpuset. In this case, it will fall back to the nearest 
> ancestors with online cpus.
>
> This offline cpu problem with cpuset v1 is a known problem for a long 
> time. It is not a recent regression.

Note that using cpus_allowed directly in cgroup v2 may not be right 
because cpus_allowed may have no relationship to effective_cpus at all 
in some cases, e.g.

    root
     |
     V
     A (cpus_allowed = 1-4, effective_cpus = 1-4)
     |
     V
     B (cpus_allowed = 5-8, effective_cpus = 1-4)

In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.

I wonder how often is cpu hotplug happening in those arm64 cpu systems 
that only have a subset of cpus that can run 32-bit programs.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01 19:14             ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 19:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/1/23 13:46, Waiman Long wrote:
> On 2/1/23 10:16, Waiman Long wrote:
>> On 2/1/23 04:14, Peter Zijlstra wrote:
>>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>>>> On 1/31/23 17:17, Will Deacon wrote:
>>>>> From: Peter Zijlstra <peterz@infradead.org>
>>>>>
>>>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>>>
>>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>>>> calling __sched_setaffinity() unconditionally.
>>>>>
>>>>> But the underlying problem goes back a lot further, possibly to
>>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") 
>>>>> which
>>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>>>> cs->effective_cpus.
>>>>>
>>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter 
>>>>> out
>>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>>>> cpuset this does not happen and they will forever after be excluded
>>>>> from CPUs onlined later.
>>>>>
>>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>>>> including the offline CPUs.
>>>>>
>>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested 
>>>>> cpumask")
>>>>> Reported-by: Will Deacon <will@kernel.org>
>>>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>>> Link: 
>>>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>>>> tracked online cpus and ignored the offline ones. It behaves more like
>>>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - 
>>>> cpus_allowed and
>>>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and 
>>>> effective_cpus
>>>> are effectively the same and track online cpus. With cpuset v2, 
>>>> cpus_allowed
>>>> contains what the user has written into and it won't be changed until
>>>> another write happen. However, what the user written may not be 
>>>> what the
>>>> system can give it and effective_cpus is what the system decides a 
>>>> cpuset
>>>> can use.
>>>>
>>>> Cpuset v2 is able to handle hotplug correctly and update the task's 
>>>> cpumask
>>>> accordingly. So missing previously offline cpus won't happen with v2.
>>>>
>>>> Since v1 keeps the old behavior, previously offlined cpus are lost 
>>>> in the
>>>> cpuset's cpus_allowed. However tasks in the root cpuset will still 
>>>> be fine
>>>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. 
>>>> IOW, only
>>>> tasks in a non-root cpuset suffer this problem.
>>>>
>>>> It was a known issue in v1 and I believe is one of the major 
>>>> reasons of the
>>>> cpuset v2 redesign.
>>>>
>>>> A major concern I have is the overhead of creating a poor man 
>>>> version of v2
>>>> cpus_allowed. This issue can be worked around even for cpuset v1 if 
>>>> it is
>>>> mounted with the cpuset_v2_mode option to behave more like v2 in 
>>>> its cpumask
>>>> handling. Alternatively we may be able to provide a config option 
>>>> to make
>>>> this the default for v1 without the special mount option, if 
>>>> necessary.
>>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* 
>>> *NOT*
>>> mask offline cpus for root cgroup tasks, ever. (And the only reason it
>>> gets away with masking offline for !root is that it re-applies the mask
>>> every time it changes.)
>>>
>>> Yes it did that for a fair while -- but it is wrong and broken and a
>>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
>>> not be.
>>>
>>> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
>>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c 
>>> for
>>> something that needs to go into /urgent *now*.
>>>
>>> Hence this minimal patch that at least lets sched_setaffinity() work as
>>> intended.
>>
>> I don't object to the general idea of keeping offline cpus in a 
>> task's cpu affinity. In the case of cpu offline event, we can skip 
>> removing that offline cpu from the task's cpu affinity. That will 
>> partially solve the problem here and is also simpler.
>>
>> I believe a main reason why effective_cpus holds only online cpus is 
>> because of the need to detect when there is no online cpus available 
>> in a given cpuset. In this case, it will fall back to the nearest 
>> ancestors with online cpus.
>>
>> This offline cpu problem with cpuset v1 is a known problem for a long 
>> time. It is not a recent regression.
>
> Note that using cpus_allowed directly in cgroup v2 may not be right 
> because cpus_allowed may have no relationship to effective_cpus at all 
> in some cases, e.g.
>
>    root
>     |
>     V
>     A (cpus_allowed = 1-4, effective_cpus = 1-4)
>     |
>     V
>     B (cpus_allowed = 5-8, effective_cpus = 1-4)
>
> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is 
> wrong.
>
> I wonder how often is cpu hotplug happening in those arm64 cpu systems 
> that only have a subset of cpus that can run 32-bit programs.

One possible solution is to use cpuset_cpus_allowed_fallback() in case 
none of the cpus in the current cpuset is allowed to be used to run a 
given task.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01 19:14             ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 19:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/1/23 13:46, Waiman Long wrote:
> On 2/1/23 10:16, Waiman Long wrote:
>> On 2/1/23 04:14, Peter Zijlstra wrote:
>>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>>>> On 1/31/23 17:17, Will Deacon wrote:
>>>>> From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>>>>>
>>>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>>>
>>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>>>> calling __sched_setaffinity() unconditionally.
>>>>>
>>>>> But the underlying problem goes back a lot further, possibly to
>>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") 
>>>>> which
>>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>>>> cs->effective_cpus.
>>>>>
>>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter 
>>>>> out
>>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>>>> cpuset this does not happen and they will forever after be excluded
>>>>> from CPUs onlined later.
>>>>>
>>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>>>> including the offline CPUs.
>>>>>
>>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested 
>>>>> cpumask")
>>>>> Reported-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>>>> Signed-off-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>>>>> Link: 
>>>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>>>> Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>>>> tracked online cpus and ignored the offline ones. It behaves more like
>>>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - 
>>>> cpus_allowed and
>>>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and 
>>>> effective_cpus
>>>> are effectively the same and track online cpus. With cpuset v2, 
>>>> cpus_allowed
>>>> contains what the user has written into and it won't be changed until
>>>> another write happen. However, what the user written may not be 
>>>> what the
>>>> system can give it and effective_cpus is what the system decides a 
>>>> cpuset
>>>> can use.
>>>>
>>>> Cpuset v2 is able to handle hotplug correctly and update the task's 
>>>> cpumask
>>>> accordingly. So missing previously offline cpus won't happen with v2.
>>>>
>>>> Since v1 keeps the old behavior, previously offlined cpus are lost 
>>>> in the
>>>> cpuset's cpus_allowed. However tasks in the root cpuset will still 
>>>> be fine
>>>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. 
>>>> IOW, only
>>>> tasks in a non-root cpuset suffer this problem.
>>>>
>>>> It was a known issue in v1 and I believe is one of the major 
>>>> reasons of the
>>>> cpuset v2 redesign.
>>>>
>>>> A major concern I have is the overhead of creating a poor man 
>>>> version of v2
>>>> cpus_allowed. This issue can be worked around even for cpuset v1 if 
>>>> it is
>>>> mounted with the cpuset_v2_mode option to behave more like v2 in 
>>>> its cpumask
>>>> handling. Alternatively we may be able to provide a config option 
>>>> to make
>>>> this the default for v1 without the special mount option, if 
>>>> necessary.
>>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* 
>>> *NOT*
>>> mask offline cpus for root cgroup tasks, ever. (And the only reason it
>>> gets away with masking offline for !root is that it re-applies the mask
>>> every time it changes.)
>>>
>>> Yes it did that for a fair while -- but it is wrong and broken and a
>>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
>>> not be.
>>>
>>> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
>>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c 
>>> for
>>> something that needs to go into /urgent *now*.
>>>
>>> Hence this minimal patch that at least lets sched_setaffinity() work as
>>> intended.
>>
>> I don't object to the general idea of keeping offline cpus in a 
>> task's cpu affinity. In the case of cpu offline event, we can skip 
>> removing that offline cpu from the task's cpu affinity. That will 
>> partially solve the problem here and is also simpler.
>>
>> I believe a main reason why effective_cpus holds only online cpus is 
>> because of the need to detect when there is no online cpus available 
>> in a given cpuset. In this case, it will fall back to the nearest 
>> ancestors with online cpus.
>>
>> This offline cpu problem with cpuset v1 is a known problem for a long 
>> time. It is not a recent regression.
>
> Note that using cpus_allowed directly in cgroup v2 may not be right 
> because cpus_allowed may have no relationship to effective_cpus at all 
> in some cases, e.g.
>
>    root
>     |
>     V
>     A (cpus_allowed = 1-4, effective_cpus = 1-4)
>     |
>     V
>     B (cpus_allowed = 5-8, effective_cpus = 1-4)
>
> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is 
> wrong.
>
> I wonder how often is cpu hotplug happening in those arm64 cpu systems 
> that only have a subset of cpus that can run 32-bit programs.

One possible solution is to use cpuset_cpus_allowed_fallback() in case 
none of the cpus in the current cpuset is allowed to be used to run a 
given task.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01 19:17               ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 19:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/1/23 14:14, Waiman Long wrote:
> One possible solution is to use cpuset_cpus_allowed_fallback() in case 
> none of the cpus in the current cpuset is allowed to be used to run a 
> given task.

It looks like we will need to enhance cpuset_cpus_allowed_fallback() to 
walk up cpuset tree to find one that have useable cpus.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-01 19:17               ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-01 19:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/1/23 14:14, Waiman Long wrote:
> One possible solution is to use cpuset_cpus_allowed_fallback() in case 
> none of the cpus in the current cpuset is allowed to be used to run a 
> given task.

It looks like we will need to enhance cpuset_cpus_allowed_fallback() to 
walk up cpuset tree to find one that have useable cpus.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
  2023-02-01 18:46         ` Waiman Long
  2023-02-01 19:14             ` Waiman Long
@ 2023-02-01 21:10           ` Peter Zijlstra
  2023-02-02  3:34             ` Waiman Long
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-01 21:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:

> Note that using cpus_allowed directly in cgroup v2 may not be right because
> cpus_allowed may have no relationship to effective_cpus at all in some
> cases, e.g.
> 
>    root
>     |
>     V
>     A (cpus_allowed = 1-4, effective_cpus = 1-4)
>     |
>     V
>     B (cpus_allowed = 5-8, effective_cpus = 1-4)
> 
> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.

I think my patch as written does the right thing here. Since the
intersection of (1-4) and (5-8) is empty it will move up the hierarchy
and we'll end up with (1-4) from the cgroup side of things.

So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
the root set and force it to cpu_possible_mask.

Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
all it's parents. This will, in the case of B above, result in the empty
mask.

Then cpuset_cpus_allowed() has a loop that starts with
task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
the intersection of that and cpu_online_mask is empty, moves up the
hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.

Note that since we force the mask of root to cpu_possible_mask,
cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
that cpu_online_mask always has a non-empty intersection with
task_cpu_possible_mask(), this loop is guaranteed to terminate with a
viable mask.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
  2023-02-01 21:10           ` Peter Zijlstra
@ 2023-02-02  3:34             ` Waiman Long
  2023-02-03 11:50                 ` Will Deacon
  0 siblings, 1 reply; 57+ messages in thread
From: Waiman Long @ 2023-02-02  3:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/1/23 16:10, Peter Zijlstra wrote:
> On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
>
>> Note that using cpus_allowed directly in cgroup v2 may not be right because
>> cpus_allowed may have no relationship to effective_cpus at all in some
>> cases, e.g.
>>
>>     root
>>      |
>>      V
>>      A (cpus_allowed = 1-4, effective_cpus = 1-4)
>>      |
>>      V
>>      B (cpus_allowed = 5-8, effective_cpus = 1-4)
>>
>> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
> I think my patch as written does the right thing here. Since the
> intersection of (1-4) and (5-8) is empty it will move up the hierarchy
> and we'll end up with (1-4) from the cgroup side of things.
>
> So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
> the root set and force it to cpu_possible_mask.
>
> Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
> all it's parents. This will, in the case of B above, result in the empty
> mask.
>
> Then cpuset_cpus_allowed() has a loop that starts with
> task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
> the intersection of that and cpu_online_mask is empty, moves up the
> hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
>
> Note that since we force the mask of root to cpu_possible_mask,
> cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
> that cpu_online_mask always has a non-empty intersection with
> task_cpu_possible_mask(), this loop is guaranteed to terminate with a
> viable mask.

I will take a closer look at that tomorrow. I will be more comfortable 
ack'ing that if this is specific to v1 cpuset instead of applying this 
in both v1 and v2 since it is only v1 that is problematic.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02  8:34       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-02  8:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:

> A major concern I have is the overhead of creating a poor man version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.

It is equally broken for v2, it masks against effective_cpus. Not to
mention it explicitly starts with cpu_online_mask.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02  8:34       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-02  8:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:

> A major concern I have is the overhead of creating a poor man version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.

It is equally broken for v2, it masks against effective_cpus. Not to
mention it explicitly starts with cpu_online_mask.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 16:06         ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-02 16:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/2/23 03:34, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>
>> A major concern I have is the overhead of creating a poor man version of v2
>> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
>> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
>> handling. Alternatively we may be able to provide a config option to make
>> this the default for v1 without the special mount option, if necessary.
> It is equally broken for v2, it masks against effective_cpus. Not to
> mention it explicitly starts with cpu_online_mask.

After taking a close look at the patch, my understanding of what it is 
doing is as follows:

v2: cpus_allowed will not be affected by hotplug. So the new 
cpuset_cpus_allowed() will return effective_cpus + offline cpus that 
should have been part of effective_cpus if online before masking it with 
allowable cpus and then go up the cpuset hierarchy if necessary.

v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the 
current cpuset and move up the hierarchy if necessary to find a cpuset 
that have at least one allowable cpu.

First of all, it does not take into account of the v2 partition feature 
that may cause it to produce incorrect result if partition is enabled 
somewhere. Secondly, I don't see any benefit other than having some 
additional offline cpu available in a task's cpumask which the scheduler 
will ignore anyway. v2 is able to recover a previously offlined cpu. So 
we don't gain any net benefit other than the going up the cpuset 
hierarchy part.

For v1, I agree we should go up the cpuset hierarchy to find a usable 
cpuset. Instead of introducing such a complexity in 
cpuset_cpus_allowed(), my current preference is to do the hierarchy 
climbing part in an enhanced cpuset_cpus_allowed_fallback() after an 
initial failure of cpuset_cpus_allowed(). That will be easier to 
understand than having such complexity and overhead in 
cpuset_cpus_allowed() alone.

I will work on a patchset to do that as a counter offer.

Cheers,
Longman




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 16:06         ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-02 16:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/2/23 03:34, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>
>> A major concern I have is the overhead of creating a poor man version of v2
>> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
>> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
>> handling. Alternatively we may be able to provide a config option to make
>> this the default for v1 without the special mount option, if necessary.
> It is equally broken for v2, it masks against effective_cpus. Not to
> mention it explicitly starts with cpu_online_mask.

After taking a close look at the patch, my understanding of what it is 
doing is as follows:

v2: cpus_allowed will not be affected by hotplug. So the new 
cpuset_cpus_allowed() will return effective_cpus + offline cpus that 
should have been part of effective_cpus if online before masking it with 
allowable cpus and then go up the cpuset hierarchy if necessary.

v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the 
current cpuset and move up the hierarchy if necessary to find a cpuset 
that have at least one allowable cpu.

First of all, it does not take into account of the v2 partition feature 
that may cause it to produce incorrect result if partition is enabled 
somewhere. Secondly, I don't see any benefit other than having some 
additional offline cpu available in a task's cpumask which the scheduler 
will ignore anyway. v2 is able to recover a previously offlined cpu. So 
we don't gain any net benefit other than the going up the cpuset 
hierarchy part.

For v1, I agree we should go up the cpuset hierarchy to find a usable 
cpuset. Instead of introducing such a complexity in 
cpuset_cpus_allowed(), my current preference is to do the hierarchy 
climbing part in an enhanced cpuset_cpus_allowed_fallback() after an 
initial failure of cpuset_cpus_allowed(). That will be easier to 
understand than having such complexity and overhead in 
cpuset_cpus_allowed() alone.

I will work on a patchset to do that as a counter offer.

Cheers,
Longman




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 19:42           ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-02 19:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:

> After taking a close look at the patch, my understanding of what it is doing
> is as follows:
> 
> v2: cpus_allowed will not be affected by hotplug. So the new
> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
> have been part of effective_cpus if online before masking it with allowable
> cpus and then go up the cpuset hierarchy if necessary.
> 
> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
> current cpuset and move up the hierarchy if necessary to find a cpuset that
> have at least one allowable cpu.
> 
> First of all, it does not take into account of the v2 partition feature that
> may cause it to produce incorrect result if partition is enabled somewhere.

How so? For a partition the cpus_allowed mask should be the parition
CPUs. The only magical bit about partitions is that any one CPU cannot
belong to two partitions and load-balancing is split.

> Secondly, I don't see any benefit other than having some additional offline
> cpu available in a task's cpumask which the scheduler will ignore anyway.

Those CPUs can come online again -- you're *again* dismissing the true
bug :/

If you filter out the offline CPUs at sched_setaffinity() time, you
forever lose those CPUs, the task will never again move to those CPUs,
even if they do come online after.

It is really simple to reproduce this:

 - boot machine
 - offline all CPUs except one
 - taskset -p ffffffff $$
 - online all CPUs

and observe your shell (and all its decendants) being stuck to the one
CPU. Do the same thing on a CPUSET=n build and note the difference (you
retain the full mask).

> v2 is able to recover a previously offlined cpu. So we don't gain any
> net benefit other than the going up the cpuset hierarchy part.

Only for !root tasks. Not even v2 will re-set the affinity of root tasks
afaict.

> For v1, I agree we should go up the cpuset hierarchy to find a usable
> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
> my current preference is to do the hierarchy climbing part in an enhanced
> cpuset_cpus_allowed_fallback() after an initial failure of
> cpuset_cpus_allowed(). That will be easier to understand than having such
> complexity and overhead in cpuset_cpus_allowed() alone.
> 
> I will work on a patchset to do that as a counter offer.

We will need a small and simple patch for /urgent, or I will need to
revert all your patches -- your call.

I also don't tihnk you fully appreciate the ramifications of
task_cpu_possible_mask(), cpuset currently gets that quite wrong.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 19:42           ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-02 19:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:

> After taking a close look at the patch, my understanding of what it is doing
> is as follows:
> 
> v2: cpus_allowed will not be affected by hotplug. So the new
> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
> have been part of effective_cpus if online before masking it with allowable
> cpus and then go up the cpuset hierarchy if necessary.
> 
> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
> current cpuset and move up the hierarchy if necessary to find a cpuset that
> have at least one allowable cpu.
> 
> First of all, it does not take into account of the v2 partition feature that
> may cause it to produce incorrect result if partition is enabled somewhere.

How so? For a partition the cpus_allowed mask should be the parition
CPUs. The only magical bit about partitions is that any one CPU cannot
belong to two partitions and load-balancing is split.

> Secondly, I don't see any benefit other than having some additional offline
> cpu available in a task's cpumask which the scheduler will ignore anyway.

Those CPUs can come online again -- you're *again* dismissing the true
bug :/

If you filter out the offline CPUs at sched_setaffinity() time, you
forever lose those CPUs, the task will never again move to those CPUs,
even if they do come online after.

It is really simple to reproduce this:

 - boot machine
 - offline all CPUs except one
 - taskset -p ffffffff $$
 - online all CPUs

and observe your shell (and all its decendants) being stuck to the one
CPU. Do the same thing on a CPUSET=n build and note the difference (you
retain the full mask).

> v2 is able to recover a previously offlined cpu. So we don't gain any
> net benefit other than the going up the cpuset hierarchy part.

Only for !root tasks. Not even v2 will re-set the affinity of root tasks
afaict.

> For v1, I agree we should go up the cpuset hierarchy to find a usable
> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
> my current preference is to do the hierarchy climbing part in an enhanced
> cpuset_cpus_allowed_fallback() after an initial failure of
> cpuset_cpus_allowed(). That will be easier to understand than having such
> complexity and overhead in cpuset_cpus_allowed() alone.
> 
> I will work on a patchset to do that as a counter offer.

We will need a small and simple patch for /urgent, or I will need to
revert all your patches -- your call.

I also don't tihnk you fully appreciate the ramifications of
task_cpu_possible_mask(), cpuset currently gets that quite wrong.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 20:46             ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-02 20:46 UTC (permalink / raw)
  To: Peter Zijlstra, Tejun Heo
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups

On 2/2/23 14:42, Peter Zijlstra wrote:
> On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:
>
>> After taking a close look at the patch, my understanding of what it is doing
>> is as follows:
>>
>> v2: cpus_allowed will not be affected by hotplug. So the new
>> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
>> have been part of effective_cpus if online before masking it with allowable
>> cpus and then go up the cpuset hierarchy if necessary.
>>
>> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
>> current cpuset and move up the hierarchy if necessary to find a cpuset that
>> have at least one allowable cpu.
>>
>> First of all, it does not take into account of the v2 partition feature that
>> may cause it to produce incorrect result if partition is enabled somewhere.
> How so? For a partition the cpus_allowed mask should be the parition
> CPUs. The only magical bit about partitions is that any one CPU cannot
> belong to two partitions and load-balancing is split.
There can be a child partition underneath it that uses up some of the 
cpus in cpus_allowed mask. So if you cascading up the cpuset tree from 
another child, your code will wrongly include those cpus that are 
dedicated to the other child partition.
>
>> Secondly, I don't see any benefit other than having some additional offline
>> cpu available in a task's cpumask which the scheduler will ignore anyway.
> Those CPUs can come online again -- you're *again* dismissing the true
> bug :/
>
> If you filter out the offline CPUs at sched_setaffinity() time, you
> forever lose those CPUs, the task will never again move to those CPUs,
> even if they do come online after.
>
> It is really simple to reproduce this:
>
>   - boot machine
>   - offline all CPUs except one
>   - taskset -p ffffffff $$
>   - online all CPUs
>
> and observe your shell (and all its decendants) being stuck to the one
> CPU. Do the same thing on a CPUSET=n build and note the difference (you
> retain the full mask).

With the new sched_setaffinity(), the original mask should be stored in 
user_cpus_ptr. The cpuset code is supposed to call 
set_cpus_allowed_ptr() which then set the mask correctly. I will run 
your test and figure out why it does not work as I would have expected.


>
>> v2 is able to recover a previously offlined cpu. So we don't gain any
>> net benefit other than the going up the cpuset hierarchy part.
> Only for !root tasks. Not even v2 will re-set the affinity of root tasks
> afaict.
>
>> For v1, I agree we should go up the cpuset hierarchy to find a usable
>> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
>> my current preference is to do the hierarchy climbing part in an enhanced
>> cpuset_cpus_allowed_fallback() after an initial failure of
>> cpuset_cpus_allowed(). That will be easier to understand than having such
>> complexity and overhead in cpuset_cpus_allowed() alone.
>>
>> I will work on a patchset to do that as a counter offer.
> We will need a small and simple patch for /urgent, or I will need to
> revert all your patches -- your call.
>
> I also don't tihnk you fully appreciate the ramifications of
> task_cpu_possible_mask(), cpuset currently gets that quite wrong.

OK, I don't realize the urgency of that. If it is that urgent, I will 
have no objection to get it in for now. We can improve it later on. So 
are you planning to get it into the current 6.2 rc or 6.3?

Tejun, are you OK with that as you are the cgroup maintainer?

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 20:46             ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-02 20:46 UTC (permalink / raw)
  To: Peter Zijlstra, Tejun Heo
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/2/23 14:42, Peter Zijlstra wrote:
> On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:
>
>> After taking a close look at the patch, my understanding of what it is doing
>> is as follows:
>>
>> v2: cpus_allowed will not be affected by hotplug. So the new
>> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
>> have been part of effective_cpus if online before masking it with allowable
>> cpus and then go up the cpuset hierarchy if necessary.
>>
>> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
>> current cpuset and move up the hierarchy if necessary to find a cpuset that
>> have at least one allowable cpu.
>>
>> First of all, it does not take into account of the v2 partition feature that
>> may cause it to produce incorrect result if partition is enabled somewhere.
> How so? For a partition the cpus_allowed mask should be the parition
> CPUs. The only magical bit about partitions is that any one CPU cannot
> belong to two partitions and load-balancing is split.
There can be a child partition underneath it that uses up some of the 
cpus in cpus_allowed mask. So if you cascading up the cpuset tree from 
another child, your code will wrongly include those cpus that are 
dedicated to the other child partition.
>
>> Secondly, I don't see any benefit other than having some additional offline
>> cpu available in a task's cpumask which the scheduler will ignore anyway.
> Those CPUs can come online again -- you're *again* dismissing the true
> bug :/
>
> If you filter out the offline CPUs at sched_setaffinity() time, you
> forever lose those CPUs, the task will never again move to those CPUs,
> even if they do come online after.
>
> It is really simple to reproduce this:
>
>   - boot machine
>   - offline all CPUs except one
>   - taskset -p ffffffff $$
>   - online all CPUs
>
> and observe your shell (and all its decendants) being stuck to the one
> CPU. Do the same thing on a CPUSET=n build and note the difference (you
> retain the full mask).

With the new sched_setaffinity(), the original mask should be stored in 
user_cpus_ptr. The cpuset code is supposed to call 
set_cpus_allowed_ptr() which then set the mask correctly. I will run 
your test and figure out why it does not work as I would have expected.


>
>> v2 is able to recover a previously offlined cpu. So we don't gain any
>> net benefit other than the going up the cpuset hierarchy part.
> Only for !root tasks. Not even v2 will re-set the affinity of root tasks
> afaict.
>
>> For v1, I agree we should go up the cpuset hierarchy to find a usable
>> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
>> my current preference is to do the hierarchy climbing part in an enhanced
>> cpuset_cpus_allowed_fallback() after an initial failure of
>> cpuset_cpus_allowed(). That will be easier to understand than having such
>> complexity and overhead in cpuset_cpus_allowed() alone.
>>
>> I will work on a patchset to do that as a counter offer.
> We will need a small and simple patch for /urgent, or I will need to
> revert all your patches -- your call.
>
> I also don't tihnk you fully appreciate the ramifications of
> task_cpu_possible_mask(), cpuset currently gets that quite wrong.

OK, I don't realize the urgency of that. If it is that urgent, I will 
have no objection to get it in for now. We can improve it later on. So 
are you planning to get it into the current 6.2 rc or 6.3?

Tejun, are you OK with that as you are the cgroup maintainer?

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 20:48               ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2023-02-02 20:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups

On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > I will work on a patchset to do that as a counter offer.
> > We will need a small and simple patch for /urgent, or I will need to
> > revert all your patches -- your call.
> > 
> > I also don't tihnk you fully appreciate the ramifications of
> > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> 
> OK, I don't realize the urgency of that. If it is that urgent, I will have
> no objection to get it in for now. We can improve it later on. So are you
> planning to get it into the current 6.2 rc or 6.3?
> 
> Tejun, are you OK with that as you are the cgroup maintainer?

Yeah, gotta fix the regression but is there currently a solution which fixes
the regression but doesn't further break other stuff?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 20:48               ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2023-02-02 20:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > I will work on a patchset to do that as a counter offer.
> > We will need a small and simple patch for /urgent, or I will need to
> > revert all your patches -- your call.
> > 
> > I also don't tihnk you fully appreciate the ramifications of
> > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> 
> OK, I don't realize the urgency of that. If it is that urgent, I will have
> no objection to get it in for now. We can improve it later on. So are you
> planning to get it into the current 6.2 rc or 6.3?
> 
> Tejun, are you OK with that as you are the cgroup maintainer?

Yeah, gotta fix the regression but is there currently a solution which fixes
the regression but doesn't further break other stuff?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 20:53                 ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-02 20:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups


On 2/2/23 15:48, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>> I will work on a patchset to do that as a counter offer.
>>> We will need a small and simple patch for /urgent, or I will need to
>>> revert all your patches -- your call.
>>>
>>> I also don't tihnk you fully appreciate the ramifications of
>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>> OK, I don't realize the urgency of that. If it is that urgent, I will have
>> no objection to get it in for now. We can improve it later on. So are you
>> planning to get it into the current 6.2 rc or 6.3?
>>
>> Tejun, are you OK with that as you are the cgroup maintainer?
> Yeah, gotta fix the regression but is there currently a solution which fixes
> the regression but doesn't further break other stuff?

I believe there is a better way to do that, but it will need more time 
to flex out. Since cpuset_cpus_allowed() is only used by 
kernel/sched/core.c, Peter will be responsible if it somehow breaks 
other stuff.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 20:53                 ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-02 20:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA


On 2/2/23 15:48, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>> I will work on a patchset to do that as a counter offer.
>>> We will need a small and simple patch for /urgent, or I will need to
>>> revert all your patches -- your call.
>>>
>>> I also don't tihnk you fully appreciate the ramifications of
>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>> OK, I don't realize the urgency of that. If it is that urgent, I will have
>> no objection to get it in for now. We can improve it later on. So are you
>> planning to get it into the current 6.2 rc or 6.3?
>>
>> Tejun, are you OK with that as you are the cgroup maintainer?
> Yeah, gotta fix the regression but is there currently a solution which fixes
> the regression but doesn't further break other stuff?

I believe there is a better way to do that, but it will need more time 
to flex out. Since cpuset_cpus_allowed() is only used by 
kernel/sched/core.c, Peter will be responsible if it somehow breaks 
other stuff.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
  2023-02-02 20:53                 ` Waiman Long
  (?)
@ 2023-02-02 21:05                 ` Waiman Long
  2023-02-02 21:50                     ` Tejun Heo
  -1 siblings, 1 reply; 57+ messages in thread
From: Waiman Long @ 2023-02-02 21:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups


On 2/2/23 15:53, Waiman Long wrote:
>
> On 2/2/23 15:48, Tejun Heo wrote:
>> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>>> I will work on a patchset to do that as a counter offer.
>>>> We will need a small and simple patch for /urgent, or I will need to
>>>> revert all your patches -- your call.
>>>>
>>>> I also don't tihnk you fully appreciate the ramifications of
>>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>>> OK, I don't realize the urgency of that. If it is that urgent, I 
>>> will have
>>> no objection to get it in for now. We can improve it later on. So 
>>> are you
>>> planning to get it into the current 6.2 rc or 6.3?
>>>
>>> Tejun, are you OK with that as you are the cgroup maintainer?
>> Yeah, gotta fix the regression but is there currently a solution 
>> which fixes
>> the regression but doesn't further break other stuff?
>
> I believe there is a better way to do that, but it will need more time 
> to flex out. Since cpuset_cpus_allowed() is only used by 
> kernel/sched/core.c, Peter will be responsible if it somehow breaks 
> other stuff.

Maybe my cpuset patch that don't update task's cpumask on cpu offline 
event can help. However, I don't know the exact scenario where the 
regression happen, so it may not.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 21:50                     ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2023-02-02 21:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups

On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
> 
> On 2/2/23 15:53, Waiman Long wrote:
> > 
> > On 2/2/23 15:48, Tejun Heo wrote:
> > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > > > > I will work on a patchset to do that as a counter offer.
> > > > > We will need a small and simple patch for /urgent, or I will need to
> > > > > revert all your patches -- your call.
> > > > > 
> > > > > I also don't tihnk you fully appreciate the ramifications of
> > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> > > > OK, I don't realize the urgency of that. If it is that urgent, I
> > > > will have
> > > > no objection to get it in for now. We can improve it later on.
> > > > So are you
> > > > planning to get it into the current 6.2 rc or 6.3?
> > > > 
> > > > Tejun, are you OK with that as you are the cgroup maintainer?
> > > Yeah, gotta fix the regression but is there currently a solution
> > > which fixes
> > > the regression but doesn't further break other stuff?
> > 
> > I believe there is a better way to do that, but it will need more time
> > to flex out. Since cpuset_cpus_allowed() is only used by
> > kernel/sched/core.c, Peter will be responsible if it somehow breaks
> > other stuff.
> 
> Maybe my cpuset patch that don't update task's cpumask on cpu offline event
> can help. However, I don't know the exact scenario where the regression
> happen, so it may not.

Neither patch looks like they would break anything. That said, the patches
aren't trivial and we're really close to the merge window, so I'd really
appreciate if you can take a look and test a bit before we send these
Linus's way. We can replace it with a better solution afterwards.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-02 21:50                     ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2023-02-02 21:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
> 
> On 2/2/23 15:53, Waiman Long wrote:
> > 
> > On 2/2/23 15:48, Tejun Heo wrote:
> > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > > > > I will work on a patchset to do that as a counter offer.
> > > > > We will need a small and simple patch for /urgent, or I will need to
> > > > > revert all your patches -- your call.
> > > > > 
> > > > > I also don't tihnk you fully appreciate the ramifications of
> > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> > > > OK, I don't realize the urgency of that. If it is that urgent, I
> > > > will have
> > > > no objection to get it in for now. We can improve it later on.
> > > > So are you
> > > > planning to get it into the current 6.2 rc or 6.3?
> > > > 
> > > > Tejun, are you OK with that as you are the cgroup maintainer?
> > > Yeah, gotta fix the regression but is there currently a solution
> > > which fixes
> > > the regression but doesn't further break other stuff?
> > 
> > I believe there is a better way to do that, but it will need more time
> > to flex out. Since cpuset_cpus_allowed() is only used by
> > kernel/sched/core.c, Peter will be responsible if it somehow breaks
> > other stuff.
> 
> Maybe my cpuset patch that don't update task's cpumask on cpu offline event
> can help. However, I don't know the exact scenario where the regression
> happen, so it may not.

Neither patch looks like they would break anything. That said, the patches
aren't trivial and we're really close to the merge window, so I'd really
appreciate if you can take a look and test a bit before we send these
Linus's way. We can replace it with a better solution afterwards.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03  0:54                       ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03  0:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups

On 2/2/23 16:50, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
>> On 2/2/23 15:53, Waiman Long wrote:
>>> On 2/2/23 15:48, Tejun Heo wrote:
>>>> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>>>>> I will work on a patchset to do that as a counter offer.
>>>>>> We will need a small and simple patch for /urgent, or I will need to
>>>>>> revert all your patches -- your call.
>>>>>>
>>>>>> I also don't tihnk you fully appreciate the ramifications of
>>>>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>>>>> OK, I don't realize the urgency of that. If it is that urgent, I
>>>>> will have
>>>>> no objection to get it in for now. We can improve it later on.
>>>>> So are you
>>>>> planning to get it into the current 6.2 rc or 6.3?
>>>>>
>>>>> Tejun, are you OK with that as you are the cgroup maintainer?
>>>> Yeah, gotta fix the regression but is there currently a solution
>>>> which fixes
>>>> the regression but doesn't further break other stuff?
>>> I believe there is a better way to do that, but it will need more time
>>> to flex out. Since cpuset_cpus_allowed() is only used by
>>> kernel/sched/core.c, Peter will be responsible if it somehow breaks
>>> other stuff.
>> Maybe my cpuset patch that don't update task's cpumask on cpu offline event
>> can help. However, I don't know the exact scenario where the regression
>> happen, so it may not.
> Neither patch looks like they would break anything. That said, the patches
> aren't trivial and we're really close to the merge window, so I'd really
> appreciate if you can take a look and test a bit before we send these
> Linus's way. We can replace it with a better solution afterwards.

OK, will do.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03  0:54                       ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03  0:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/2/23 16:50, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
>> On 2/2/23 15:53, Waiman Long wrote:
>>> On 2/2/23 15:48, Tejun Heo wrote:
>>>> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>>>>> I will work on a patchset to do that as a counter offer.
>>>>>> We will need a small and simple patch for /urgent, or I will need to
>>>>>> revert all your patches -- your call.
>>>>>>
>>>>>> I also don't tihnk you fully appreciate the ramifications of
>>>>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>>>>> OK, I don't realize the urgency of that. If it is that urgent, I
>>>>> will have
>>>>> no objection to get it in for now. We can improve it later on.
>>>>> So are you
>>>>> planning to get it into the current 6.2 rc or 6.3?
>>>>>
>>>>> Tejun, are you OK with that as you are the cgroup maintainer?
>>>> Yeah, gotta fix the regression but is there currently a solution
>>>> which fixes
>>>> the regression but doesn't further break other stuff?
>>> I believe there is a better way to do that, but it will need more time
>>> to flex out. Since cpuset_cpus_allowed() is only used by
>>> kernel/sched/core.c, Peter will be responsible if it somehow breaks
>>> other stuff.
>> Maybe my cpuset patch that don't update task's cpumask on cpu offline event
>> can help. However, I don't know the exact scenario where the regression
>> happen, so it may not.
> Neither patch looks like they would break anything. That said, the patches
> aren't trivial and we're really close to the merge window, so I'd really
> appreciate if you can take a look and test a bit before we send these
> Linus's way. We can replace it with a better solution afterwards.

OK, will do.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 11:50                 ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-02-03 11:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote:
> On 2/1/23 16:10, Peter Zijlstra wrote:
> > On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
> > 
> > > Note that using cpus_allowed directly in cgroup v2 may not be right because
> > > cpus_allowed may have no relationship to effective_cpus at all in some
> > > cases, e.g.
> > > 
> > >     root
> > >      |
> > >      V
> > >      A (cpus_allowed = 1-4, effective_cpus = 1-4)
> > >      |
> > >      V
> > >      B (cpus_allowed = 5-8, effective_cpus = 1-4)
> > > 
> > > In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
> > I think my patch as written does the right thing here. Since the
> > intersection of (1-4) and (5-8) is empty it will move up the hierarchy
> > and we'll end up with (1-4) from the cgroup side of things.
> > 
> > So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
> > the root set and force it to cpu_possible_mask.
> > 
> > Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
> > all it's parents. This will, in the case of B above, result in the empty
> > mask.
> > 
> > Then cpuset_cpus_allowed() has a loop that starts with
> > task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
> > the intersection of that and cpu_online_mask is empty, moves up the
> > hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
> > 
> > Note that since we force the mask of root to cpu_possible_mask,
> > cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
> > that cpu_online_mask always has a non-empty intersection with
> > task_cpu_possible_mask(), this loop is guaranteed to terminate with a
> > viable mask.
> 
> I will take a closer look at that tomorrow. I will be more comfortable
> ack'ing that if this is specific to v1 cpuset instead of applying this in
> both v1 and v2 since it is only v1 that is problematic.

fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1.

WIll

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 11:50                 ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-02-03 11:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote:
> On 2/1/23 16:10, Peter Zijlstra wrote:
> > On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
> > 
> > > Note that using cpus_allowed directly in cgroup v2 may not be right because
> > > cpus_allowed may have no relationship to effective_cpus at all in some
> > > cases, e.g.
> > > 
> > >     root
> > >      |
> > >      V
> > >      A (cpus_allowed = 1-4, effective_cpus = 1-4)
> > >      |
> > >      V
> > >      B (cpus_allowed = 5-8, effective_cpus = 1-4)
> > > 
> > > In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
> > I think my patch as written does the right thing here. Since the
> > intersection of (1-4) and (5-8) is empty it will move up the hierarchy
> > and we'll end up with (1-4) from the cgroup side of things.
> > 
> > So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
> > the root set and force it to cpu_possible_mask.
> > 
> > Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
> > all it's parents. This will, in the case of B above, result in the empty
> > mask.
> > 
> > Then cpuset_cpus_allowed() has a loop that starts with
> > task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
> > the intersection of that and cpu_online_mask is empty, moves up the
> > hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
> > 
> > Note that since we force the mask of root to cpu_possible_mask,
> > cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
> > that cpu_online_mask always has a non-empty intersection with
> > task_cpu_possible_mask(), this loop is guaranteed to terminate with a
> > viable mask.
> 
> I will take a closer look at that tomorrow. I will be more comfortable
> ack'ing that if this is specific to v1 cpuset instead of applying this in
> both v1 and v2 since it is only v1 that is problematic.

fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1.

WIll

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
  2023-02-03 11:50                 ` Will Deacon
@ 2023-02-03 15:13                   ` Waiman Long
  -1 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03 15:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 2/3/23 06:50, Will Deacon wrote:
> On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote:
>> On 2/1/23 16:10, Peter Zijlstra wrote:
>>> On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
>>>
>>>> Note that using cpus_allowed directly in cgroup v2 may not be right because
>>>> cpus_allowed may have no relationship to effective_cpus at all in some
>>>> cases, e.g.
>>>>
>>>>      root
>>>>       |
>>>>       V
>>>>       A (cpus_allowed = 1-4, effective_cpus = 1-4)
>>>>       |
>>>>       V
>>>>       B (cpus_allowed = 5-8, effective_cpus = 1-4)
>>>>
>>>> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
>>> I think my patch as written does the right thing here. Since the
>>> intersection of (1-4) and (5-8) is empty it will move up the hierarchy
>>> and we'll end up with (1-4) from the cgroup side of things.
>>>
>>> So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
>>> the root set and force it to cpu_possible_mask.
>>>
>>> Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
>>> all it's parents. This will, in the case of B above, result in the empty
>>> mask.
>>>
>>> Then cpuset_cpus_allowed() has a loop that starts with
>>> task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
>>> the intersection of that and cpu_online_mask is empty, moves up the
>>> hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
>>>
>>> Note that since we force the mask of root to cpu_possible_mask,
>>> cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
>>> that cpu_online_mask always has a non-empty intersection with
>>> task_cpu_possible_mask(), this loop is guaranteed to terminate with a
>>> viable mask.
>> I will take a closer look at that tomorrow. I will be more comfortable
>> ack'ing that if this is specific to v1 cpuset instead of applying this in
>> both v1 and v2 since it is only v1 that is problematic.
> fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1.

I think I know where the problem is. It is due to the fact the cpuset 
hotplug code doesn't update cpumasks of the tasks in the top cpuset 
(root) at all when there is a cpu offline or online event. It is 
probably because for some of the tasks in the top cpuset, especially the 
percpu kthread, changing their cpumasks can be catastrophic. The hotplug 
code does update the cpumasks of the tasks that are not in the top 
cpuset. This problem is irrespective of whether v1 or v2 is in use.

The partition code does try to update the cpumasks of the tasks in the 
top cpuset, but skip the percpu kthreads. My testing show that is 
probably OK. For safety, I agree that is better to extend the allowed 
cpu list to all the possible cpus (including offline ones) for now until 
more testings are done to find a safe way to do that. That special case 
should apply only to tasks in the top cpuset. For the rests, the current 
code should be OK and is less risky than adopting code in this patch.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 15:13                   ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03 15:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2/3/23 06:50, Will Deacon wrote:
> On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote:
>> On 2/1/23 16:10, Peter Zijlstra wrote:
>>> On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
>>>
>>>> Note that using cpus_allowed directly in cgroup v2 may not be right because
>>>> cpus_allowed may have no relationship to effective_cpus at all in some
>>>> cases, e.g.
>>>>
>>>>      root
>>>>       |
>>>>       V
>>>>       A (cpus_allowed = 1-4, effective_cpus = 1-4)
>>>>       |
>>>>       V
>>>>       B (cpus_allowed = 5-8, effective_cpus = 1-4)
>>>>
>>>> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
>>> I think my patch as written does the right thing here. Since the
>>> intersection of (1-4) and (5-8) is empty it will move up the hierarchy
>>> and we'll end up with (1-4) from the cgroup side of things.
>>>
>>> So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
>>> the root set and force it to cpu_possible_mask.
>>>
>>> Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
>>> all it's parents. This will, in the case of B above, result in the empty
>>> mask.
>>>
>>> Then cpuset_cpus_allowed() has a loop that starts with
>>> task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
>>> the intersection of that and cpu_online_mask is empty, moves up the
>>> hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
>>>
>>> Note that since we force the mask of root to cpu_possible_mask,
>>> cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
>>> that cpu_online_mask always has a non-empty intersection with
>>> task_cpu_possible_mask(), this loop is guaranteed to terminate with a
>>> viable mask.
>> I will take a closer look at that tomorrow. I will be more comfortable
>> ack'ing that if this is specific to v1 cpuset instead of applying this in
>> both v1 and v2 since it is only v1 that is problematic.
> fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1.

I think I know where the problem is. It is due to the fact the cpuset 
hotplug code doesn't update cpumasks of the tasks in the top cpuset 
(root) at all when there is a cpu offline or online event. It is 
probably because for some of the tasks in the top cpuset, especially the 
percpu kthread, changing their cpumasks can be catastrophic. The hotplug 
code does update the cpumasks of the tasks that are not in the top 
cpuset. This problem is irrespective of whether v1 or v2 is in use.

The partition code does try to update the cpumasks of the tasks in the 
top cpuset, but skip the percpu kthreads. My testing show that is 
probably OK. For safety, I agree that is better to extend the allowed 
cpu list to all the possible cpus (including offline ones) for now until 
more testings are done to find a safe way to do that. That special case 
should apply only to tasks in the top cpuset. For the rests, the current 
code should be OK and is less risky than adopting code in this patch.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 15:26                     ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-03 15:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote:

> I think I know where the problem is. It is due to the fact the cpuset
> hotplug code doesn't update cpumasks of the tasks in the top cpuset (root)
> at all when there is a cpu offline or online event. It is probably because
> for some of the tasks in the top cpuset, especially the percpu kthread,
> changing their cpumasks can be catastrophic. The hotplug code does update
> the cpumasks of the tasks that are not in the top cpuset. This problem is
> irrespective of whether v1 or v2 is in use.

I've been saying this exact thing for how many mails now?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 15:26                     ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2023-02-03 15:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote:

> I think I know where the problem is. It is due to the fact the cpuset
> hotplug code doesn't update cpumasks of the tasks in the top cpuset (root)
> at all when there is a cpu offline or online event. It is probably because
> for some of the tasks in the top cpuset, especially the percpu kthread,
> changing their cpumasks can be catastrophic. The hotplug code does update
> the cpumasks of the tasks that are not in the top cpuset. This problem is
> irrespective of whether v1 or v2 is in use.

I've been saying this exact thing for how many mails now?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 15:35                       ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03 15:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups


On 2/3/23 10:26, Peter Zijlstra wrote:
> On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote:
>
>> I think I know where the problem is. It is due to the fact the cpuset
>> hotplug code doesn't update cpumasks of the tasks in the top cpuset (root)
>> at all when there is a cpu offline or online event. It is probably because
>> for some of the tasks in the top cpuset, especially the percpu kthread,
>> changing their cpumasks can be catastrophic. The hotplug code does update
>> the cpumasks of the tasks that are not in the top cpuset. This problem is
>> irrespective of whether v1 or v2 is in use.
> I've been saying this exact thing for how many mails now?

My bad. The fact that sched_getaffinity() masks off the offline cpus 
makes me thought incorrectly that tasks in the top cpuset were also 
updated by the hotplug code. Further testing indicates this is the case.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 15:35                       ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03 15:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA


On 2/3/23 10:26, Peter Zijlstra wrote:
> On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote:
>
>> I think I know where the problem is. It is due to the fact the cpuset
>> hotplug code doesn't update cpumasks of the tasks in the top cpuset (root)
>> at all when there is a cpu offline or online event. It is probably because
>> for some of the tasks in the top cpuset, especially the percpu kthread,
>> changing their cpumasks can be catastrophic. The hotplug code does update
>> the cpumasks of the tasks that are not in the top cpuset. This problem is
>> irrespective of whether v1 or v2 is in use.
> I've been saying this exact thing for how many mails now?

My bad. The fact that sched_getaffinity() masks off the offline cpus 
makes me thought incorrectly that tasks in the top cpuset were also 
updated by the hotplug code. Further testing indicates this is the case.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 16:31                       ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-02-03 16:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Waiman Long, Peter Zijlstra, linux-kernel, kernel-team, Zefan Li,
	Johannes Weiner, cgroups

On Thu, Feb 02, 2023 at 11:50:55AM -1000, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
> > 
> > On 2/2/23 15:53, Waiman Long wrote:
> > > 
> > > On 2/2/23 15:48, Tejun Heo wrote:
> > > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > > > > > I will work on a patchset to do that as a counter offer.
> > > > > > We will need a small and simple patch for /urgent, or I will need to
> > > > > > revert all your patches -- your call.
> > > > > > 
> > > > > > I also don't tihnk you fully appreciate the ramifications of
> > > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> > > > > OK, I don't realize the urgency of that. If it is that urgent, I
> > > > > will have
> > > > > no objection to get it in for now. We can improve it later on.
> > > > > So are you
> > > > > planning to get it into the current 6.2 rc or 6.3?
> > > > > 
> > > > > Tejun, are you OK with that as you are the cgroup maintainer?
> > > > Yeah, gotta fix the regression but is there currently a solution
> > > > which fixes
> > > > the regression but doesn't further break other stuff?
> > > 
> > > I believe there is a better way to do that, but it will need more time
> > > to flex out. Since cpuset_cpus_allowed() is only used by
> > > kernel/sched/core.c, Peter will be responsible if it somehow breaks
> > > other stuff.
> > 
> > Maybe my cpuset patch that don't update task's cpumask on cpu offline event
> > can help. However, I don't know the exact scenario where the regression
> > happen, so it may not.
> 
> Neither patch looks like they would break anything. That said, the patches
> aren't trivial and we're really close to the merge window, so I'd really
> appreciate if you can take a look and test a bit before we send these
> Linus's way. We can replace it with a better solution afterwards.

FWIW, I tested this series in an arm64 heterogeneous setup with things
like hotplug and exec()ing between 32-bit and 64-bit tasks and it all
seems good.

The alternative would be to revert Waiman's setaffinity changes, which
I've had a go at here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=ssa-reverts

and I _think_ I've rescued the UAF fix too.

What do people prefer?

Will

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
@ 2023-02-03 16:31                       ` Will Deacon
  0 siblings, 0 replies; 57+ messages in thread
From: Will Deacon @ 2023-02-03 16:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Waiman Long, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Zefan Li, Johannes Weiner,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Feb 02, 2023 at 11:50:55AM -1000, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
> > 
> > On 2/2/23 15:53, Waiman Long wrote:
> > > 
> > > On 2/2/23 15:48, Tejun Heo wrote:
> > > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > > > > > I will work on a patchset to do that as a counter offer.
> > > > > > We will need a small and simple patch for /urgent, or I will need to
> > > > > > revert all your patches -- your call.
> > > > > > 
> > > > > > I also don't tihnk you fully appreciate the ramifications of
> > > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> > > > > OK, I don't realize the urgency of that. If it is that urgent, I
> > > > > will have
> > > > > no objection to get it in for now. We can improve it later on.
> > > > > So are you
> > > > > planning to get it into the current 6.2 rc or 6.3?
> > > > > 
> > > > > Tejun, are you OK with that as you are the cgroup maintainer?
> > > > Yeah, gotta fix the regression but is there currently a solution
> > > > which fixes
> > > > the regression but doesn't further break other stuff?
> > > 
> > > I believe there is a better way to do that, but it will need more time
> > > to flex out. Since cpuset_cpus_allowed() is only used by
> > > kernel/sched/core.c, Peter will be responsible if it somehow breaks
> > > other stuff.
> > 
> > Maybe my cpuset patch that don't update task's cpumask on cpu offline event
> > can help. However, I don't know the exact scenario where the regression
> > happen, so it may not.
> 
> Neither patch looks like they would break anything. That said, the patches
> aren't trivial and we're really close to the merge window, so I'd really
> appreciate if you can take a look and test a bit before we send these
> Linus's way. We can replace it with a better solution afterwards.

FWIW, I tested this series in an arm64 heterogeneous setup with things
like hotplug and exec()ing between 32-bit and 64-bit tasks and it all
seems good.

The alternative would be to revert Waiman's setaffinity changes, which
I've had a go at here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=ssa-reverts

and I _think_ I've rescued the UAF fix too.

What do people prefer?

Will

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-03 17:55     ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03 17:55 UTC (permalink / raw)
  To: Will Deacon, linux-kernel
  Cc: kernel-team, Peter Zijlstra, Zefan Li, Tejun Heo,
	Johannes Weiner, cgroups

On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.
>
>   kernel/cgroup/cpuset.c | 18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8552cc2c586a..f15fb0426707 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
>   /**
>    * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>    * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> + * @new_cpus: the temp variable for the new effective_cpus mask
>    *
>    * Iterate through each task of @cs updating its cpus_allowed to the
>    * effective cpuset's.  As this function is called with cpuset_rwsem held,
>    * cpuset membership stays stable.
>    */
> -static void update_tasks_cpumask(struct cpuset *cs)
> +static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>   {
>   	struct css_task_iter it;
>   	struct task_struct *task;
> @@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
>   		if (top_cs && (task->flags & PF_KTHREAD) &&
>   		    kthread_is_per_cpu(task))
>   			continue;
> -		set_cpus_allowed_ptr(task, cs->effective_cpus);
> +
> +		cpumask_and(new_cpus, cs->effective_cpus,
> +			    task_cpu_possible_mask(task));
> +		set_cpus_allowed_ptr(task, new_cpus);
>   	}
>   	css_task_iter_end(&it);
>   }
> @@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>   	spin_unlock_irq(&callback_lock);
>   
>   	if (adding || deleting)
> -		update_tasks_cpumask(parent);
> +		update_tasks_cpumask(parent, tmp->new_cpus);
>   
>   	/*
>   	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
> @@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>   		WARN_ON(!is_in_v2_mode() &&
>   			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
>   
> -		update_tasks_cpumask(cp);
> +		update_tasks_cpumask(cp, tmp->new_cpus);
>   
>   		/*
>   		 * On legacy hierarchy, if the effective cpumask of any non-
> @@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>   		}
>   	}
>   
> -	update_tasks_cpumask(parent);
> +	update_tasks_cpumask(parent, tmpmask.new_cpus);
>   
>   	if (parent->child_ecpus_count)
>   		update_sibling_cpumasks(parent, cs, &tmpmask);
> @@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
>   	 * as the tasks will be migrated to an ancestor.
>   	 */
>   	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
> -		update_tasks_cpumask(cs);
> +		update_tasks_cpumask(cs, new_cpus);
>   	if (mems_updated && !nodes_empty(cs->mems_allowed))
>   		update_tasks_nodemask(cs);
>   
> @@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
>   	spin_unlock_irq(&callback_lock);
>   
>   	if (cpus_updated)
> -		update_tasks_cpumask(cs);
> +		update_tasks_cpumask(cs, new_cpus);
>   	if (mems_updated)
>   		update_tasks_nodemask(cs);
>   }

Acked-by: Waiman Long <longman@redhat.com>

This change is good for backporting to stable releases. For the latest 
kernel, I will prefer to centralize the masking in 
__set_cpus_allowed_ptr() where a scratch_mask is already being used for 
masking purpose. Let get this patch merged now, I will send a patch to 
move off the masking afterward.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-03 17:55     ` Waiman Long
  0 siblings, 0 replies; 57+ messages in thread
From: Waiman Long @ 2023-02-03 17:55 UTC (permalink / raw)
  To: Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Peter Zijlstra, Zefan Li,
	Tejun Heo, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.
>
>   kernel/cgroup/cpuset.c | 18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8552cc2c586a..f15fb0426707 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
>   /**
>    * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>    * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> + * @new_cpus: the temp variable for the new effective_cpus mask
>    *
>    * Iterate through each task of @cs updating its cpus_allowed to the
>    * effective cpuset's.  As this function is called with cpuset_rwsem held,
>    * cpuset membership stays stable.
>    */
> -static void update_tasks_cpumask(struct cpuset *cs)
> +static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>   {
>   	struct css_task_iter it;
>   	struct task_struct *task;
> @@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
>   		if (top_cs && (task->flags & PF_KTHREAD) &&
>   		    kthread_is_per_cpu(task))
>   			continue;
> -		set_cpus_allowed_ptr(task, cs->effective_cpus);
> +
> +		cpumask_and(new_cpus, cs->effective_cpus,
> +			    task_cpu_possible_mask(task));
> +		set_cpus_allowed_ptr(task, new_cpus);
>   	}
>   	css_task_iter_end(&it);
>   }
> @@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>   	spin_unlock_irq(&callback_lock);
>   
>   	if (adding || deleting)
> -		update_tasks_cpumask(parent);
> +		update_tasks_cpumask(parent, tmp->new_cpus);
>   
>   	/*
>   	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
> @@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
>   		WARN_ON(!is_in_v2_mode() &&
>   			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
>   
> -		update_tasks_cpumask(cp);
> +		update_tasks_cpumask(cp, tmp->new_cpus);
>   
>   		/*
>   		 * On legacy hierarchy, if the effective cpumask of any non-
> @@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>   		}
>   	}
>   
> -	update_tasks_cpumask(parent);
> +	update_tasks_cpumask(parent, tmpmask.new_cpus);
>   
>   	if (parent->child_ecpus_count)
>   		update_sibling_cpumasks(parent, cs, &tmpmask);
> @@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
>   	 * as the tasks will be migrated to an ancestor.
>   	 */
>   	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
> -		update_tasks_cpumask(cs);
> +		update_tasks_cpumask(cs, new_cpus);
>   	if (mems_updated && !nodes_empty(cs->mems_allowed))
>   		update_tasks_nodemask(cs);
>   
> @@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
>   	spin_unlock_irq(&callback_lock);
>   
>   	if (cpus_updated)
> -		update_tasks_cpumask(cs);
> +		update_tasks_cpumask(cs, new_cpus);
>   	if (mems_updated)
>   		update_tasks_nodemask(cs);
>   }

Acked-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

This change is good for backporting to stable releases. For the latest 
kernel, I will prefer to centralize the masking in 
__set_cpus_allowed_ptr() where a scratch_mask is already being used for 
masking purpose. Let get this patch merged now, I will send a patch to 
move off the masking afterward.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-06 20:21     ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2023-02-06 20:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, kernel-team, Peter Zijlstra, Waiman Long, Zefan Li,
	Johannes Weiner, cgroups

On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
> 
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
> 
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
> 
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will@kernel.org>

Applied to cgroup/for-6.2-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
@ 2023-02-06 20:21     ` Tejun Heo
  0 siblings, 0 replies; 57+ messages in thread
From: Tejun Heo @ 2023-02-06 20:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ, Peter Zijlstra, Waiman Long,
	Zefan Li, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
> 
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
> 
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
> 
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <will-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Applied to cgroup/for-6.2-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2023-02-06 20:21 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-31 22:17 [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems Will Deacon
2023-01-31 22:17 ` Will Deacon
2023-01-31 22:17 ` [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Will Deacon
2023-01-31 22:17   ` Will Deacon
2023-02-01  4:14   ` Waiman Long
2023-02-01  4:14     ` Waiman Long
2023-02-01  9:14     ` Peter Zijlstra
2023-02-01  9:14       ` Peter Zijlstra
2023-02-01 15:16       ` Waiman Long
2023-02-01 15:16         ` Waiman Long
2023-02-01 18:46         ` Waiman Long
2023-02-01 19:14           ` Waiman Long
2023-02-01 19:14             ` Waiman Long
2023-02-01 19:17             ` Waiman Long
2023-02-01 19:17               ` Waiman Long
2023-02-01 21:10           ` Peter Zijlstra
2023-02-02  3:34             ` Waiman Long
2023-02-03 11:50               ` Will Deacon
2023-02-03 11:50                 ` Will Deacon
2023-02-03 15:13                 ` Waiman Long
2023-02-03 15:13                   ` Waiman Long
2023-02-03 15:26                   ` Peter Zijlstra
2023-02-03 15:26                     ` Peter Zijlstra
2023-02-03 15:35                     ` Waiman Long
2023-02-03 15:35                       ` Waiman Long
2023-02-02  8:34     ` Peter Zijlstra
2023-02-02  8:34       ` Peter Zijlstra
2023-02-02 16:06       ` Waiman Long
2023-02-02 16:06         ` Waiman Long
2023-02-02 19:42         ` Peter Zijlstra
2023-02-02 19:42           ` Peter Zijlstra
2023-02-02 20:46           ` Waiman Long
2023-02-02 20:46             ` Waiman Long
2023-02-02 20:48             ` Tejun Heo
2023-02-02 20:48               ` Tejun Heo
2023-02-02 20:53               ` Waiman Long
2023-02-02 20:53                 ` Waiman Long
2023-02-02 21:05                 ` Waiman Long
2023-02-02 21:50                   ` Tejun Heo
2023-02-02 21:50                     ` Tejun Heo
2023-02-03  0:54                     ` Waiman Long
2023-02-03  0:54                       ` Waiman Long
2023-02-03 16:31                     ` Will Deacon
2023-02-03 16:31                       ` Will Deacon
2023-02-01 10:23   ` Hillf Danton
2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon
2023-01-31 22:17   ` Will Deacon
2023-02-01  2:22   ` Waiman Long
2023-02-01  9:15     ` Peter Zijlstra
2023-02-01  9:15       ` Peter Zijlstra
2023-02-01 15:03       ` Waiman Long
2023-02-01  9:27   ` Peter Zijlstra
2023-02-01  9:27     ` Peter Zijlstra
2023-02-03 17:55   ` Waiman Long
2023-02-03 17:55     ` Waiman Long
2023-02-06 20:21   ` Tejun Heo
2023-02-06 20:21     ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.