linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] cgroup/cpuset: Miscellaneous updates
@ 2023-03-06 20:08 Waiman Long
  2023-03-06 20:08 ` [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-06 20:08 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon,
	Peter Zijlstra, Waiman Long

This patch series includes miscellaneous update to the cpuset and its
testing code.

Patch 2 is actually a follow-up of commit 3fb906e7fabb ("cgroup/cpuset:
Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks").

Patches 3-4 are for handling corner cases when dealing with
task_cpu_possible_mask().

Waiman Long (5):
  cgroup/cpuset: Skip task update if hotplug doesn't affect current
    cpuset
  cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset
    are updated
  cgroup/cpuset: Find another usable CPU if none found in current cpuset
  cgroup/cpuset: Add CONFIG_DEBUG_CPUSETS config for cpuset testing
  cgroup/cpuset: Minor updates to test_cpuset_prs.sh

 init/Kconfig                                  |   5 +
 kernel/cgroup/cpuset.c                        | 155 +++++++++++++++++-
 .../selftests/cgroup/test_cpuset_prs.sh       |  25 +--
 3 files changed, 165 insertions(+), 20 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
  2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
@ 2023-03-06 20:08 ` Waiman Long
  2023-03-14 16:50   ` Michal Koutný
  2023-03-06 20:08 ` [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-06 20:08 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon,
	Peter Zijlstra, Waiman Long

If a hotplug event doesn't affect the current cpuset, there is no point
to call hotplug_update_tasks() or hotplug_update_tasks_legacy(). So
just skip it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 636f1c682ac0..a801abad3bac 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3508,6 +3508,8 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 update_tasks:
 	cpus_updated = !cpumask_equal(&new_cpus, cs->effective_cpus);
 	mems_updated = !nodes_equal(new_mems, cs->effective_mems);
+	if (!cpus_updated && !mems_updated)
+		goto unlock;	/* Hotplug doesn't affect this cpuset */
 
 	if (mems_updated)
 		check_insane_mems_config(&new_mems);
@@ -3519,6 +3521,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 		hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
 					    cpus_updated, mems_updated);
 
+unlock:
 	percpu_up_write(&cpuset_rwsem);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
  2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
  2023-03-06 20:08 ` [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
@ 2023-03-06 20:08 ` Waiman Long
  2023-03-14 17:34   ` Michal Koutný
  2023-03-06 20:08 ` [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-06 20:08 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon,
	Peter Zijlstra, Waiman Long

Similar to commit 3fb906e7fabb ("group/cpuset: Don't filter offline
CPUs in cpuset_cpus_allowed() for top cpuset tasks"), the whole set of
possible CPUs including offline ones should be used for setting cpumasks
for tasks in the top cpuset when a cpuset partition is modified.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a801abad3bac..bbf57dcb2f68 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1209,7 +1209,8 @@ void rebuild_sched_domains(void)
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
  * effective cpuset's.  As this function is called with cpuset_rwsem held,
- * cpuset membership stays stable.
+ * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
+ * is used instead of effective_cpus.
  */
 static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
@@ -1219,15 +1220,18 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 
 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it))) {
-		/*
-		 * Percpu kthreads in top_cpuset are ignored
-		 */
-		if (top_cs && (task->flags & PF_KTHREAD) &&
-		    kthread_is_per_cpu(task))
-			continue;
+		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
 
-		cpumask_and(new_cpus, cs->effective_cpus,
-			    task_cpu_possible_mask(task));
+		if (top_cs) {
+			/*
+			 * Percpu kthreads in top_cpuset are ignored
+			 */
+			if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
+				continue;
+			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
+		} else {
+			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
+		}
 		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
  2023-03-06 20:08 ` [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
  2023-03-06 20:08 ` [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated Waiman Long
@ 2023-03-06 20:08 ` Waiman Long
  2023-03-14 18:17   ` Michal Koutný
  2023-03-06 20:08 ` [PATCH 4/5] cgroup/cpuset: Add CONFIG_DEBUG_CPUSETS config for cpuset testing Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-06 20:08 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon,
	Peter Zijlstra, Waiman Long

On a system with asymmetric CPUs, a restricted task is one that can run
only a selected subset of available CPUs.  When a CPU goes offline or
when "cpuset.cpus" is changed, it is possible that a restricted task
may not have any runnable CPUs left in the current cpuset even if there
is still some CPUs in effective_cpus. In this case, the restricted task
cannot be run at all.

There are several ways we may be able to handle this situation. Treating
it like empty effective_cpus is probably too disruptive and is unfair to
the normal tasks. So it is better to have some special handling for these
restricted tasks. One possibility is to move the restricted tasks up the
cpuset hierarchy, but it is tricky to do it right. Another solution is
to assign other usable CPUs to these tasks. This patch implements the
later alternative by finding one usable CPU by walking up the cpuset
hierarchy and printing an informational message to let the users know
that these restricted tasks are running in a cpuset with no usable CPU.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index bbf57dcb2f68..aa8225daf1d3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1202,6 +1202,38 @@ void rebuild_sched_domains(void)
 	cpus_read_unlock();
 }
 
+/*
+ * Find a usable effective (online) CPU up the cpuset hierarchy and return it.
+ */
+static int find_usable_cpu(struct cpuset *cs, struct cpumask *new_cpus,
+			   const struct cpumask *possible_mask)
+{
+	struct cpuset *parent;
+	unsigned long flags;
+	int cpu;
+
+	/*
+	 * When offlining cpu, some effective_cpus may not be up to date.
+	 * So check cpu_online_mask to be sure.
+	 */
+	parent = parent_cs(cs);
+	while (parent &&
+	      (!cpumask_and(new_cpus, parent->effective_cpus, possible_mask) ||
+	       !cpumask_and(new_cpus, new_cpus, cpu_online_mask)))
+		parent = parent_cs(cs);
+
+	/* Fall back to all possible online cpus, if necessary */
+	if (!parent)
+		cpumask_and(new_cpus, possible_mask, cpu_online_mask);
+
+	/* cpumask_any_distribute() has to be called with preemption disabled */
+	local_irq_save(flags);
+	cpu = cpumask_any_distribute(new_cpus);
+	local_irq_restore(flags);
+
+	return cpu;
+}
+
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
@@ -1218,6 +1250,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 	struct task_struct *task;
 	bool top_cs = cs == &top_cpuset;
 
+	percpu_rwsem_assert_held(&cpuset_rwsem);
 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it))) {
 		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
@@ -1232,7 +1265,28 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 		} else {
 			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
 		}
-		set_cpus_allowed_ptr(task, new_cpus);
+		/*
+		 * On systems with assymetric CPUs, it is possible that
+		 * cpumask will become empty or set_cpus_allowed_ptr() will
+		 * return an error even if we still have CPUs in
+		 * effective_cpus. In this case, we find a usable CPU walking
+		 * up the cpuset hierarchy and use that for this particular
+		 * task with an informational message about the change in the
+		 * hope that the users will adjust "cpuset.cpus" accordingly.
+		 */
+		if (cpumask_empty(new_cpus) ||
+		    set_cpus_allowed_ptr(task, new_cpus)) {
+			char name[80];
+			int cpu;
+
+			cpu = find_usable_cpu(cs, new_cpus, possible_mask);
+			cpumask_clear(new_cpus);
+			cpumask_set_cpu(cpu, new_cpus);
+			WARN_ON_ONCE(set_cpus_allowed_ptr(task, new_cpus));
+			cgroup_name(cs->css.cgroup, name, sizeof(name));
+			pr_info("cpuset: Restricted task %s(%d) in cpuset %s is forced to run on outside CPU %d\n",
+				task->comm, task->pid, name, cpu);
+		}
 	}
 	css_task_iter_end(&it);
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/5] cgroup/cpuset: Add CONFIG_DEBUG_CPUSETS config for cpuset testing
  2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
                   ` (2 preceding siblings ...)
  2023-03-06 20:08 ` [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset Waiman Long
@ 2023-03-06 20:08 ` Waiman Long
  2023-03-06 20:08 ` [PATCH 5/5] cgroup/cpuset: Minor updates to test_cpuset_prs.sh Waiman Long
  2023-03-15 16:24 ` [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Will Deacon
  5 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-06 20:08 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon,
	Peter Zijlstra, Waiman Long

Since commit 431c69fac05b ("cpuset: Honour task_cpu_possible_mask()
in guarantee_online_cpus()"), task_cpu_possible_mask() is used within
the cpuset code. However, it is hard to find a arm64 system that can
actually makes task_cpu_possible_mask() return different cpu mask. As a
result, it is hard to exercise the correctness of the code that handle
exception cases due to task_cpu_possible_mask().

To help in exercising those code paths, we need a way to force
task_cpu_possible_mask() to return a different cpu mask. This patch adds
a new CONFIG_DEBUG_CPUSETS config option to enable some debug code to do
just that. The idea is to create a debugfs file "debug_cpu_possible_mask"
that holds the cpumask to be returned by task_cpu_possible_mask() when
a task with name started with the special prefix "cstest" is used as
the input argument. Userspace testing code is then able to exercise
the different code that is affected by task_cpu_possible_mask().

Signed-off-by: Waiman Long <longman@redhat.com>
---
 init/Kconfig           |  5 +++
 kernel/cgroup/cpuset.c | 76 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index 18f0bf50c468..2abaa830aff0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1140,6 +1140,11 @@ config PROC_PID_CPUSET
 	depends on CPUSETS
 	default y
 
+config DEBUG_CPUSETS
+	bool "Enable cpuset debugging"
+	depends on CPUSETS && DEBUG_FS
+	default n
+
 config CGROUP_DEVICE
 	bool "Device controller"
 	help
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index aa8225daf1d3..45051ebb6606 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -220,6 +220,29 @@ static inline bool is_prs_invalid(int prs_state)
 	return prs_state < 0;
 }
 
+#ifdef CONFIG_DEBUG_CPUSETS
+static struct cpumask debug_cpu_possible_mask;
+
+/*
+ * Debugging code for testing code involving task_cpu_possible_mask()
+ */
+static inline const struct cpumask *
+__task_cpu_possible_mask(struct task_struct *p)
+{
+	const struct cpumask *mask = task_cpu_possible_mask(p);
+
+	if (mask != cpu_possible_mask)
+		return mask;
+	else if (!strncmp(p->comm, "cstest", 6))
+		return &debug_cpu_possible_mask;
+	else
+		return cpu_possible_mask;
+}
+
+#undef  task_cpu_possible_mask
+#define task_cpu_possible_mask(p)	__task_cpu_possible_mask(p)
+#endif /* CONFIG_DEBUG_CPUSETS */
+
 /*
  * Temporary cpumasks for working with partitions that are passed among
  * functions to avoid memory allocation in inner functions.
@@ -4139,3 +4162,56 @@ void cpuset_task_status_allowed(struct seq_file *m, struct task_struct *task)
 	seq_printf(m, "Mems_allowed_list:\t%*pbl\n",
 		   nodemask_pr_args(&task->mems_allowed));
 }
+
+#ifdef CONFIG_DEBUG_CPUSETS
+#include <linux/debugfs.h>
+
+/*
+ * Add a debugfs file "debug_cpu_possible_mask" that allows user to set
+ * a debug mask for testing.
+ */
+static ssize_t read_debug_mask(struct file *file, char __user *user_buf,
+			       size_t count, loff_t *ppos)
+{
+	char buf[80];
+	int len;
+
+	len = snprintf(buf, sizeof(buf) - 1, "%*pbl\n",
+		       cpumask_pr_args(&debug_cpu_possible_mask));
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t write_debug_mask(struct file *file, const char __user *user_buf,
+				size_t count, loff_t *ppos)
+{
+	unsigned int len;
+	char buf[80];
+	int retval = 0;
+
+	len = min(count, sizeof(buf) - 1);
+	if (copy_from_user(buf, user_buf, len))
+		return -EFAULT;
+
+	if (!*buf)
+		cpumask_clear(&debug_cpu_possible_mask);
+	else
+		retval = cpulist_parse(buf, &debug_cpu_possible_mask);
+
+	return (retval < 0) ? retval : count;
+}
+
+static const struct file_operations fops_debug_mask = {
+	.read = read_debug_mask,
+	.write = write_debug_mask,
+	.llseek = default_llseek,
+};
+
+static int __init create_debug_cpu_possible_mask(void)
+{
+	cpumask_copy(&debug_cpu_possible_mask, cpu_possible_mask);
+	debugfs_create_file("debug_cpu_possible_mask", 0600, NULL, NULL,
+			    &fops_debug_mask);
+	return 0;
+}
+late_initcall(create_debug_cpu_possible_mask);
+#endif /* CONFIG_DEBUG_CPUSETS */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/5] cgroup/cpuset: Minor updates to test_cpuset_prs.sh
  2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
                   ` (3 preceding siblings ...)
  2023-03-06 20:08 ` [PATCH 4/5] cgroup/cpuset: Add CONFIG_DEBUG_CPUSETS config for cpuset testing Waiman Long
@ 2023-03-06 20:08 ` Waiman Long
  2023-03-07 16:16   ` Kamalesh Babulal
  2023-03-15 16:24 ` [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Will Deacon
  5 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-06 20:08 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon,
	Peter Zijlstra, Waiman Long

This patch makes the following minor updates to the cpuset partition
testing script test_cpuset_prs.sh.

 - Remove online_cpus function call as it will be called anyway on exit
   in cleanup.
 - Make the enabling of sched/verbose debugfs flag conditional on the
   "-v" verbose option and set DELAY_FACTOR to 2 in this case as cpuset
   partition operations are likely to be slowed down by enabling that.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 .../selftests/cgroup/test_cpuset_prs.sh       | 25 +++++++++++--------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 75c100de90ff..2b5215cc599f 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -15,13 +15,6 @@ skip_test() {
 
 [[ $(id -u) -eq 0 ]] || skip_test "Test must be run as root!"
 
-# Set sched verbose flag, if available
-if [[ -d /sys/kernel/debug/sched ]]
-then
-	# Used to restore the original setting during cleanup
-	SCHED_DEBUG=$(cat /sys/kernel/debug/sched/verbose)
-	echo Y > /sys/kernel/debug/sched/verbose
-fi
 
 # Get wait_inotify location
 WAIT_INOTIFY=$(cd $(dirname $0); pwd)/wait_inotify
@@ -37,10 +30,14 @@ CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
 PROG=$1
 VERBOSE=
 DELAY_FACTOR=1
+SCHED_DEBUG=
 while [[ "$1" = -* ]]
 do
 	case "$1" in
 		-v) VERBOSE=1
+		    # Enable sched/verbose can slow thing down
+		    [[ $DELAY_FACTOR -eq 1 ]] &&
+			DELAY_FACTOR=2
 		    break
 		    ;;
 		-d) DELAY_FACTOR=$2
@@ -54,6 +51,14 @@ do
 	shift
 done
 
+# Set sched verbose flag if available when "-v" option is specified
+if [[ -n "$VERBOSE" && -d /sys/kernel/debug/sched ]]
+then
+	# Used to restore the original setting during cleanup
+	SCHED_DEBUG=$(cat /sys/kernel/debug/sched/verbose)
+	echo Y > /sys/kernel/debug/sched/verbose
+fi
+
 cd $CGROUP2
 echo +cpuset > cgroup.subtree_control
 [[ -d test ]] || mkdir test
@@ -65,7 +70,8 @@ cleanup()
 	rmdir A1/A2/A3 A1/A2 A1 B1 > /dev/null 2>&1
 	cd ..
 	rmdir test > /dev/null 2>&1
-	echo "$SCHED_DEBUG" > /sys/kernel/debug/sched/verbose
+	[[ -n "$SCHED_DEBUG" ]] &&
+		echo "$SCHED_DEBUG" > /sys/kernel/debug/sched/verbose
 }
 
 # Pause in ms
@@ -571,7 +577,6 @@ run_state_test()
 			echo "Test $TEST[$I] failed result check!"
 			eval echo \"\${$TEST[$I]}\"
 			dump_states
-			online_cpus
 			exit 1
 		}
 
@@ -582,7 +587,6 @@ run_state_test()
 				eval echo \"\${$TEST[$I]}\"
 				echo
 				dump_states
-				online_cpus
 				exit 1
 			}
 		}
@@ -594,7 +598,6 @@ run_state_test()
 				eval echo \"\${$TEST[$I]}\"
 				echo
 				dump_states
-				online_cpus
 				exit 1
 			}
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] cgroup/cpuset: Minor updates to test_cpuset_prs.sh
  2023-03-06 20:08 ` [PATCH 5/5] cgroup/cpuset: Minor updates to test_cpuset_prs.sh Waiman Long
@ 2023-03-07 16:16   ` Kamalesh Babulal
  0 siblings, 0 replies; 23+ messages in thread
From: Kamalesh Babulal @ 2023-03-07 16:16 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
  Cc: cgroups, linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra


On 3/7/23 01:38, Waiman Long wrote:
> This patch makes the following minor updates to the cpuset partition
> testing script test_cpuset_prs.sh.
> 
>  - Remove online_cpus function call as it will be called anyway on exit
>    in cleanup.
>  - Make the enabling of sched/verbose debugfs flag conditional on the
>    "-v" verbose option and set DELAY_FACTOR to 2 in this case as cpuset
>    partition operations are likely to be slowed down by enabling that.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
  2023-03-06 20:08 ` [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
@ 2023-03-14 16:50   ` Michal Koutný
  2023-03-14 18:20     ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Koutný @ 2023-03-14 16:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

On Mon, Mar 06, 2023 at 03:08:45PM -0500, Waiman Long <longman@redhat.com> wrote:
> If a hotplug event doesn't affect the current cpuset, there is no point
> to call hotplug_update_tasks() or hotplug_update_tasks_legacy(). So
> just skip it.

This skips "insane" modification of cs->cpus_allowed in
hotplug_update_tasks_legacy() but assuming cs->cpus_allowed is kept in
sync with cs->effective_cpus on v1, it is OK to skip the update based
only on effective_cpus check.

Hence,

>  kernel/cgroup/cpuset.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Michal Koutný <mkoutny@suse.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
  2023-03-06 20:08 ` [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated Waiman Long
@ 2023-03-14 17:34   ` Michal Koutný
  2023-03-14 19:02     ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Koutný @ 2023-03-14 17:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]

Hello Waiman.

On Mon, Mar 06, 2023 at 03:08:46PM -0500, Waiman Long <longman@redhat.com> wrote:
> -		/*
> -		 * Percpu kthreads in top_cpuset are ignored
> -		 */
> -		if (top_cs && (task->flags & PF_KTHREAD) &&
> -		    kthread_is_per_cpu(task))
> -			continue;
> +		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
>  
> -		cpumask_and(new_cpus, cs->effective_cpus,
> -			    task_cpu_possible_mask(task));
> +		if (top_cs) {
> +			/*
> +			 * Percpu kthreads in top_cpuset are ignored
> +			 */
> +			if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
> +				continue;
> +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
> +		} else {
> +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
> +		}

I'm wrapping my head around this slightly.
1) I'd suggest swapping args in of cpumask_and() to have possible_mask
   consistently first.
2) Then I'm wondering whether two branches are truly different when
   effective_cpus := cpus_allowed - subparts_cpus
   top_cpuset.cpus_allowed == possible_mask        (1)

IOW, can you see a difference in what affinities are set to eligible
top_cpuset tasks before and after this patch upon CPU hotplug?
(Hm, (1) holds only in v2. So is this a fix for v1 only?)

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-06 20:08 ` [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset Waiman Long
@ 2023-03-14 18:17   ` Michal Koutný
  2023-03-14 20:22     ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Koutný @ 2023-03-14 18:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 3581 bytes --]

Hello.

On Mon, Mar 06, 2023 at 03:08:47PM -0500, Waiman Long <longman@redhat.com> wrote:
> On a system with asymmetric CPUs, a restricted task is one that can run
> only a selected subset of available CPUs.  When a CPU goes offline or
> when "cpuset.cpus" is changed, it is possible that a restricted task
> may not have any runnable CPUs left in the current cpuset even if there
> is still some CPUs in effective_cpus. In this case, the restricted task
> cannot be run at all.
> 
> There are several ways we may be able to handle this situation. Treating
> it like empty effective_cpus is probably too disruptive and is unfair to
> the normal tasks. So it is better to have some special handling for these
> restricted tasks. One possibility is to move the restricted tasks up the
> cpuset hierarchy, but it is tricky to do it right. Another solution is
> to assign other usable CPUs to these tasks. This patch implements the
> later alternative by finding one usable CPU by walking up the cpuset
> hierarchy and printing an informational message to let the users know
> that these restricted tasks are running in a cpuset with no usable CPU.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index bbf57dcb2f68..aa8225daf1d3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1202,6 +1202,38 @@ void rebuild_sched_domains(void)
>  	cpus_read_unlock();
>  }
>  
> [...]
>  /**
>   * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>   * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> @@ -1218,6 +1250,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>  	struct task_struct *task;
>  	bool top_cs = cs == &top_cpuset;
>  
> +	percpu_rwsem_assert_held(&cpuset_rwsem);
>  	css_task_iter_start(&cs->css, 0, &it);
>  	while ((task = css_task_iter_next(&it))) {
>  		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
> @@ -1232,7 +1265,28 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>  		} else {
>  			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>  		}
> -		set_cpus_allowed_ptr(task, new_cpus);
> +		/*
> +		 * On systems with assymetric CPUs, it is possible that
> +		 * cpumask will become empty or set_cpus_allowed_ptr() will
> +		 * return an error even if we still have CPUs in
> +		 * effective_cpus. In this case, we find a usable CPU walking
> +		 * up the cpuset hierarchy and use that for this particular
> +		 * task with an informational message about the change in the
> +		 * hope that the users will adjust "cpuset.cpus" accordingly.
> +		 */
> +		if (cpumask_empty(new_cpus) ||
> +		    set_cpus_allowed_ptr(task, new_cpus)) {

IIUC, cpumask_empty(new_cpus) here implies
cpumask_empty(cs->effective_cpus) but that shouldn't happen (cs should
inherit non-empty mask from an ancestor). Do I miss/forget anything?

This thus covers the case when p->user_cpus_ptr is incompatible with
hotplug or cpuset.cpus allowance and a different affinity must be
chosen. But doesn't that mean that the task would run _out_ of
cs->effective_cpus?
I guess that's unavoidable on asymmetric CPU archs but not no SMPs.
Shouldn't the solution distinguish between the two? (I.e. never run out
of effective_cpus on SMP.)

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
  2023-03-14 16:50   ` Michal Koutný
@ 2023-03-14 18:20     ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-14 18:20 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra


On 3/14/23 12:50, Michal Koutný wrote:
> On Mon, Mar 06, 2023 at 03:08:45PM -0500, Waiman Long <longman@redhat.com> wrote:
>> If a hotplug event doesn't affect the current cpuset, there is no point
>> to call hotplug_update_tasks() or hotplug_update_tasks_legacy(). So
>> just skip it.
> This skips "insane" modification of cs->cpus_allowed in
> hotplug_update_tasks_legacy() but assuming cs->cpus_allowed is kept in
> sync with cs->effective_cpus on v1, it is OK to skip the update based
> only on effective_cpus check.

Yes, effective_cpus is equivalent to cpus_allowed in v1 unless you mount 
the cpuset with the cpuset_v2_mode flag which will behave more like v2 
where effective_cpus is still the key.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
  2023-03-14 17:34   ` Michal Koutný
@ 2023-03-14 19:02     ` Waiman Long
  2023-03-15 10:06       ` Michal Koutný
  0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-14 19:02 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

On 3/14/23 13:34, Michal Koutný wrote:
> Hello Waiman.
>
> On Mon, Mar 06, 2023 at 03:08:46PM -0500, Waiman Long <longman@redhat.com> wrote:
>> -		/*
>> -		 * Percpu kthreads in top_cpuset are ignored
>> -		 */
>> -		if (top_cs && (task->flags & PF_KTHREAD) &&
>> -		    kthread_is_per_cpu(task))
>> -			continue;
>> +		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
>>   
>> -		cpumask_and(new_cpus, cs->effective_cpus,
>> -			    task_cpu_possible_mask(task));
>> +		if (top_cs) {
>> +			/*
>> +			 * Percpu kthreads in top_cpuset are ignored
>> +			 */
>> +			if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
>> +				continue;
>> +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
>> +		} else {
>> +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>> +		}
> I'm wrapping my head around this slightly.
> 1) I'd suggest swapping args in of cpumask_and() to have possible_mask
>     consistently first.
I don't quite understand what you meant by "swapping args". It is 
effective new_cpus = cs->effective_cpus ∩ possible_mask. What is the 
point of swapping cs->effective_cpus and possible_mask.
> 2) Then I'm wondering whether two branches are truly different when
>     effective_cpus := cpus_allowed - subparts_cpus
>     top_cpuset.cpus_allowed == possible_mask        (1)
effective_cpus may not be equal "cpus_allowed - subparts_cpus" if some 
of the CPUs are offline as effective_cpus contains only online CPUs. 
subparts_cpu can include offline cpus too. That is why I choose that 
expression. I will add a comment to clarify that.
>
> IOW, can you see a difference in what affinities are set to eligible
> top_cpuset tasks before and after this patch upon CPU hotplug?
> (Hm, (1) holds only in v2. So is this a fix for v1 only?)

This is due to the fact that cpu hotplug code currently doesn't update 
the cpu affinity of tasks in the top cpuset. Tasks not in the top cpuset 
can rely on the hotplug code to update the cpu affinity appropriately. 
For the tasks in the top cpuset, we have to make sure that all the 
offline CPUs are included.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-14 18:17   ` Michal Koutný
@ 2023-03-14 20:22     ` Waiman Long
  2023-03-17 12:27       ` Michal Koutný
  0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-14 20:22 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

On 3/14/23 14:17, Michal Koutný wrote:
> Hello.
>
> On Mon, Mar 06, 2023 at 03:08:47PM -0500, Waiman Long <longman@redhat.com> wrote:
>> On a system with asymmetric CPUs, a restricted task is one that can run
>> only a selected subset of available CPUs.  When a CPU goes offline or
>> when "cpuset.cpus" is changed, it is possible that a restricted task
>> may not have any runnable CPUs left in the current cpuset even if there
>> is still some CPUs in effective_cpus. In this case, the restricted task
>> cannot be run at all.
>>
>> There are several ways we may be able to handle this situation. Treating
>> it like empty effective_cpus is probably too disruptive and is unfair to
>> the normal tasks. So it is better to have some special handling for these
>> restricted tasks. One possibility is to move the restricted tasks up the
>> cpuset hierarchy, but it is tricky to do it right. Another solution is
>> to assign other usable CPUs to these tasks. This patch implements the
>> later alternative by finding one usable CPU by walking up the cpuset
>> hierarchy and printing an informational message to let the users know
>> that these restricted tasks are running in a cpuset with no usable CPU.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 55 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index bbf57dcb2f68..aa8225daf1d3 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1202,6 +1202,38 @@ void rebuild_sched_domains(void)
>>   	cpus_read_unlock();
>>   }
>>   
>> [...]
>>   /**
>>    * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
>>    * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
>> @@ -1218,6 +1250,7 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>>   	struct task_struct *task;
>>   	bool top_cs = cs == &top_cpuset;
>>   
>> +	percpu_rwsem_assert_held(&cpuset_rwsem);
>>   	css_task_iter_start(&cs->css, 0, &it);
>>   	while ((task = css_task_iter_next(&it))) {
>>   		const struct cpumask *possible_mask = task_cpu_possible_mask(task);
>> @@ -1232,7 +1265,28 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
>>   		} else {
>>   			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>>   		}
>> -		set_cpus_allowed_ptr(task, new_cpus);
>> +		/*
>> +		 * On systems with assymetric CPUs, it is possible that
>> +		 * cpumask will become empty or set_cpus_allowed_ptr() will
>> +		 * return an error even if we still have CPUs in
>> +		 * effective_cpus. In this case, we find a usable CPU walking
>> +		 * up the cpuset hierarchy and use that for this particular
>> +		 * task with an informational message about the change in the
>> +		 * hope that the users will adjust "cpuset.cpus" accordingly.
>> +		 */
>> +		if (cpumask_empty(new_cpus) ||
>> +		    set_cpus_allowed_ptr(task, new_cpus)) {
> IIUC, cpumask_empty(new_cpus) here implies
> cpumask_empty(cs->effective_cpus) but that shouldn't happen (cs should
> inherit non-empty mask from an ancestor). Do I miss/forget anything?
>
> This thus covers the case when p->user_cpus_ptr is incompatible with
> hotplug or cpuset.cpus allowance and a different affinity must be
> chosen. But doesn't that mean that the task would run _out_ of
> cs->effective_cpus?
> I guess that's unavoidable on asymmetric CPU archs but not no SMPs.
> Shouldn't the solution distinguish between the two? (I.e. never run out
> of effective_cpus on SMP.)

Some arm64 systems can have asymmetric CPUs where certain tasks are only 
runnable on a selected subset of CPUs.  This information is not captured 
in the cpuset. As a result, task_cpu_possible_mask() may return a mask 
that have no overlap with effective_cpus causing new_cpus to become 
empty. It takes me a while to understand the implication of that. I will 
elaborate this point a bit more in the patch.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
  2023-03-14 19:02     ` Waiman Long
@ 2023-03-15 10:06       ` Michal Koutný
  2023-03-15 14:39         ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Koutný @ 2023-03-15 10:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2594 bytes --]

On Tue, Mar 14, 2023 at 03:02:53PM -0400, Waiman Long <longman@redhat.com> wrote:
> > > +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
> > > +		} else {
> > > +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
> > > +		}
> > I'm wrapping my head around this slightly.
> > 1) I'd suggest swapping args in of cpumask_and() to have possible_mask
> >     consistently first.
> I don't quite understand what you meant by "swapping args". It is effective
> new_cpus = cs->effective_cpus ∩ possible_mask. What is the point of swapping
> cs->effective_cpus and possible_mask.

No effect except better readability (possible_mask comes first in the
other branch above too, left hand arg as the "base" that's modified).


> > 2) Then I'm wondering whether two branches are truly different when
> >     effective_cpus := cpus_allowed - subparts_cpus
> >     top_cpuset.cpus_allowed == possible_mask        (1)
> effective_cpus may not be equal "cpus_allowed - subparts_cpus" if some of
> the CPUs are offline as effective_cpus contains only online CPUs.
> subparts_cpu can include offline cpus too. That is why I choose that
> expression. I will add a comment to clarify that.

I see now that it returns offlined cpus to top cpuset's tasks.

> > 
> > IOW, can you see a difference in what affinities are set to eligible
> > top_cpuset tasks before and after this patch upon CPU hotplug?
> > (Hm, (1) holds only in v2. So is this a fix for v1 only?)
> 
> This is due to the fact that cpu hotplug code currently doesn't update the
> cpu affinity of tasks in the top cpuset. Tasks not in the top cpuset can
> rely on the hotplug code to update the cpu affinity appropriately.

Oh, I mistook this for hotplug changing behavior but it's actually for
updating top_cpuset when its children becomes a partition root.

	IIUC, top cpuset + hotplug has been treated specially because
	hotplug must have taken care of affinity regardless of cpuset.
	v1 allowed modification of top cpuset's mask (not sure if that
	worked), v2 won't allow direct top cpuset's mask modificiation
	but indirectly via partition root children.

So this is a continuation for 3fb906e7fabb ("cgroup/cpuset: Don't filter
offline CPUs in cpuset_cpus_allowed() for top cpuset tasks") to ensure
hotplug offline/online cycle won't overwrite top_cpuset tasks'
affinities (when partition change during offlined period).
This looks correct in this regard then.
(I wish it were simpler but that's for a different/broader top
cpuset+hotplug approach.)

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
  2023-03-15 10:06       ` Michal Koutný
@ 2023-03-15 14:39         ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-15 14:39 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra


On 3/15/23 06:06, Michal Koutný wrote:
> On Tue, Mar 14, 2023 at 03:02:53PM -0400, Waiman Long <longman@redhat.com> wrote:
>>>> +			cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
>>>> +		} else {
>>>> +			cpumask_and(new_cpus, cs->effective_cpus, possible_mask);
>>>> +		}
>>> I'm wrapping my head around this slightly.
>>> 1) I'd suggest swapping args in of cpumask_and() to have possible_mask
>>>      consistently first.
>> I don't quite understand what you meant by "swapping args". It is effective
>> new_cpus = cs->effective_cpus ∩ possible_mask. What is the point of swapping
>> cs->effective_cpus and possible_mask.
> No effect except better readability (possible_mask comes first in the
> other branch above too, left hand arg as the "base" that's modified).
OK, I got it now. I will swap it as suggested.
>
>>> 2) Then I'm wondering whether two branches are truly different when
>>>      effective_cpus := cpus_allowed - subparts_cpus
>>>      top_cpuset.cpus_allowed == possible_mask        (1)
>> effective_cpus may not be equal "cpus_allowed - subparts_cpus" if some of
>> the CPUs are offline as effective_cpus contains only online CPUs.
>> subparts_cpu can include offline cpus too. That is why I choose that
>> expression. I will add a comment to clarify that.
> I see now that it returns offlined cpus to top cpuset's tasks.
>
>>> IOW, can you see a difference in what affinities are set to eligible
>>> top_cpuset tasks before and after this patch upon CPU hotplug?
>>> (Hm, (1) holds only in v2. So is this a fix for v1 only?)
>> This is due to the fact that cpu hotplug code currently doesn't update the
>> cpu affinity of tasks in the top cpuset. Tasks not in the top cpuset can
>> rely on the hotplug code to update the cpu affinity appropriately.
> Oh, I mistook this for hotplug changing behavior but it's actually for
> updating top_cpuset when its children becomes a partition root.
>
> 	IIUC, top cpuset + hotplug has been treated specially because
> 	hotplug must have taken care of affinity regardless of cpuset.
> 	v1 allowed modification of top cpuset's mask (not sure if that
> 	worked), v2 won't allow direct top cpuset's mask modificiation
> 	but indirectly via partition root children.
>
> So this is a continuation for 3fb906e7fabb ("cgroup/cpuset: Don't filter
> offline CPUs in cpuset_cpus_allowed() for top cpuset tasks") to ensure
> hotplug offline/online cycle won't overwrite top_cpuset tasks'
> affinities (when partition change during offlined period).
> This looks correct in this regard then.
> (I wish it were simpler but that's for a different/broader top
> cpuset+hotplug approach.)

You can't change the content of "cpuset.cpus" in the top cpuset (both v1 
& v2). I believe the CPU hotplug does not attempt to update the cpumask 
of tasks in the top cpuset mainly due to potential locking issue as 
hotplug is triggered by hardware event. Partition change, however, is a 
userspace event. So there shouldn't be any locking implication other 
than the fact per-cpu kthreads should not have their cpumasks changed.

To be consistent with commit 3fb906e7fabb ("cgroup/cpuset: Don't filter 
offline CPUs in cpuset_cpus_allowed() for top cpuset tasks"), similar 
logic needs to be applied in the later case.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] cgroup/cpuset: Miscellaneous updates
  2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
                   ` (4 preceding siblings ...)
  2023-03-06 20:08 ` [PATCH 5/5] cgroup/cpuset: Minor updates to test_cpuset_prs.sh Waiman Long
@ 2023-03-15 16:24 ` Will Deacon
  2023-03-15 16:59   ` Waiman Long
  5 siblings, 1 reply; 23+ messages in thread
From: Will Deacon @ 2023-03-15 16:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Peter Zijlstra

Hi Waiman,

On Mon, Mar 06, 2023 at 03:08:44PM -0500, Waiman Long wrote:
> This patch series includes miscellaneous update to the cpuset and its
> testing code.
> 
> Patch 2 is actually a follow-up of commit 3fb906e7fabb ("cgroup/cpuset:
> Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks").
> 
> Patches 3-4 are for handling corner cases when dealing with
> task_cpu_possible_mask().

Thanks for cc'ing me on these. I ran my arm64 asymmetric tests and, fwiw,
I get the same results as vanilla -rc2, so that's good.

One behaviour that persists (and which I thought might be addressed by this
series) is the following. Imagine a 4-CPU system with CPUs 0-1 being 64-bit
only. If I configure a parent cpuset with 'cpuset.cpus' of "0-2" and a
child cpuset with 'cpuset.cpus' of "0-1", then attaching a 32-bit task
to the child cpuset will result in an affinity mask of 4. If I then change
'cpuset.cpus' of the parent cpuset to "0-1,3", the affinity mask of the
task remains at '4' whereas it might be nice to update it to '8', in-line
with the new affinity mask of the parent cpuset.

Anyway, I'm not complaining (this is certainly _not_ a regression), but
I thought I'd highlight it in case you were aiming to address this with
your changes.

Cheers,

Will

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] cgroup/cpuset: Miscellaneous updates
  2023-03-15 16:24 ` [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Will Deacon
@ 2023-03-15 16:59   ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-15 16:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Peter Zijlstra


On 3/15/23 12:24, Will Deacon wrote:
> Hi Waiman,
>
> On Mon, Mar 06, 2023 at 03:08:44PM -0500, Waiman Long wrote:
>> This patch series includes miscellaneous update to the cpuset and its
>> testing code.
>>
>> Patch 2 is actually a follow-up of commit 3fb906e7fabb ("cgroup/cpuset:
>> Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks").
>>
>> Patches 3-4 are for handling corner cases when dealing with
>> task_cpu_possible_mask().
> Thanks for cc'ing me on these. I ran my arm64 asymmetric tests and, fwiw,
> I get the same results as vanilla -rc2, so that's good.
>
> One behaviour that persists (and which I thought might be addressed by this
> series) is the following. Imagine a 4-CPU system with CPUs 0-1 being 64-bit
> only. If I configure a parent cpuset with 'cpuset.cpus' of "0-2" and a
> child cpuset with 'cpuset.cpus' of "0-1", then attaching a 32-bit task
> to the child cpuset will result in an affinity mask of 4. If I then change
> 'cpuset.cpus' of the parent cpuset to "0-1,3", the affinity mask of the
> task remains at '4' whereas it might be nice to update it to '8', in-line
> with the new affinity mask of the parent cpuset.
>
> Anyway, I'm not complaining (this is certainly _not_ a regression), but
> I thought I'd highlight it in case you were aiming to address this with
> your changes.

I believe it is because changes in parent cpuset only won't cause the 
tasks in the child cpuset to be re-evaluated unless it causes a change 
in the effective_cpus of the child cpuset. This is the case here. We 
currently don't track how many tasks in the child cpusets are using 
parent's cpumask due to lacking runnable CPUs in the child cpuset. We 
can only fix this if we track those special tasks. It can be fixable, 
but I don't know if it is a problem that is worth fixing.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-14 20:22     ` Waiman Long
@ 2023-03-17 12:27       ` Michal Koutný
  2023-03-17 14:59         ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Koutný @ 2023-03-17 12:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
> Some arm64 systems can have asymmetric CPUs where certain tasks are only
> runnable on a selected subset of CPUs.

Ah, I'm catching up.

> This information is not captured in the cpuset. As a result,
> task_cpu_possible_mask() may return a mask that have no overlap with
> effective_cpus causing new_cpus to become empty.

I can see that historically, there was an approach of terminating
unaccomodable tasks:
   94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores") 
the removal of killing had been made possible with
   df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").

That gives two other alternatives to affinity modification:
2) kill such tasks (not unlike OOM upon memory.max reduction),
3) reject cpuset reduction (violates cgroup v2 delegation).

What do you think about 2)?

Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-17 12:27       ` Michal Koutný
@ 2023-03-17 14:59         ` Waiman Long
  2023-03-24 14:32           ` Will Deacon
  0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2023-03-17 14:59 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Will Deacon, Peter Zijlstra

On 3/17/23 08:27, Michal Koutný wrote:
> On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
>> Some arm64 systems can have asymmetric CPUs where certain tasks are only
>> runnable on a selected subset of CPUs.
> Ah, I'm catching up.
>
>> This information is not captured in the cpuset. As a result,
>> task_cpu_possible_mask() may return a mask that have no overlap with
>> effective_cpus causing new_cpus to become empty.
> I can see that historically, there was an approach of terminating
> unaccomodable tasks:
>     94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
> the removal of killing had been made possible with
>     df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").
>
> That gives two other alternatives to affinity modification:
> 2) kill such tasks (not unlike OOM upon memory.max reduction),
> 3) reject cpuset reduction (violates cgroup v2 delegation).
>
> What do you think about 2)?

Yes, killing it is one possible solution.

(3) doesn't work if the affinity change is due to hot cpu removal. So 
that leaves this patch or (2) as the only alternative. I would like to 
hear what Will and Tejun thinks about it.

I am going to remove this patch from the series for the time being.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-17 14:59         ` Waiman Long
@ 2023-03-24 14:32           ` Will Deacon
  2023-03-24 14:42             ` Waiman Long
  2023-03-24 18:19             ` Michal Koutný
  0 siblings, 2 replies; 23+ messages in thread
From: Will Deacon @ 2023-03-24 14:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: Michal Koutný,
	Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Peter Zijlstra

On Fri, Mar 17, 2023 at 10:59:26AM -0400, Waiman Long wrote:
> On 3/17/23 08:27, Michal Koutný wrote:
> > On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
> > > Some arm64 systems can have asymmetric CPUs where certain tasks are only
> > > runnable on a selected subset of CPUs.
> > Ah, I'm catching up.
> > 
> > > This information is not captured in the cpuset. As a result,
> > > task_cpu_possible_mask() may return a mask that have no overlap with
> > > effective_cpus causing new_cpus to become empty.
> > I can see that historically, there was an approach of terminating
> > unaccomodable tasks:
> >     94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
> > the removal of killing had been made possible with
> >     df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").
> > 
> > That gives two other alternatives to affinity modification:
> > 2) kill such tasks (not unlike OOM upon memory.max reduction),
> > 3) reject cpuset reduction (violates cgroup v2 delegation).
> > 
> > What do you think about 2)?
> 
> Yes, killing it is one possible solution.
> 
> (3) doesn't work if the affinity change is due to hot cpu removal. So that
> leaves this patch or (2) as the only alternative. I would like to hear what
> Will and Tejun thinks about it.

The main constraint from the Android side (the lucky ecosystem where these
SoCs tend to show up) is that existing userspace (including 32-bit binaries)
continues to function without modification. So approaches such as killing
tasks or rejecting system calls tend not to work as well, since you
inevitably get divergent behaviour leading to functional breakage rather
than e.g. performance anomalies.

Having said that, the behaviour we currently have in mainline seems to
be alright, so please don't go out of your way to accomodate these SoCs.
I'm mainly just concerned about introducing any regressions, which is why
I ran my tests on this series

Cheers,

Will

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-24 14:32           ` Will Deacon
@ 2023-03-24 14:42             ` Waiman Long
  2023-03-24 18:19             ` Michal Koutný
  1 sibling, 0 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-24 14:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Michal Koutný,
	Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Peter Zijlstra

On 3/24/23 10:32, Will Deacon wrote:
> On Fri, Mar 17, 2023 at 10:59:26AM -0400, Waiman Long wrote:
>> On 3/17/23 08:27, Michal Koutný wrote:
>>> On Tue, Mar 14, 2023 at 04:22:06PM -0400, Waiman Long <longman@redhat.com> wrote:
>>>> Some arm64 systems can have asymmetric CPUs where certain tasks are only
>>>> runnable on a selected subset of CPUs.
>>> Ah, I'm catching up.
>>>
>>>> This information is not captured in the cpuset. As a result,
>>>> task_cpu_possible_mask() may return a mask that have no overlap with
>>>> effective_cpus causing new_cpus to become empty.
>>> I can see that historically, there was an approach of terminating
>>> unaccomodable tasks:
>>>      94f9c00f6460 ("arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores")
>>> the removal of killing had been made possible with
>>>      df950811f4a8 ("arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system").
>>>
>>> That gives two other alternatives to affinity modification:
>>> 2) kill such tasks (not unlike OOM upon memory.max reduction),
>>> 3) reject cpuset reduction (violates cgroup v2 delegation).
>>>
>>> What do you think about 2)?
>> Yes, killing it is one possible solution.
>>
>> (3) doesn't work if the affinity change is due to hot cpu removal. So that
>> leaves this patch or (2) as the only alternative. I would like to hear what
>> Will and Tejun thinks about it.
> The main constraint from the Android side (the lucky ecosystem where these
> SoCs tend to show up) is that existing userspace (including 32-bit binaries)
> continues to function without modification. So approaches such as killing
> tasks or rejecting system calls tend not to work as well, since you
> inevitably get divergent behaviour leading to functional breakage rather
> than e.g. performance anomalies.
>
> Having said that, the behaviour we currently have in mainline seems to
> be alright, so please don't go out of your way to accomodate these SoCs.
> I'm mainly just concerned about introducing any regressions, which is why
> I ran my tests on this series

I agree that killing it may be too draconian. I am withholding this 
patch for now.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-24 14:32           ` Will Deacon
  2023-03-24 14:42             ` Waiman Long
@ 2023-03-24 18:19             ` Michal Koutný
  2023-03-25 22:08               ` Waiman Long
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Koutný @ 2023-03-24 18:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	cgroups, linux-kernel, linux-kselftest, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

On Fri, Mar 24, 2023 at 02:32:50PM +0000, Will Deacon <will@kernel.org> wrote:
> So approaches such as killing tasks or rejecting system calls tend not
> to work as well, since you inevitably get divergent behaviour leading
> to functional breakage rather than e.g. performance anomalies.

What about temporary performance drop from 100% to 0% aka freezing the
tasks for the duration of the mismatching affinity config?


> Having said that, the behaviour we currently have in mainline seems to
> be alright, so please don't go out of your way to accomodate these SoCs.

I see. (Just wondering what you think about the fourth option above.)

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset
  2023-03-24 18:19             ` Michal Koutný
@ 2023-03-25 22:08               ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2023-03-25 22:08 UTC (permalink / raw)
  To: Michal Koutný, Will Deacon
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan, cgroups,
	linux-kernel, linux-kselftest, Peter Zijlstra


On 3/24/23 14:19, Michal Koutný wrote:
> On Fri, Mar 24, 2023 at 02:32:50PM +0000, Will Deacon <will@kernel.org> wrote:
>> So approaches such as killing tasks or rejecting system calls tend not
>> to work as well, since you inevitably get divergent behaviour leading
>> to functional breakage rather than e.g. performance anomalies.
> What about temporary performance drop from 100% to 0% aka freezing the
> tasks for the duration of the mismatching affinity config?

That can be a lot of extra work to freeze it. I will prefer something 
simpler.

Without this patch, I believe it will lead to a cpumask of 0 which will 
cause the scheduler to pick a fallback cpu. It looks like the fallback 
code may be able to pick up the right cpu or it may panic the system 
(less likely).

Cheers,
Longman
>
>
>> Having said that, the behaviour we currently have in mainline seems to
>> be alright, so please don't go out of your way to accomodate these SoCs.
> I see. (Just wondering what you think about the fourth option above.)
>
> Thanks,
> Michal


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-03-25 22:09 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-06 20:08 [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Waiman Long
2023-03-06 20:08 ` [PATCH 1/5] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
2023-03-14 16:50   ` Michal Koutný
2023-03-14 18:20     ` Waiman Long
2023-03-06 20:08 ` [PATCH 2/5] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated Waiman Long
2023-03-14 17:34   ` Michal Koutný
2023-03-14 19:02     ` Waiman Long
2023-03-15 10:06       ` Michal Koutný
2023-03-15 14:39         ` Waiman Long
2023-03-06 20:08 ` [PATCH 3/5] cgroup/cpuset: Find another usable CPU if none found in current cpuset Waiman Long
2023-03-14 18:17   ` Michal Koutný
2023-03-14 20:22     ` Waiman Long
2023-03-17 12:27       ` Michal Koutný
2023-03-17 14:59         ` Waiman Long
2023-03-24 14:32           ` Will Deacon
2023-03-24 14:42             ` Waiman Long
2023-03-24 18:19             ` Michal Koutný
2023-03-25 22:08               ` Waiman Long
2023-03-06 20:08 ` [PATCH 4/5] cgroup/cpuset: Add CONFIG_DEBUG_CPUSETS config for cpuset testing Waiman Long
2023-03-06 20:08 ` [PATCH 5/5] cgroup/cpuset: Minor updates to test_cpuset_prs.sh Waiman Long
2023-03-07 16:16   ` Kamalesh Babulal
2023-03-15 16:24 ` [PATCH 0/5] cgroup/cpuset: Miscellaneous updates Will Deacon
2023-03-15 16:59   ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).