* [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan
  Cc: cgroups, linux-doc, linux-kernel, rcu, linux-kselftest,
	Mrunal Patel, Ryan Phillips, Brent Rowsell, Peter Hunt,
	Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Marcelo Tosatti, Phil Auld, Paul Gortmaker,
	Daniel Bristot de Oliveira, Juri Lelli, Peter Zijlstra,
	Costa Shulyupin, Waiman Long

This patch series is based on the RFC patch from Frederic [1]. Instead
of offering RCU_NOCB as a separate option, it is now folded into a
root-only cpuset.cpus.isolation_full flag which, if set, enables all the
additional CPU isolation capabilities available for isolated partitions.
RCU_NOCB is the first such capability. Additional dynamic CPU isolation
capabilities will be added in the future.

The first 2 patches are adapted from Frederic with minor tweaks to fix
merge conflicts and compilation issues. The rest implement the new
cpuset.cpus.isolation_full interface, which is essentially a flag
to globally enable or disable full CPU isolation on isolated partitions.
On read, it also shows the CPU isolation capabilities that are currently
enabled. RCU_NOCB requires that the rcu_nocbs option be present in
the kernel boot command line. Without that, the rcu_nocb functionality
cannot be enabled even if the isolation_full flag is set. So users can
read the isolation_full file to verify whether the desired CPU isolation
capability is enabled.
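
As a rough illustration of that check, the user-space sketch below parses
the text read from cpuset.cpus.isolation_full and reports whether a given
capability is listed. The format it assumes ("1", a space, then a
comma-separated list) follows the description in patch 5 of this series.
The helper name isolation_has() and the capability name "rcu_nocb" used in
the usage note are assumptions made for illustration, not names confirmed
by the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if @line (the content read from
 * cpuset.cpus.isolation_full) starts with "1" and lists @name among the
 * enabled isolation methods. Assumed format: "1 name1,name2\n" or "0\n".
 */
static bool isolation_has(const char *line, const char *name)
{
	size_t len;
	const char *p;

	assert(line && name);
	len = strlen(name);

	if (line[0] != '1' || len == 0)
		return false;

	p = line + 1;
	/* Skip the separator between the flag and the list, if any. */
	while (*p == ' ')
		p++;

	while (*p && *p != '\n') {
		/* Length of the current comma-delimited token. */
		size_t tok = strcspn(p, ",\n");

		if (tok == len && strncmp(p, name, len) == 0)
			return true;
		p += tok;
		if (*p == ',')
			p++;
	}
	return false;
}
```

A caller would read the file into a buffer and then test, for example,
isolation_has(buf, "rcu_nocb"). Matching whole tokens avoids a false
positive when one capability name is a prefix of another.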

Only sanity checking has been done so far. More testing, especially on
the RCU side, will be needed.

[1] https://lore.kernel.org/lkml/20220525221055.1152307-1-frederic@kernel.org/

Frederic Weisbecker (2):
  rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
  rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected
    cpuset caller

Waiman Long (6):
  rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
  cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
  cgroup/cpuset: Add cpuset.cpus.isolation_full
  cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
  cgroup/cpuset: Document the new cpuset.cpus.isolation_full control
    file
  cgroup/cpuset: Update test_cpuset_prs.sh to handle
    cpuset.cpus.isolation_full

 Documentation/admin-guide/cgroup-v2.rst       |  24 ++
 include/linux/rcupdate.h                      |  15 +-
 kernel/cgroup/cpuset.c                        | 237 ++++++++++++++----
 kernel/rcu/rcutorture.c                       |   6 +-
 kernel/rcu/tree_nocb.h                        | 118 ++++++---
 .../selftests/cgroup/test_cpuset_prs.sh       |  23 +-
 6 files changed, 337 insertions(+), 86 deletions(-)

-- 
2.39.3



* [RFC PATCH 1/8] rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)

From: Frederic Weisbecker <frederic@kernel.org>

Currently the interface to toggle the callback offloading state only
takes a single CPU per call. Driving RCU NOCB through cpusets requires
the ability to change the offloading state of a whole set of CPUs.

To make it easier, extend the (de-)offloading interface to support a
cpumask.
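
The all-or-nothing behaviour of the cpumask-based update can be modelled in
user space. The sketch below is not the kernel code: it uses a plain 64-bit
word in place of a cpumask, and the names nocb_mask, offline_mask,
cpu_set_offload() and mask_update() are hypothetical stand-ins. It only
illustrates the snapshot, apply, and roll-back-on-failure pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t nocb_mask;	/* stand-in for rcu_nocb_mask */
static uint64_t offline_mask;	/* stand-in: CPUs treated as offline */

/* Toggle one CPU; fail with -EINVAL if it is "offline". */
static int cpu_set_offload(int cpu, bool offload)
{
	uint64_t bit = UINT64_C(1) << cpu;

	if (offline_mask & bit)
		return -EINVAL;
	if (offload)
		nocb_mask |= bit;
	else
		nocb_mask &= ~bit;
	return 0;
}

/* Apply @offload to every CPU in @mask; restore earlier CPUs on failure. */
static int mask_update(uint64_t mask, bool offload)
{
	uint64_t saved = nocb_mask;	/* snapshot used for rollback */
	int cpu, err = 0, err_cpu = -1;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		err = cpu_set_offload(cpu, offload);
		if (err < 0) {
			err_cpu = cpu;
			break;
		}
	}

	if (err < 0) {
		/* Undo only the CPUs processed before the failing one. */
		for (cpu = 0; cpu < err_cpu; cpu++) {
			if (!(mask & (UINT64_C(1) << cpu)))
				continue;
			cpu_set_offload(cpu, (saved >> cpu) & 1);
		}
	}
	return err;
}
```

With CPUs 1 and 2 already offloaded and CPU 3 marked offline, a request to
offload CPUs 0, 1 and 3 fails on CPU 3 and leaves the mask as it was before
the call.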

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Phil Auld <pauld@redhat.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/rcupdate.h |   9 ++--
 kernel/rcu/rcutorture.c  |   4 +-
 kernel/rcu/tree_nocb.h   | 102 ++++++++++++++++++++++++++-------------
 3 files changed, 76 insertions(+), 39 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 0746b1b0b663..b649344075d2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -142,13 +142,14 @@ static inline void rcu_irq_work_resched(void) { }
 
 #ifdef CONFIG_RCU_NOCB_CPU
 void rcu_init_nohz(void);
-int rcu_nocb_cpu_offload(int cpu);
-int rcu_nocb_cpu_deoffload(int cpu);
+int rcu_nocb_cpumask_update(const struct cpumask *cpumask, bool offload);
 void rcu_nocb_flush_deferred_wakeup(void);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 static inline void rcu_init_nohz(void) { }
-static inline int rcu_nocb_cpu_offload(int cpu) { return -EINVAL; }
-static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; }
+static inline int rcu_nocb_cpumask_update(const struct cpumask *cpumask, bool offload)
+{
+	return -EINVAL;
+}
 static inline void rcu_nocb_flush_deferred_wakeup(void) { }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7567ca8e743c..228a5488eb5e 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2140,10 +2140,10 @@ static int rcu_nocb_toggle(void *arg)
 		r = torture_random(&rand);
 		cpu = (r >> 1) % (maxcpu + 1);
 		if (r & 0x1) {
-			rcu_nocb_cpu_offload(cpu);
+			rcu_nocb_cpumask_update(cpumask_of(cpu), true);
 			atomic_long_inc(&n_nocb_offload);
 		} else {
-			rcu_nocb_cpu_deoffload(cpu);
+			rcu_nocb_cpumask_update(cpumask_of(cpu), false);
 			atomic_long_inc(&n_nocb_deoffload);
 		}
 		toggle_delay = torture_random(&rand) % toggle_fuzz + toggle_interval;
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 4efbf7333d4e..60b0a15ed6e2 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1203,29 +1203,23 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	return 0;
 }
 
-int rcu_nocb_cpu_deoffload(int cpu)
+static int rcu_nocb_cpu_deoffload(int cpu)
 {
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	int ret = 0;
 
-	cpus_read_lock();
-	mutex_lock(&rcu_state.barrier_mutex);
-	if (rcu_rdp_is_offloaded(rdp)) {
-		if (cpu_online(cpu)) {
-			ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
-			if (!ret)
-				cpumask_clear_cpu(cpu, rcu_nocb_mask);
-		} else {
-			pr_info("NOCB: Cannot CB-deoffload offline CPU %d\n", rdp->cpu);
-			ret = -EINVAL;
-		}
-	}
-	mutex_unlock(&rcu_state.barrier_mutex);
-	cpus_read_unlock();
+	if (cpu_is_offline(cpu))
+		return -EINVAL;
+
+	if (!rcu_rdp_is_offloaded(rdp))
+		return 0;
+
+	ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
+	if (!ret)
+		cpumask_clear_cpu(cpu, rcu_nocb_mask);
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(rcu_nocb_cpu_deoffload);
 
 static long rcu_nocb_rdp_offload(void *arg)
 {
@@ -1236,12 +1230,6 @@ static long rcu_nocb_rdp_offload(void *arg)
 	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
 	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
-	/*
-	 * For now we only support re-offload, ie: the rdp must have been
-	 * offloaded on boot first.
-	 */
-	if (!rdp->nocb_gp_rdp)
-		return -EINVAL;
 
 	if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
 		return -EINVAL;
@@ -1288,29 +1276,77 @@ static long rcu_nocb_rdp_offload(void *arg)
 	return 0;
 }
 
-int rcu_nocb_cpu_offload(int cpu)
+static int rcu_nocb_cpu_offload(int cpu)
 {
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-	int ret = 0;
+	int ret;
+
+	if (cpu_is_offline(cpu))
+		return -EINVAL;
+
+	if (rcu_rdp_is_offloaded(rdp))
+		return 0;
+
+	ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
+	if (!ret)
+		cpumask_set_cpu(cpu, rcu_nocb_mask);
+
+	return ret;
+}
+
+int rcu_nocb_cpumask_update(const struct cpumask *cpumask, bool offload)
+{
+	int cpu;
+	int err = 0;
+	int err_cpu;
+	cpumask_var_t saved_nocb_mask;
+
+	if (!alloc_cpumask_var(&saved_nocb_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_copy(saved_nocb_mask, rcu_nocb_mask);
 
 	cpus_read_lock();
 	mutex_lock(&rcu_state.barrier_mutex);
-	if (!rcu_rdp_is_offloaded(rdp)) {
-		if (cpu_online(cpu)) {
-			ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
-			if (!ret)
-				cpumask_set_cpu(cpu, rcu_nocb_mask);
+	for_each_cpu(cpu, cpumask) {
+		if (offload) {
+			err = rcu_nocb_cpu_offload(cpu);
+			if (err < 0) {
+				err_cpu = cpu;
+				pr_err("NOCB: offload cpu %d failed (%d)\n", cpu, err);
+				break;
+			}
 		} else {
-			pr_info("NOCB: Cannot CB-offload offline CPU %d\n", rdp->cpu);
-			ret = -EINVAL;
+			err = rcu_nocb_cpu_deoffload(cpu);
+			if (err < 0) {
+				err_cpu = cpu;
+				pr_err("NOCB: deoffload cpu %d failed (%d)\n", cpu, err);
+				break;
+			}
+		}
+	}
+
+	/* Rollback in case of error */
+	if (err < 0) {
+		err_cpu = cpu;
+		for_each_cpu(cpu, cpumask) {
+			if (err_cpu == cpu)
+				break;
+			if (cpumask_test_cpu(cpu, saved_nocb_mask))
+				WARN_ON_ONCE(rcu_nocb_cpu_offload(cpu));
+			else
+				WARN_ON_ONCE(rcu_nocb_cpu_deoffload(cpu));
 		}
 	}
+
 	mutex_unlock(&rcu_state.barrier_mutex);
 	cpus_read_unlock();
 
-	return ret;
+	free_cpumask_var(saved_nocb_mask);
+
+	return err;
 }
-EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
+EXPORT_SYMBOL_GPL(rcu_nocb_cpumask_update);
 
 #ifdef CONFIG_RCU_LAZY
 static unsigned long
-- 
2.39.3



* [RFC PATCH 2/8] rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected cpuset caller
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)

From: Frederic Weisbecker <frederic@kernel.org>

cpuset is going to use the NOCB (de-)offloading interface while
holding the hotplug lock. Therefore move the responsibility of protecting
against concurrent CPU-hotplug changes to the callers of
rcu_nocb_cpumask_update().

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Phil Auld <pauld@redhat.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/rcu/rcutorture.c | 2 ++
 kernel/rcu/tree_nocb.h  | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 228a5488eb5e..e935152346ff 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2139,6 +2139,7 @@ static int rcu_nocb_toggle(void *arg)
 	do {
 		r = torture_random(&rand);
 		cpu = (r >> 1) % (maxcpu + 1);
+		cpus_read_lock();
 		if (r & 0x1) {
 			rcu_nocb_cpumask_update(cpumask_of(cpu), true);
 			atomic_long_inc(&n_nocb_offload);
@@ -2146,6 +2147,7 @@ static int rcu_nocb_toggle(void *arg)
 			rcu_nocb_cpumask_update(cpumask_of(cpu), false);
 			atomic_long_inc(&n_nocb_deoffload);
 		}
+		cpus_read_unlock();
 		toggle_delay = torture_random(&rand) % toggle_fuzz + toggle_interval;
 		set_current_state(TASK_INTERRUPTIBLE);
 		schedule_hrtimeout(&toggle_delay, HRTIMER_MODE_REL);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 60b0a15ed6e2..bbcf6f4152a3 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1301,12 +1301,13 @@ int rcu_nocb_cpumask_update(const struct cpumask *cpumask, bool offload)
 	int err_cpu;
 	cpumask_var_t saved_nocb_mask;
 
+	lockdep_assert_cpus_held();
+
 	if (!alloc_cpumask_var(&saved_nocb_mask, GFP_KERNEL))
 		return -ENOMEM;
 
 	cpumask_copy(saved_nocb_mask, rcu_nocb_mask);
 
-	cpus_read_lock();
 	mutex_lock(&rcu_state.barrier_mutex);
 	for_each_cpu(cpu, cpumask) {
 		if (offload) {
@@ -1340,7 +1341,6 @@ int rcu_nocb_cpumask_update(const struct cpumask *cpumask, bool offload)
 	}
 
 	mutex_unlock(&rcu_state.barrier_mutex);
-	cpus_read_unlock();
 
 	free_cpumask_var(saved_nocb_mask);
 
-- 
2.39.3



* [RFC PATCH 3/8] rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)

Add a new rcu_nocb_enabled() helper to expose the rcu_nocb state
to other kernel subsystems like cpuset.  That will allow cpuset to
determine if RCU no-callback can be enabled on isolated CPUs within
isolated partitions. If so, the corresponding RCU functions can be
called to enable it when full CPU isolation is requested.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rcupdate.h |  6 ++++++
 kernel/rcu/tree_nocb.h   | 12 ++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index b649344075d2..976d55a3e523 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -120,6 +120,12 @@ void rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
 
+#ifdef CONFIG_RCU_NOCB_CPU
+int rcu_nocb_enabled(struct cpumask *out_mask);
+#else
+static inline int rcu_nocb_enabled(struct cpumask *out_mask) { return 0; }
+#endif
+
 #ifdef CONFIG_TASKS_RCU_GENERIC
 void rcu_init_tasks_generic(void);
 #else
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index bbcf6f4152a3..020a347ccd52 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -81,6 +81,18 @@ static int __init parse_rcu_nocb_poll(char *arg)
 }
 __setup("rcu_nocb_poll", parse_rcu_nocb_poll);
 
+/*
+ * Return the rcu_nocb state & optionally copy out rcu_nocb_mask.
+ */
+int rcu_nocb_enabled(struct cpumask *out_mask)
+{
+	if (!rcu_state.nocb_is_setup)
+		return 0;
+	if (out_mask)
+		cpumask_copy(out_mask, rcu_nocb_mask);
+	return 1;
+}
+
 /*
  * Don't bother bypassing ->cblist if the call_rcu() rate is low.
  * After all, the main point of bypassing is to avoid lock contention
-- 
2.39.3



* [RFC PATCH 4/8] cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)

Updating the workqueue unbound cpumask to exclude isolated CPUs in
cpuset only requires the aggregated isolated_cpus cpumask.  Other types
of CPU isolation, like the RCU no-callback CPU mode, may need to know
exactly which CPUs are being added to or deleted from the isolated set.
To enable these types of CPU isolation at run time, we need to provide
better tracking of the addition and deletion of isolated CPUs.

This patch adds a new isolated_cpus_modifiers enum type for tracking
the addition and deletion of isolated CPUs, and renames
update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
to accommodate additional CPU isolation modes in the future.

There is no functional change.
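
The idea of mapping a partition state transition to an add/delete modifier
can be sketched in user space as follows. This is a simplified model, not
the code in this patch: the helper isol_modifier() is hypothetical, and the
numeric PRS_* values are assumed here for illustration.

```c
#include <assert.h>

/* Assumed partition root states (values chosen for this sketch). */
enum prs_state {
	PRS_MEMBER = 0,
	PRS_ROOT = 1,
	PRS_ISOLATED = 2,
};

/* Mirrors the enum added by this patch. */
enum isolated_cpus_modifiers {
	ISOL_CPUS_NONE = 0,
	ISOL_CPUS_ADD,
	ISOL_CPUS_DELETE,
};

/*
 * Hypothetical helper: classify how a change from @old_prs to @new_prs
 * affects the set of isolated CPUs.
 */
static int isol_modifier(int old_prs, int new_prs)
{
	assert(old_prs >= PRS_MEMBER && old_prs <= PRS_ISOLATED);
	assert(new_prs >= PRS_MEMBER && new_prs <= PRS_ISOLATED);

	if (old_prs == new_prs)
		return ISOL_CPUS_NONE;
	if (new_prs == PRS_ISOLATED)
		return ISOL_CPUS_ADD;		/* CPUs become isolated */
	if (old_prs == PRS_ISOLATED)
		return ISOL_CPUS_DELETE;	/* CPUs leave isolation */
	return ISOL_CPUS_NONE;			/* isolation not involved */
}
```

Because ISOL_CPUS_NONE is zero, a caller can test the result with a plain
"if (!modifier)" check, as update_isolation_cpumasks() does.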

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 113 +++++++++++++++++++++++++----------------
 1 file changed, 69 insertions(+), 44 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index dfbb16aca9f4..0479af76a5dc 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -206,6 +206,13 @@ struct cpuset {
  */
 static cpumask_var_t	subpartitions_cpus;
 
+/* Enum types for possible changes to the set of isolated CPUs */
+enum isolated_cpus_modifiers {
+	ISOL_CPUS_NONE = 0,
+	ISOL_CPUS_ADD,
+	ISOL_CPUS_DELETE,
+};
+
 /*
  * Exclusive CPUs in isolated partitions
  */
@@ -1446,14 +1453,14 @@ static void partition_xcpus_newstate(int old_prs, int new_prs, struct cpumask *x
  * @new_prs: new partition_root_state
  * @parent: parent cpuset
  * @xcpus: exclusive CPUs to be added
- * Return: true if isolated_cpus modified, false otherwise
+ * Return: isolated_cpus modifier
  *
  * Remote partition if parent == NULL
  */
-static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
-				struct cpumask *xcpus)
+static int partition_xcpus_add(int new_prs, struct cpuset *parent,
+			       struct cpumask *xcpus)
 {
-	bool isolcpus_updated;
+	int icpus_mod = ISOL_CPUS_NONE;
 
 	WARN_ON_ONCE(new_prs < 0);
 	lockdep_assert_held(&callback_lock);
@@ -1464,13 +1471,14 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
 	if (parent == &top_cpuset)
 		cpumask_or(subpartitions_cpus, subpartitions_cpus, xcpus);
 
-	isolcpus_updated = (new_prs != parent->partition_root_state);
-	if (isolcpus_updated)
+	if (new_prs != parent->partition_root_state) {
 		partition_xcpus_newstate(parent->partition_root_state, new_prs,
 					 xcpus);
-
+		icpus_mod = (new_prs == PRS_ISOLATED)
+			    ? ISOL_CPUS_ADD : ISOL_CPUS_DELETE;
+	}
 	cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus);
-	return isolcpus_updated;
+	return icpus_mod;
 }
 
 /*
@@ -1478,14 +1486,14 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
  * @old_prs: old partition_root_state
  * @parent: parent cpuset
  * @xcpus: exclusive CPUs to be removed
- * Return: true if isolated_cpus modified, false otherwise
+ * Return: isolated_cpus modifier
  *
  * Remote partition if parent == NULL
  */
-static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
+static int partition_xcpus_del(int old_prs, struct cpuset *parent,
 				struct cpumask *xcpus)
 {
-	bool isolcpus_updated;
+	int icpus_mod = ISOL_CPUS_NONE;
 
 	WARN_ON_ONCE(old_prs < 0);
 	lockdep_assert_held(&callback_lock);
@@ -1495,27 +1503,40 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	if (parent == &top_cpuset)
 		cpumask_andnot(subpartitions_cpus, subpartitions_cpus, xcpus);
 
-	isolcpus_updated = (old_prs != parent->partition_root_state);
-	if (isolcpus_updated)
+	if (old_prs != parent->partition_root_state) {
 		partition_xcpus_newstate(old_prs, parent->partition_root_state,
 					 xcpus);
-
+		icpus_mod = (old_prs == PRS_ISOLATED)
+			    ? ISOL_CPUS_DELETE : ISOL_CPUS_ADD;
+	}
 	cpumask_and(xcpus, xcpus, cpu_active_mask);
 	cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus);
-	return isolcpus_updated;
+	return icpus_mod;
 }
 
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+/**
+ * update_isolation_cpumasks - Add or remove CPUs to/from full isolation state
+ * @mask: cpumask of the CPUs to be added or removed
+ * @modifier: enum isolated_cpus_modifiers
+ * Return: 0 if successful, error code otherwise
+ *
+ * Workqueue unbound cpumask update is applied irrespective of isolation_full
+ * state and the whole isolated_cpus is passed. Repeated calls with the same
+ * isolated_cpus will not cause further action other than a wasted mutex
+ * lock/unlock.
+ */
+static int update_isolation_cpumasks(struct cpumask *mask, int modifier)
 {
-	int ret;
+	int err;
 
 	lockdep_assert_cpus_held();
 
-	if (!isolcpus_updated)
-		return;
+	if (!modifier)
+		return 0;	/* No change in isolated CPUs */
 
-	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
-	WARN_ON_ONCE(ret < 0);
+	err = workqueue_unbound_exclude_cpumask(isolated_cpus);
+	WARN_ON_ONCE(err);
+	return err;
 }
 
 /**
@@ -1577,7 +1598,7 @@ static inline bool is_local_partition(struct cpuset *cs)
 static int remote_partition_enable(struct cpuset *cs, int new_prs,
 				   struct tmpmasks *tmp)
 {
-	bool isolcpus_updated;
+	int icpus_mod;
 
 	/*
 	 * The user must have sysadmin privilege.
@@ -1600,7 +1621,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 		return 0;
 
 	spin_lock_irq(&callback_lock);
-	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
+	icpus_mod = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
 	list_add(&cs->remote_sibling, &remote_children);
 	if (cs->use_parent_ecpus) {
 		struct cpuset *parent = parent_cs(cs);
@@ -1609,7 +1630,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 		parent->child_ecpus_count--;
 	}
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(tmp->new_cpus, icpus_mod);
 
 	/*
 	 * Proprogate changes in top_cpuset's effective_cpus down the hierarchy.
@@ -1630,7 +1651,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
  */
 static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 {
-	bool isolcpus_updated;
+	int icpus_mod;
 
 	compute_effective_exclusive_cpumask(cs, tmp->new_cpus);
 	WARN_ON_ONCE(!is_remote_partition(cs));
@@ -1638,14 +1659,14 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 
 	spin_lock_irq(&callback_lock);
 	list_del_init(&cs->remote_sibling);
-	isolcpus_updated = partition_xcpus_del(cs->partition_root_state,
-					       NULL, tmp->new_cpus);
+	icpus_mod = partition_xcpus_del(cs->partition_root_state, NULL,
+					tmp->new_cpus);
 	cs->partition_root_state = -cs->partition_root_state;
 	if (!cs->prs_err)
 		cs->prs_err = PERR_INVCPUS;
 	reset_partition_data(cs);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(tmp->new_cpus, icpus_mod);
 
 	/*
 	 * Proprogate changes in top_cpuset's effective_cpus down the hierarchy.
@@ -1668,7 +1689,8 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *newmask,
 {
 	bool adding, deleting;
 	int prs = cs->partition_root_state;
-	int isolcpus_updated = 0;
+	int icpus_add_mod = ISOL_CPUS_NONE;
+	int icpus_del_mod = ISOL_CPUS_NONE;
 
 	if (WARN_ON_ONCE(!is_remote_partition(cs)))
 		return;
@@ -1693,12 +1715,12 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *newmask,
 
 	spin_lock_irq(&callback_lock);
 	if (adding)
-		isolcpus_updated += partition_xcpus_add(prs, NULL, tmp->addmask);
+		icpus_add_mod = partition_xcpus_add(prs, NULL, tmp->addmask);
 	if (deleting)
-		isolcpus_updated += partition_xcpus_del(prs, NULL, tmp->delmask);
+		icpus_del_mod = partition_xcpus_del(prs, NULL, tmp->delmask);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
-
+	update_isolation_cpumasks(tmp->addmask, icpus_add_mod);
+	update_isolation_cpumasks(tmp->delmask, icpus_del_mod);
 	/*
 	 * Proprogate changes in top_cpuset's effective_cpus down the hierarchy.
 	 */
@@ -1819,7 +1841,8 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 	int part_error = PERR_NONE;	/* Partition error? */
 	int subparts_delta = 0;
 	struct cpumask *xcpus;		/* cs effective_xcpus */
-	int isolcpus_updated = 0;
+	int icpus_add_mod = ISOL_CPUS_NONE;
+	int icpus_del_mod = ISOL_CPUS_NONE;
 	bool nocpu;
 
 	lockdep_assert_held(&cpuset_mutex);
@@ -2052,22 +2075,23 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 			cs->nr_subparts = 0;
 	}
 	/*
-	 * Adding to parent's effective_cpus means deletion CPUs from cs
+	 * Adding to parent's effective_cpus means deleting CPUs from cs
 	 * and vice versa.
 	 */
 	if (adding)
-		isolcpus_updated += partition_xcpus_del(old_prs, parent,
-							tmp->addmask);
+		icpus_add_mod = partition_xcpus_del(old_prs, parent,
+						    tmp->addmask);
 	if (deleting)
-		isolcpus_updated += partition_xcpus_add(new_prs, parent,
-							tmp->delmask);
+		icpus_del_mod = partition_xcpus_add(new_prs, parent,
+						    tmp->delmask);
 
 	if (is_partition_valid(parent)) {
 		parent->nr_subparts += subparts_delta;
 		WARN_ON_ONCE(parent->nr_subparts < 0);
 	}
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(tmp->addmask, icpus_add_mod);
+	update_isolation_cpumasks(tmp->delmask, icpus_del_mod);
 
 	if ((old_prs != new_prs) && (cmd == partcmd_update))
 		update_partition_exclusive(cs, new_prs);
@@ -3044,7 +3068,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	int err = PERR_NONE, old_prs = cs->partition_root_state;
 	struct cpuset *parent = parent_cs(cs);
 	struct tmpmasks tmpmask;
-	bool new_xcpus_state = false;
+	int icpus_mod = ISOL_CPUS_NONE;
 
 	if (old_prs == new_prs)
 		return 0;
@@ -3096,7 +3120,8 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		/*
 		 * A change in load balance state only, no change in cpumasks.
 		 */
-		new_xcpus_state = true;
+		icpus_mod = (new_prs == PRS_ISOLATED)
+			    ? ISOL_CPUS_ADD : ISOL_CPUS_DELETE;
 	} else {
 		/*
 		 * Switching back to member is always allowed even if it
@@ -3128,10 +3153,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	WRITE_ONCE(cs->prs_err, err);
 	if (!is_partition_valid(cs))
 		reset_partition_data(cs);
-	else if (new_xcpus_state)
+	else if (icpus_mod)
 		partition_xcpus_newstate(old_prs, new_prs, cs->effective_xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(new_xcpus_state);
+	update_isolation_cpumasks(cs->effective_xcpus, icpus_mod);
 
 	/* Force update if switching back to member */
 	update_cpumasks_hier(cs, &tmpmask, !new_prs ? HIER_CHECKALL : 0);
-- 
2.39.3



* [RFC PATCH 5/8] cgroup/cpuset: Add cpuset.cpus.isolation_full
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)

This patch adds a new root-only cpuset.cpus.isolation_full control file
for enabling or disabling full CPU isolation mode. In this mode, the
additional CPU isolation methods that cpuset can use are turned on or
off for all the isolated CPUs within isolated partitions.

On write, cpuset.cpus.isolation_full accepts any integer. A zero value
will disable full CPU isolation while a non-zero value will enable it.
On read, cpuset.cpus.isolation_full will return either "0" (disabled)
or "1" (enabled) followed by a comma-separated list of the additional
CPU isolation methods that are enabled. The list of available
isolation methods depends on the kernel configuration options used as
well as on whether the pre-conditions for some of them are met.
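
The read format described above can be modelled in user space with
snprintf() in place of the seq_file helpers. This is only a sketch: the
function format_isolation_full() and the method names "type_a" and
"type_b" are hypothetical placeholders, since this patch leaves the name
table empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NTYPES 2

/* Placeholder names; the real table is filled in by later patches. */
static const char *const type_names[NTYPES] = { "type_a", "type_b" };

/*
 * Write "0\n", or "1" followed by a space and a comma-separated list of
 * the methods whose bit is set in @flags, then "\n". Returns the number
 * of bytes written, or -1 if @buf is too small.
 */
static int format_isolation_full(char *buf, size_t size, bool full,
				 unsigned int flags)
{
	size_t off;
	int i, n, cnt = 0;

	assert(buf && size > 0);

	n = snprintf(buf, size, "%s", full ? "1" : "0");
	if (n < 0 || (size_t)n >= size)
		return -1;
	off = (size_t)n;

	for (i = 0; full && i < NTYPES; i++) {
		if (!(flags & (1u << i)))
			continue;
		/* First name is separated by a space, later ones by commas. */
		n = snprintf(buf + off, size - off, "%c%s",
			     cnt ? ',' : ' ', type_names[i]);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
		cnt++;
	}

	n = snprintf(buf + off, size - off, "\n");
	if (n < 0 || (size_t)n >= size - off)
		return -1;
	return (int)(off + (size_t)n);
}
```

With both bits set and full isolation enabled, the buffer holds
"1 type_a,type_b" followed by a newline. With full isolation disabled it
holds "0" and a newline regardless of the flags.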

This patch only provides the infrastructure code. The various isolation
methods will be added later on.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 88 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0479af76a5dc..d1d4ce213979 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -218,6 +218,30 @@ enum isolated_cpus_modifiers {
  */
 static cpumask_var_t	isolated_cpus;
 
+/*
+ * Enable full CPU isolation in isolated partitions, if set.
+ */
+static bool isolation_full;
+
+/*
+ * A flag indicating what cpuset full isolation modes can be enabled.
+ */
+static int isolation_flags;
+
+enum cpuset_isolation_types {
+	ISOL_TYPE_MAX,
+};
+
+static const char * const isolation_type_names[ISOL_TYPE_MAX] = {
+};
+
+/* Detect the cpuset isolation modes that can be enabled */
+static __init int set_isolation_flags(void)
+{
+	return 0;
+}
+late_initcall(set_isolation_flags);
+
 /* List of remote partition root children */
 static struct list_head remote_children;
 
@@ -1524,6 +1548,8 @@ static int partition_xcpus_del(int old_prs, struct cpuset *parent,
  * state and the whole isolated_cpus is passed. Repeated calls with the same
  * isolated_cpus will not cause further action other than a wasted mutex
  * lock/unlock.
+ *
+ * The other isolation modes will only be activated when isolation_full is set.
  */
 static int update_isolation_cpumasks(struct cpumask *mask, int modifier)
 {
@@ -1536,6 +1562,13 @@ static int update_isolation_cpumasks(struct cpumask *mask, int modifier)
 
 	err = workqueue_unbound_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(err);
+
+	if (!isolation_flags || !isolation_full)
+		return err;
+
+	if (WARN_ON_ONCE(cpumask_empty(mask)))
+		return -EINVAL;
+
 	return err;
 }
 
@@ -3514,6 +3547,7 @@ typedef enum {
 	FILE_EXCLUSIVE_CPULIST,
 	FILE_EFFECTIVE_XCPULIST,
 	FILE_ISOLATED_CPULIST,
+	FILE_ISOLATION_FULL,
 	FILE_CPU_EXCLUSIVE,
 	FILE_MEM_EXCLUSIVE,
 	FILE_MEM_HARDWALL,
@@ -3713,6 +3747,25 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
 	case FILE_ISOLATED_CPULIST:
 		seq_printf(sf, "%*pbl\n", cpumask_pr_args(isolated_cpus));
 		break;
+	case FILE_ISOLATION_FULL:
+		if (isolation_full) {
+			int i, cnt;
+
+			/* Also print the isolation modes that are enabled */
+			seq_puts(sf, "1");
+			for (i = cnt = 0; i < ISOL_TYPE_MAX; i++) {
+				if (!(isolation_flags & BIT(i)))
+					continue;
+
+				seq_printf(sf, "%c%s", cnt ? ',' : ' ',
+					   isolation_type_names[i]);
+				cnt++;
+			}
+			seq_puts(sf, "\n");
+		} else {
+			seq_puts(sf, "0\n");
+		}
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -3833,6 +3886,33 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
 	return retval ?: nbytes;
 }
 
+/*
+ * cpuset_write_isolfull - enable/disable cpuset isolation full mode
+ */
+static int cpuset_write_isolfull(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 val)
+{
+	struct cpuset *cs = css_cs(css);
+	int retval = 0;
+
+	cpus_read_lock();
+	mutex_lock(&cpuset_mutex);
+	if (!is_cpuset_online(cs)) {
+		retval = -ENODEV;
+	} else if (isolation_full != !!val) {
+		isolation_full = !!val;
+		if (!cpumask_empty(isolated_cpus)) {
+			int mod = isolation_full
+				  ? ISOL_CPUS_ADD : ISOL_CPUS_DELETE;
+
+			retval = update_isolation_cpumasks(isolated_cpus, mod);
+		}
+	}
+	mutex_unlock(&cpuset_mutex);
+	cpus_read_unlock();
+	return retval;
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
@@ -4013,6 +4093,14 @@ static struct cftype dfl_files[] = {
 		.flags = CFTYPE_ONLY_ON_ROOT,
 	},
 
+	{
+		.name = "cpus.isolation_full",
+		.seq_show = cpuset_common_seq_show,
+		.write_u64 = cpuset_write_isolfull,
+		.private = FILE_ISOLATION_FULL,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
-- 
2.39.3



* [RFC PATCH 6/8] cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
  2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
                   ` (4 preceding siblings ...)
  2024-01-17 16:35 ` [RFC PATCH 5/8] cgroup/cpuset: Add cpuset.cpus.isolation_full Waiman Long
@ 2024-01-17 16:35 ` Waiman Long
  2024-01-17 16:35 ` [RFC PATCH 7/8] cgroup/cpuset: Document the new cpuset.cpus.isolation_full control file Waiman Long
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan
  Cc: cgroups, linux-doc, linux-kernel, rcu, linux-kselftest,
	Mrunal Patel, Ryan Phillips, Brent Rowsell, Peter Hunt,
	Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Marcelo Tosatti, Phil Auld, Paul Gortmaker,
	Daniel Bristot de Oliveira, Juri Lelli, Peter Zijlstra,
	Costa Shulyupin, Waiman Long

This patch dynamically applies the RCU no-callback isolation mode to
isolated CPUs within isolated partitions when the full CPU isolation
mode is enabled. This isolation feature will only be available for use
by cpuset if the "rcu_nocbs" boot option is specified in the kernel
command line, with or without the optional CPU list argument.
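
A rough way to check that pre-condition from userspace is to look for
the option in the boot command line. This is only a sketch: the helper
name has_rcu_nocbs is hypothetical, and treating the presence of the
option as equivalent to what rcu_nocb_enabled() reports is an
assumption, since other boot options may also influence the RCU
no-callback state.

```shell
# Succeed if the given kernel command line contains an rcu_nocbs option,
# with or without a CPU list.
has_rcu_nocbs()
{
	for opt in $1
	do
		case "$opt" in
		rcu_nocbs|rcu_nocbs=*) return 0 ;;
		esac
	done
	return 1
}

# On a live system: has_rcu_nocbs "$(cat /proc/cmdline)"
has_rcu_nocbs "ro quiet rcu_nocbs=2-7" && echo "rcu_nocbs present"
```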

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d1d4ce213979..40bbb0a9cb84 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -218,6 +218,11 @@ enum isolated_cpus_modifiers {
  */
 static cpumask_var_t	isolated_cpus;
 
+/*
+ * rcu_nocb_mask set up at boot time.
+ */
+static cpumask_var_t	rcu_nocb_mask_preset;
+
 /*
  * Enable full CPU isolation in isolated partitions, if set.
  */
@@ -229,15 +234,26 @@ static bool isolation_full;
 static int isolation_flags;
 
 enum cpuset_isolation_types {
+	ISOL_TYPE_RCU,	/* RCU no-callback CPU mode */
 	ISOL_TYPE_MAX,
 };
 
+enum cpuset_isolation_flags {
+	ISOL_FLAG_RCU = BIT(ISOL_TYPE_RCU),
+};
+
 static const char * const isolation_type_names[ISOL_TYPE_MAX] = {
+	[ISOL_TYPE_RCU] = "rcu_nocbs",
 };
 
 /* Detect the cpuset isolation modes that can be enabled */
 static __init int set_isolation_flags(void)
 {
+	if (rcu_nocb_enabled(NULL)) {
+		BUG_ON(!zalloc_cpumask_var(&rcu_nocb_mask_preset, GFP_KERNEL));
+		(void)rcu_nocb_enabled(rcu_nocb_mask_preset);
+		isolation_flags |= ISOL_FLAG_RCU;
+	}
 	return 0;
 }
 late_initcall(set_isolation_flags);
@@ -1554,6 +1570,7 @@ static int partition_xcpus_del(int old_prs, struct cpuset *parent,
 static int update_isolation_cpumasks(struct cpumask *mask, int modifier)
 {
 	int err;
+	bool enable = (modifier == ISOL_CPUS_ADD);
 
 	lockdep_assert_cpus_held();
 
@@ -1569,6 +1586,25 @@ static int update_isolation_cpumasks(struct cpumask *mask, int modifier)
 	if (WARN_ON_ONCE(cpumask_empty(mask)))
 		return -EINVAL;
 
+	err = 0;
+	if (isolation_flags & ISOL_FLAG_RCU) {
+		/*
+		 * When disabling rcu_nocb, make sure that we don't touch any
+		 * CPUs that have already been set in rcu_nocb_mask_preset.
+		 */
+		if (!enable && cpumask_intersects(mask, rcu_nocb_mask_preset)) {
+			cpumask_var_t tmp_mask;
+
+			if (WARN_ON_ONCE(!alloc_cpumask_var(&tmp_mask, GFP_KERNEL)))
+				return -ENOMEM;
+			if (cpumask_andnot(tmp_mask, mask, rcu_nocb_mask_preset))
+				err = rcu_nocb_cpumask_update(tmp_mask, enable);
+			free_cpumask_var(tmp_mask);
+		} else {
+			err = rcu_nocb_cpumask_update(mask, enable);
+		}
+	}
+	WARN_ON_ONCE(err);
 	return err;
 }
 
-- 
2.39.3
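
The disable path in update_isolation_cpumasks() above leaves alone any
CPU that was already in the boot-time no-callback mask. The same
set arithmetic can be sketched with plain bitmasks in shell; the helper
name cpus_to_deoffload and the sample masks are made up for
illustration.

```shell
# Print the CPUs (as a hex bitmask) that may be de-offloaded: those in
# the mask being removed that were not already no-CB CPUs at boot.
# $1 = mask leaving isolation, $2 = boot-time preset mask.
cpus_to_deoffload()
{
	printf '0x%x\n' $(( $1 & ~$2 ))
}

# CPUs 2-5 leave isolation; CPUs 4-7 were in the boot-time rcu_nocbs list.
cpus_to_deoffload 0x3c 0xf0	# prints 0xc, i.e. CPUs 2-3
```

A result of 0x0 corresponds to the case where cpumask_andnot() returns
false in the patch and no RCU update is issued.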



* [RFC PATCH 7/8] cgroup/cpuset: Document the new cpuset.cpus.isolation_full control file
  2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
                   ` (5 preceding siblings ...)
  2024-01-17 16:35 ` [RFC PATCH 6/8] cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs Waiman Long
@ 2024-01-17 16:35 ` Waiman Long
  2024-01-17 16:35 ` [RFC PATCH 8/8] cgroup/cpuset: Update test_cpuset_prs.sh to handle cpuset.cpus.isolation_full Waiman Long
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan
  Cc: cgroups, linux-doc, linux-kernel, rcu, linux-kselftest,
	Mrunal Patel, Ryan Phillips, Brent Rowsell, Peter Hunt,
	Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Marcelo Tosatti, Phil Auld, Paul Gortmaker,
	Daniel Bristot de Oliveira, Juri Lelli, Peter Zijlstra,
	Costa Shulyupin, Waiman Long

Document the new cpuset.cpus.isolation_full control file. Currently only
the rcu_nocbs flag is supported, but more will be added in the future.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 17e6e9565156..bbd066838f93 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2352,6 +2352,30 @@ Cpuset Interface Files
 	isolated partitions. It will be empty if no isolated partition
 	is created.
 
+  cpuset.cpus.isolation_full
+	A read multiple values and write single value file which exists
+	on root cgroup only.
+
+	This file shows the state of full CPU isolation mode for isolated
+	CPUs in isolated partitions.  It shows either "0" if full CPU
+	isolation mode is disabled, or "1" followed by a comma-separated
+	list of additional CPU isolation flags that are enabled.
+	The currently supported CPU isolation flag is:
+
+	  rcu_nocbs
+		RCU no-callback CPU mode, which prevents such CPUs'
+		callbacks from being invoked in softirq context.
+		Invocation of such CPUs' RCU callbacks will instead be
+		offloaded to "rcuox/N" kthreads created for that purpose.
+		It is similar in functionality to the "rcu_nocbs"
+		boot command line option, but for dynamically created
+		isolated CPUs in isolated partitions.  This flag can
+		only be enabled if such a "rcu_nocbs" option is present
+		in the boot command line of the running kernel.
+
+	Full CPU isolation mode is enabled by writing a non-zero value
+	to this file and disabled by writing a zero value to it.
+
   cpuset.cpus.partition
 	A read-write single value file which exists on non-root
 	cpuset-enabled cgroups.  This flag is owned by the parent cgroup
-- 
2.39.3



* [RFC PATCH 8/8] cgroup/cpuset: Update test_cpuset_prs.sh to handle cpuset.cpus.isolation_full
  2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
                   ` (6 preceding siblings ...)
  2024-01-17 16:35 ` [RFC PATCH 7/8] cgroup/cpuset: Document the new cpuset.cpus.isolation_full control file Waiman Long
@ 2024-01-17 16:35 ` Waiman Long
  2024-01-17 17:07 ` [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Tejun Heo
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2024-01-17 16:35 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan
  Cc: cgroups, linux-doc, linux-kernel, rcu, linux-kselftest,
	Mrunal Patel, Ryan Phillips, Brent Rowsell, Peter Hunt,
	Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Marcelo Tosatti, Phil Auld, Paul Gortmaker,
	Daniel Bristot de Oliveira, Juri Lelli, Peter Zijlstra,
	Costa Shulyupin, Waiman Long

Add a new "-F" option to test_cpuset_prs.sh that enables
cpuset.cpus.isolation_full, for trying out the effect of full CPU
isolation.
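
For reference, a stand-alone sketch of an option loop with a flag-only
"-F" is shown here. It is not the script's actual code: parse_opts is a
hypothetical function, and it assumes that the loop shifts once per
iteration after "esac" (that line is outside the hunk context below),
in which case a flag-only option needs no shift of its own.

```shell
# Hypothetical stand-alone parser; prints the resulting settings instead
# of running any tests.
parse_opts()
{
	VERBOSE=0 DELAY_FACTOR=1 ISOLATION_FULL=
	while [ "${1#-}" != "$1" ]
	do
		case "$1" in
		-v) VERBOSE=$((VERBOSE + 1)) ;;
		-d) DELAY_FACTOR=$2
		    shift		# consume the option argument
		    ;;
		-F) ISOLATION_FULL=1 ;;	# flag only, nothing extra to consume
		*)  echo "Usage: [-v] [-d <delay-factor>] [-F]"
		    return 1
		    ;;
		esac
		shift			# consume the option itself
	done
	echo "verbose=$VERBOSE delay=$DELAY_FACTOR full=${ISOLATION_FULL:-0}"
}

parse_opts -F -d 2 -v
```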

Signed-off-by: Waiman Long <longman@redhat.com>
---
 .../selftests/cgroup/test_cpuset_prs.sh       | 23 ++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index b5eb1be2248c..2a8f0cb8d252 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -32,6 +32,7 @@ NR_CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
 PROG=$1
 VERBOSE=0
 DELAY_FACTOR=1
+ISOLATION_FULL=
 SCHED_DEBUG=
 while [[ "$1" = -* ]]
 do
@@ -44,7 +45,10 @@ do
 		-d) DELAY_FACTOR=$2
 		    shift
 		    ;;
-		*)  echo "Usage: $PROG [-v] [-d <delay-factor>"
+		-F) ISOLATION_FULL=1
+		    shift
+		    ;;
+		*)  echo "Usage: $PROG [-v] [-d <delay-factor>] [-F]"
 		    exit
 		    ;;
 	esac
@@ -108,6 +112,22 @@ console_msg()
 	pause 0.01
 }
 
+setup_isolation_full()
+{
+	ISOL_FULL=${CGROUP2}/cpuset.cpus.isolation_full
+	if [[ -n "$ISOLATION_FULL" ]]
+	then
+		echo 1 > $ISOL_FULL
+		set -- $(cat $ISOL_FULL)
+		ISOLATION_FLAGS=$2
+		[[ $VERBOSE -gt 0 ]] && {
+			echo "Full CPU isolation flags: $ISOLATION_FLAGS"
+		}
+	else
+		echo 0 > $ISOL_FULL
+	fi
+}
+
 test_partition()
 {
 	EXPECTED_VAL=$1
@@ -930,6 +950,7 @@ test_inotify()
 }
 
 trap cleanup 0 2 3 6
+setup_isolation_full
 run_state_test TEST_MATRIX
 test_isolated
 test_inotify
-- 
2.39.3



* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
                   ` (7 preceding siblings ...)
  2024-01-17 16:35 ` [RFC PATCH 8/8] cgroup/cpuset: Update test_cpuset_prs.sh to handle cpuset.cpus.isolation_full Waiman Long
@ 2024-01-17 17:07 ` Tejun Heo
  2024-01-17 17:15   ` Waiman Long
  2024-01-19 10:24 ` Paul E. McKenney
  2024-01-22 15:07 ` Michal Koutný
  10 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2024-01-17 17:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Zefan Li, Johannes Weiner, Frederic Weisbecker, Jonathan Corbet,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Davidlohr Bueso, Shuah Khan, cgroups, linux-doc,
	linux-kernel, rcu, linux-kselftest, Mrunal Patel, Ryan Phillips,
	Brent Rowsell, Peter Hunt, Cestmir Kalina,
	Nicolas Saenz Julienne, Alex Gladkov, Marcelo Tosatti, Phil Auld,
	Paul Gortmaker, Daniel Bristot de Oliveira, Juri Lelli,
	Peter Zijlstra, Costa Shulyupin

Hello,

On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> The first 2 patches are adopted from Federic with minor twists to fix
> merge conflicts and compilation issue. The rests are for implementing
> the new cpuset.cpus.isolation_full interface which is essentially a flag
> to globally enable or disable full CPU isolation on isolated partitions.

I think the interface is a bit premature. The cpuset partition feature is
already pretty restrictive and makes it really clear that it's to isolate
the CPUs. I think it'd be better to just enable all the isolation features
by default. If there are valid use cases which can't be served without
disabling some isolation features, we can worry about adding the interface
at that point.

Thanks.

-- 
tejun


* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-17 17:07 ` [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Tejun Heo
@ 2024-01-17 17:15   ` Waiman Long
  2024-02-06 12:56     ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2024-01-17 17:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Zefan Li, Johannes Weiner, Frederic Weisbecker, Jonathan Corbet,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Davidlohr Bueso, Shuah Khan, cgroups, linux-doc,
	linux-kernel, rcu, linux-kselftest, Mrunal Patel, Ryan Phillips,
	Brent Rowsell, Peter Hunt, Cestmir Kalina,
	Nicolas Saenz Julienne, Alex Gladkov, Marcelo Tosatti, Phil Auld,
	Paul Gortmaker, Daniel Bristot de Oliveira, Juri Lelli,
	Peter Zijlstra, Costa Shulyupin


On 1/17/24 12:07, Tejun Heo wrote:
> Hello,
>
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
>> The first 2 patches are adopted from Federic with minor twists to fix
>> merge conflicts and compilation issue. The rests are for implementing
>> the new cpuset.cpus.isolation_full interface which is essentially a flag
>> to globally enable or disable full CPU isolation on isolated partitions.
> I think the interface is a bit premature. The cpuset partition feature is
> already pretty restrictive and makes it really clear that it's to isolate
> the CPUs. I think it'd be better to just enable all the isolation features
> by default. If there are valid use cases which can't be served without
> disabling some isolation features, we can worry about adding the interface
> at that point.

My current thought is to make isolated partitions act like
isolcpus=domain; additional CPU isolation capabilities are optional and
can be turned on using isolation_full. However, I am fine with having
all of these turned on by default if that is the consensus.

Cheers,
Longman



* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
                   ` (8 preceding siblings ...)
  2024-01-17 17:07 ` [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Tejun Heo
@ 2024-01-19 10:24 ` Paul E. McKenney
  2024-02-11  1:46   ` Waiman Long
  2024-01-22 15:07 ` Michal Koutný
  10 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2024-01-19 10:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Davidlohr Bueso, Shuah Khan, cgroups, linux-doc,
	linux-kernel, rcu, linux-kselftest, Mrunal Patel, Ryan Phillips,
	Brent Rowsell, Peter Hunt, Cestmir Kalina,
	Nicolas Saenz Julienne, Alex Gladkov, Marcelo Tosatti, Phil Auld,
	Paul Gortmaker, Daniel Bristot de Oliveira, Juri Lelli,
	Peter Zijlstra, Costa Shulyupin

On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> This patch series is based on the RFC patch from Frederic [1]. Instead
> of offering RCU_NOCB as a separate option, it is now lumped into a
> root-only cpuset.cpus.isolation_full flag that will enable all the
> additional CPU isolation capabilities available for isolated partitions
> if set. RCU_NOCB is just the first one to this party. Additional dynamic
> CPU isolation capabilities will be added in the future.
> 
> The first 2 patches are adopted from Federic with minor twists to fix
> merge conflicts and compilation issue. The rests are for implementing
> the new cpuset.cpus.isolation_full interface which is essentially a flag
> to globally enable or disable full CPU isolation on isolated partitions.
> On read, it also shows the CPU isolation capabilities that are currently
> enabled. RCU_NOCB requires that the rcu_nocbs option be present in
> the kernel boot command line. Without that, the rcu_nocb functionality
> cannot be enabled even if the isolation_full flag is set. So we allow
> users to check the isolation_full file to verify that if the desired
> CPU isolation capability is enabled or not.
> 
> Only sanity checking has been done so far. More testing, especially on
> the RCU side, will be needed.

There has been some discussion of simplifying the (de-)offloading code
to handle only offline CPUs, along with some discussion of eliminating
the (de-)offloading capability altogether.

We clearly should converge on the capability to be provided before
exposing this to userspace.  ;-)

							Thanx, Paul

> [1] https://lore.kernel.org/lkml/20220525221055.1152307-1-frederic@kernel.org/
> 
> Frederic Weisbecker (2):
>   rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
>   rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected
>     cpuset caller
> 
> Waiman Long (6):
>   rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
>   cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
>   cgroup/cpuset: Add cpuset.cpus.isolation_full
>   cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
>   cgroup/cpuset: Document the new cpuset.cpus.isolation_full control
>     file
>   cgroup/cpuset: Update test_cpuset_prs.sh to handle
>     cpuset.cpus.isolation_full
> 
>  Documentation/admin-guide/cgroup-v2.rst       |  24 ++
>  include/linux/rcupdate.h                      |  15 +-
>  kernel/cgroup/cpuset.c                        | 237 ++++++++++++++----
>  kernel/rcu/rcutorture.c                       |   6 +-
>  kernel/rcu/tree_nocb.h                        | 118 ++++++---
>  .../selftests/cgroup/test_cpuset_prs.sh       |  23 +-
>  6 files changed, 337 insertions(+), 86 deletions(-)
> 
> -- 
> 2.39.3
> 


* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
                   ` (9 preceding siblings ...)
  2024-01-19 10:24 ` Paul E. McKenney
@ 2024-01-22 15:07 ` Michal Koutný
  2024-01-23  5:50   ` Waiman Long
  10 siblings, 1 reply; 20+ messages in thread
From: Michal Koutný @ 2024-01-22 15:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan, cgroups, linux-doc, linux-kernel, rcu,
	linux-kselftest, Mrunal Patel, Ryan Phillips, Brent Rowsell,
	Peter Hunt, Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Marcelo Tosatti, Phil Auld, Paul Gortmaker,
	Daniel Bristot de Oliveira, Juri Lelli, Peter Zijlstra,
	Costa Shulyupin

Hello Waiman.

On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long <longman@redhat.com> wrote:
> This patch series is based on the RFC patch from Frederic [1]. Instead
> of offering RCU_NOCB as a separate option, it is now lumped into a
> root-only cpuset.cpus.isolation_full flag that will enable all the
> additional CPU isolation capabilities available for isolated partitions
> if set. RCU_NOCB is just the first one to this party. Additional dynamic
> CPU isolation capabilities will be added in the future.

IIUC this is similar to what I suggested back in the day and you didn't
consider it [1]. Do I read this right that you've changed your mind?

(It's fine if you did, I'm only asking to follow the heading of cpuset
controller.)

Thanks,
Michal

[1] https://lore.kernel.org/r/58c87587-417b-1498-185f-1db6bb612c82@redhat.com/


* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-22 15:07 ` Michal Koutný
@ 2024-01-23  5:50   ` Waiman Long
  0 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2024-01-23  5:50 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan, cgroups, linux-doc, linux-kernel, rcu,
	linux-kselftest, Mrunal Patel, Ryan Phillips, Brent Rowsell,
	Peter Hunt, Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Marcelo Tosatti, Phil Auld, Paul Gortmaker,
	Daniel Bristot de Oliveira, Juri Lelli, Peter Zijlstra,
	Costa Shulyupin


On 1/22/24 10:07, Michal Koutný wrote:
> Hello Waiman.
>
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long <longman@redhat.com> wrote:
>> This patch series is based on the RFC patch from Frederic [1]. Instead
>> of offering RCU_NOCB as a separate option, it is now lumped into a
>> root-only cpuset.cpus.isolation_full flag that will enable all the
>> additional CPU isolation capabilities available for isolated partitions
>> if set. RCU_NOCB is just the first one to this party. Additional dynamic
>> CPU isolation capabilities will be added in the future.
> IIUC this is similar to what I suggested back in the day and you didn't
> consider it [1]. Do I read this right that you've changed your mind?

I didn't say that we were not going to do this at the time. It's just
that more evaluation needed to be done before going ahead with it. I was
also looking to see if there were use cases where such capabilities were
needed. Now I am aware that such use cases do exist and we should start
looking into it.

>
> (It's fine if you did, I'm only asking to follow the heading of cpuset
> controller.)

OK, the title of the cover-letter may be too specific. I will make it 
more general in the next version.

Cheers,
Longman



* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-17 17:15   ` Waiman Long
@ 2024-02-06 12:56     ` Frederic Weisbecker
  2024-02-06 19:15       ` Marcelo Tosatti
  2024-02-10  4:19       ` Waiman Long
  0 siblings, 2 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2024-02-06 12:56 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Davidlohr Bueso, Shuah Khan, cgroups, linux-doc,
	linux-kernel, rcu, linux-kselftest, Mrunal Patel, Ryan Phillips,
	Brent Rowsell, Peter Hunt, Cestmir Kalina,
	Nicolas Saenz Julienne, Alex Gladkov, Marcelo Tosatti, Phil Auld,
	Paul Gortmaker, Daniel Bristot de Oliveira, Juri Lelli,
	Peter Zijlstra, Costa Shulyupin

On Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long wrote:
> 
> On 1/17/24 12:07, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > The first 2 patches are adopted from Federic with minor twists to fix
> > > merge conflicts and compilation issue. The rests are for implementing
> > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > to globally enable or disable full CPU isolation on isolated partitions.
> > I think the interface is a bit premature. The cpuset partition feature is
> > already pretty restrictive and makes it really clear that it's to isolate
> > the CPUs. I think it'd be better to just enable all the isolation features
> > by default. If there are valid use cases which can't be served without
> > disabling some isolation features, we can worry about adding the interface
> > at that point.
> 
> My current thought is to make isolated partitions act like isolcpus=domain,
> additional CPU isolation capabilities are optional and can be turned on
> using isolation_full. However, I am fine with making all these turned on by
> default if it is the consensus.

Right, that was the consensus last time I tried, along with the fact that
mutating this isolation_full set has to be done on offline CPUs to simplify
the whole picture.

So lemme try to summarize what needs to be done:

1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
  now. And if it ever proves needed, provide a way later for more fine-grained
  tuning.

2) This file must only apply to offline CPUs because it avoids migrations and
  stuff.

3) I need to make RCU NOCB tunable only on offline CPUs, which doesn't require
   that many changes.

4) HK_TYPE_TIMER:
   * Wrt. timers in general, not much needs to be done, the CPUs are
     offline. But:
   * arch/x86/kvm/x86.c does something weird
   * drivers/char/random.c might need some care
   * watchdog needs to be (de-)activated
   
5) HK_TYPE_DOMAIN:
   * This one I fear is not mutable, this is isolcpus...

6) HK_TYPE_MANAGED_IRQ:
   * I prefer not to think about it :-)

7) HK_TYPE_TICK:
   * Maybe some tiny ticks internals to revisit, I'll check that.
   * There is a remote tick to take into consideration, but again the
     CPUs are offline so it shouldn't be too complicated.

8) HK_TYPE_WQ:
   * Fortunately we already have all the mutable interface in place.
     But we must make it live nicely with the sysfs workqueue affinity
     files.

9) HK_FLAG_SCHED:
   * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
   Should be removed?

10) HK_TYPE_RCU:
    * That's point 3) and also some kthreads to affine, which leads us
     to the following in HK_TYPE_KTHREAD:

11) HK_FLAG_KTHREAD:
    * I'm guessing it's fine as long as isolation_full is also an
      isolated partition. Then unbound kthreads shouldn't run there.

12) HK_TYPE_MISC:
    * Should be fine as ILB isn't running on offline CPUs.

Thanks.


* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-02-06 12:56     ` Frederic Weisbecker
@ 2024-02-06 19:15       ` Marcelo Tosatti
  2024-02-07 14:47         ` Frederic Weisbecker
  2024-02-10  4:19       ` Waiman Long
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2024-02-06 19:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan, cgroups, linux-doc, linux-kernel, rcu,
	linux-kselftest, Mrunal Patel, Ryan Phillips, Brent Rowsell,
	Peter Hunt, Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Phil Auld, Paul Gortmaker, Daniel Bristot de Oliveira,
	Juri Lelli, Peter Zijlstra, Costa Shulyupin

On Tue, Feb 06, 2024 at 01:56:23PM +0100, Frederic Weisbecker wrote:
> On Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long wrote:
> > 
> > On 1/17/24 12:07, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > > The first 2 patches are adopted from Federic with minor twists to fix
> > > > merge conflicts and compilation issue. The rests are for implementing
> > > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > > to globally enable or disable full CPU isolation on isolated partitions.
> > > I think the interface is a bit premature. The cpuset partition feature is
> > > already pretty restrictive and makes it really clear that it's to isolate
> > > the CPUs. I think it'd be better to just enable all the isolation features
> > > by default. If there are valid use cases which can't be served without
> > > disabling some isolation features, we can worry about adding the interface
> > > at that point.
> > 
> > My current thought is to make isolated partitions act like isolcpus=domain,
> > additional CPU isolation capabilities are optional and can be turned on
> > using isolation_full. However, I am fine with making all these turned on by
> > default if it is the consensus.
> 
> Right, that was the consensus last time I tried, along with the fact that
> mutating this isolation_full set has to be done on offline CPUs to simplify
> the whole picture.
> 
> So lemme try to summarize what needs to be done:
> 
> 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
>   now. And if it ever proves needed, provide a way later for more fine-grained
>   tuning.
> 
> 2) This file must only apply to offline CPUs because it avoids migrations and
>   stuff.
> 
> 3) I need to make RCU NOCB tunable only on offline CPUs, which doesn't require
>    that many changes.
> 
> 4) HK_TYPE_TIMER:
>    * Wrt. timers in general, not much needs to be done, the CPUs are
>      offline. But:
>    * arch/x86/kvm/x86.c does something weird
>    * drivers/char/random.c might need some care
>    * watchdog needs to be (de-)activated
>    
> 5) HK_TYPE_DOMAIN:
>    * This one I fear is not mutable, this is isolcpus...

Except for HK_TYPE_DOMAIN, I have never seen anyone use any of these
flags.

> 
> 6) HK_TYPE_MANAGED_IRQ:
>    * I prefer not to think about it :-)
> 
> 7) HK_TYPE_TICK:
>    * Maybe some tiny ticks internals to revisit, I'll check that.
>    * There is a remote tick to take into consideration, but again the
>      CPUs are offline so it shouldn't be too complicated.
> 
> 8) HK_TYPE_WQ:
>    * Fortunately we already have all the mutable interface in place.
>      But we must make it live nicely with the sysfs workqueue affinity
>      files.
> 
> 9) HK_FLAG_SCHED:
>    * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
>    Should be removed?
> 
> 10) HK_TYPE_RCU:
>     * That's point 3) and also some kthreads to affine, which leads us
>      to the following in HK_TYPE_KTHREAD:
> 
> 11) HK_FLAG_KTHREAD:
>     * I'm guessing it's fine as long as isolation_full is also an
>       isolated partition. Then unbound kthreads shouldn't run there.
> 
> 12) HK_TYPE_MISC:
>     * Should be fine as ILB isn't running on offline CPUs.
> 
> Thanks.
> 
> 



* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-02-06 19:15       ` Marcelo Tosatti
@ 2024-02-07 14:47         ` Frederic Weisbecker
  2024-02-07 14:59           ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2024-02-07 14:47 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan, cgroups, linux-doc, linux-kernel, rcu,
	linux-kselftest, Mrunal Patel, Ryan Phillips, Brent Rowsell,
	Peter Hunt, Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Phil Auld, Paul Gortmaker, Daniel Bristot de Oliveira,
	Juri Lelli, Peter Zijlstra, Costa Shulyupin

On Tue, Feb 06, 2024 at 04:15:18PM -0300, Marcelo Tosatti wrote:
> On Tue, Feb 06, 2024 at 01:56:23PM +0100, Frederic Weisbecker wrote:
> > On Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long wrote:
> > > 
> > > On 1/17/24 12:07, Tejun Heo wrote:
> > > > Hello,
> > > > 
> > > > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > > > The first 2 patches are adopted from Frederic with minor twists to fix
> > > > > merge conflicts and compilation issue. The rests are for implementing
> > > > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > > > to globally enable or disable full CPU isolation on isolated partitions.
> > > > I think the interface is a bit premature. The cpuset partition feature is
> > > > already pretty restrictive and makes it really clear that it's to isolate
> > > > the CPUs. I think it'd be better to just enable all the isolation features
> > > > by default. If there are valid use cases which can't be served without
> > > > disabling some isolation features, we can worry about adding the interface
> > > > at that point.
> > > 
> > > My current thought is to make isolated partitions act like isolcpus=domain,
> > > additional CPU isolation capabilities are optional and can be turned on
> > > using isolation_full. However, I am fine with making all these turned on by
> > > default if it is the consensus.
> > 
> > Right it was the consensus last time I tried. Along with the fact that mutating
> > this isolation_full set has to be done on offline CPUs to simplify the whole
> > picture.
> > 
> > So lemme try to summarize what needs to be done:
> > 
> > 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
> >   now. And if it ever proves needed, provide a way later for more finegrained
> >   tuning.
> > 
> > 2) This file must only apply to offline CPUs because it avoids migrations and
> >   stuff.
> > 
> > 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
> >    changes.
> > 
> > 4) HK_TYPE_TIMER:
> >    * Wrt. timers in general, not much needs to be done, the CPUs are
> >      offline. But:
> >    * arch/x86/kvm/x86.c does something weird
> >    * drivers/char/random.c might need some care
> >    * watchdog needs to be (de-)activated
> >    
> > 5) HK_TYPE_DOMAIN:
> >    * This one I fear is not mutable, this is isolcpus...
> 
> Except for HK_TYPE_DOMAIN, I have never seen anyone use any of these
> flags.

HK_TYPE_DOMAIN is used by isolcpus=domain,....
HK_TYPE_MANAGED_IRQ is used by isolcpus=managed_irq,...

All the others (except HK_TYPE_SCHED) are used by nohz_full=
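
The mapping described above can be sketched as a small userspace model. This is only an illustration, not the kernel's implementation: the enum lists the nine housekeeping types named in this thread with arbitrary bit positions, and the helpers hk_flags_nohz_full() and hk_flags_isolcpus() are hypothetical names. A bare isolcpus=<cpulist> and any other sub-options are not modeled.

```c
#include <string.h>

/* The housekeeping types named in this thread; bit positions are illustrative. */
enum hk_type {
	HK_TYPE_TIMER,
	HK_TYPE_RCU,
	HK_TYPE_MISC,
	HK_TYPE_SCHED,
	HK_TYPE_TICK,
	HK_TYPE_DOMAIN,
	HK_TYPE_WQ,
	HK_TYPE_MANAGED_IRQ,
	HK_TYPE_KTHREAD,
	HK_TYPE_MAX
};

#define HK_BIT(t) (1UL << (t))

/*
 * Hypothetical helper: flags implied by nohz_full=, i.e. every type except
 * the two owned by isolcpus= (DOMAIN, MANAGED_IRQ) and the unused SCHED.
 */
static unsigned long hk_flags_nohz_full(void)
{
	unsigned long all = HK_BIT(HK_TYPE_MAX) - 1;

	return all & ~(HK_BIT(HK_TYPE_DOMAIN) |
		       HK_BIT(HK_TYPE_MANAGED_IRQ) |
		       HK_BIT(HK_TYPE_SCHED));
}

/*
 * Hypothetical helper: flag selected by a single isolcpus= sub-option.
 * Returns 0 for anything this sketch does not model.
 */
static unsigned long hk_flags_isolcpus(const char *opt)
{
	if (strcmp(opt, "domain") == 0)
		return HK_BIT(HK_TYPE_DOMAIN);
	if (strcmp(opt, "managed_irq") == 0)
		return HK_BIT(HK_TYPE_MANAGED_IRQ);
	return 0;
}
```

In this model, OR-ing hk_flags_nohz_full() with both isolcpus results covers every type except HK_TYPE_SCHED, which is the grouping being discussed.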

Thanks.

> 
> > 
> > 6) HK_TYPE_MANAGED_IRQ:
> >    * I prefer not to think about it :-)
> > 
> > 7) HK_TYPE_TICK:
> >    * Maybe some tiny ticks internals to revisit, I'll check that.
> >    * There is a remote tick to take into consideration, but again the
> >      CPUs are offline so it shouldn't be too complicated.
> > 
> > 8) HK_TYPE_WQ:
> >    * Fortunately we already have all the mutable interface in place.
> >      But we must make it live nicely with the sysfs workqueue affinity
> >      files.
> > 
> > 9) HK_FLAG_SCHED:
> >    * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
> >    Should be removed?
> > 
> > 10) HK_TYPE_RCU:
> >     * That's point 3) and also some kthreads to affine, which leads us
> >      to the following in HK_TYPE_KTHREAD:
> > 
> > 11) HK_FLAG_KTHREAD:
> >     * I'm guessing it's fine as long as isolation_full is also an
> >       isolated partition. Then unbound kthreads shouldn't run there.
> > 
> > 12) HK_TYPE_MISC:
> >     * Should be fine as ILB isn't running on offline CPUs.
> > 
> > Thanks.
> > 
> > 
> 


* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-02-07 14:47         ` Frederic Weisbecker
@ 2024-02-07 14:59           ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2024-02-07 14:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner,
	Jonathan Corbet, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Davidlohr Bueso,
	Shuah Khan, cgroups, linux-doc, linux-kernel, rcu,
	linux-kselftest, Mrunal Patel, Ryan Phillips, Brent Rowsell,
	Peter Hunt, Cestmir Kalina, Nicolas Saenz Julienne, Alex Gladkov,
	Phil Auld, Paul Gortmaker, Daniel Bristot de Oliveira,
	Juri Lelli, Peter Zijlstra, Costa Shulyupin

On Wed, Feb 07, 2024 at 03:47:46PM +0100, Frederic Weisbecker wrote:
> On Tue, Feb 06, 2024 at 04:15:18PM -0300, Marcelo Tosatti wrote:
> > On Tue, Feb 06, 2024 at 01:56:23PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long wrote:
> > > > 
> > > > On 1/17/24 12:07, Tejun Heo wrote:
> > > > > Hello,
> > > > > 
> > > > > On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
> > > > > > The first 2 patches are adopted from Frederic with minor twists to fix
> > > > > > merge conflicts and compilation issue. The rests are for implementing
> > > > > > the new cpuset.cpus.isolation_full interface which is essentially a flag
> > > > > > to globally enable or disable full CPU isolation on isolated partitions.
> > > > > I think the interface is a bit premature. The cpuset partition feature is
> > > > > already pretty restrictive and makes it really clear that it's to isolate
> > > > > the CPUs. I think it'd be better to just enable all the isolation features
> > > > > by default. If there are valid use cases which can't be served without
> > > > > disabling some isolation features, we can worry about adding the interface
> > > > > at that point.
> > > > 
> > > > My current thought is to make isolated partitions act like isolcpus=domain,
> > > > additional CPU isolation capabilities are optional and can be turned on
> > > > using isolation_full. However, I am fine with making all these turned on by
> > > > default if it is the consensus.
> > > 
> > > Right it was the consensus last time I tried. Along with the fact that mutating
> > > this isolation_full set has to be done on offline CPUs to simplify the whole
> > > picture.
> > > 
> > > So lemme try to summarize what needs to be done:
> > > 
> > > 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
> > >   now. And if it ever proves needed, provide a way later for more finegrained
> > >   tuning.
> > > 
> > > 2) This file must only apply to offline CPUs because it avoids migrations and
> > >   stuff.
> > > 
> > > 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
> > >    changes.
> > > 
> > > 4) HK_TYPE_TIMER:
> > >    * Wrt. timers in general, not much needs to be done, the CPUs are
> > >      offline. But:
> > >    * arch/x86/kvm/x86.c does something weird
> > >    * drivers/char/random.c might need some care
> > >    * watchdog needs to be (de-)activated
> > >    
> > > 5) HK_TYPE_DOMAIN:
> > >    * This one I fear is not mutable, this is isolcpus...
> > 
> > Except for HK_TYPE_DOMAIN, I have never seen anyone use any of these
> > flags.
> 
> HK_TYPE_DOMAIN is used by isolcpus=domain,....
> HK_TYPE_MANAGED_IRQ is used by isolcpus=managed_irq,...
> 
> All the others (except HK_TYPE_SCHED) are used by nohz_full=

I mean I've never seen the individual flags being set on their own.

You either want full isolation (nohz_full and all the flags together,
except for HK_TYPE_DOMAIN, which is sometimes enabled/disabled), or not.

So why not group them all together?

Do you know of any separate uses of these flags (except for
HK_TYPE_DOMAIN)?

> Thanks.
> 
> > 
> > > 
> > > 6) HK_TYPE_MANAGED_IRQ:
> > >    * I prefer not to think about it :-)
> > > 
> > > 7) HK_TYPE_TICK:
> > >    * Maybe some tiny ticks internals to revisit, I'll check that.
> > >    * There is a remote tick to take into consideration, but again the
> > >      CPUs are offline so it shouldn't be too complicated.
> > > 
> > > 8) HK_TYPE_WQ:
> > >    * Fortunately we already have all the mutable interface in place.
> > >      But we must make it live nicely with the sysfs workqueue affinity
> > >      files.
> > > 
> > > 9) HK_FLAG_SCHED:
> > >    * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
> > >    Should be removed?
> > > 
> > > 10) HK_TYPE_RCU:
> > >     * That's point 3) and also some kthreads to affine, which leads us
> > >      to the following in HK_TYPE_KTHREAD:
> > > 
> > > 11) HK_FLAG_KTHREAD:
> > >     * I'm guessing it's fine as long as isolation_full is also an
> > >       isolated partition. Then unbound kthreads shouldn't run there.
> > > 
> > > 12) HK_TYPE_MISC:
> > >     * Should be fine as ILB isn't running on offline CPUs.
> > > 
> > > Thanks.
> > > 
> > > 
> > 
> 
> 



* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-02-06 12:56     ` Frederic Weisbecker
  2024-02-06 19:15       ` Marcelo Tosatti
@ 2024-02-10  4:19       ` Waiman Long
  1 sibling, 0 replies; 20+ messages in thread
From: Waiman Long @ 2024-02-10  4:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet,
	Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Davidlohr Bueso, Shuah Khan, cgroups, linux-doc,
	linux-kernel, rcu, linux-kselftest, Mrunal Patel, Ryan Phillips,
	Brent Rowsell, Peter Hunt, Cestmir Kalina,
	Nicolas Saenz Julienne, Alex Gladkov, Marcelo Tosatti, Phil Auld,
	Paul Gortmaker, Daniel Bristot de Oliveira, Juri Lelli,
	Peter Zijlstra, Costa Shulyupin

On 2/6/24 07:56, Frederic Weisbecker wrote:
> On Wed, Jan 17, 2024 at 12:15:07PM -0500, Waiman Long wrote:
>> On 1/17/24 12:07, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
>>>> The first 2 patches are adopted from Frederic with minor twists to fix
>>>> merge conflicts and compilation issue. The rests are for implementing
>>>> the new cpuset.cpus.isolation_full interface which is essentially a flag
>>>> to globally enable or disable full CPU isolation on isolated partitions.
>>> I think the interface is a bit premature. The cpuset partition feature is
>>> already pretty restrictive and makes it really clear that it's to isolate
>>> the CPUs. I think it'd be better to just enable all the isolation features
>>> by default. If there are valid use cases which can't be served without
>>> disabling some isolation features, we can worry about adding the interface
>>> at that point.
>> My current thought is to make isolated partitions act like isolcpus=domain,
>> additional CPU isolation capabilities are optional and can be turned on
>> using isolation_full. However, I am fine with making all these turned on by
>> default if it is the consensus.
> Right it was the consensus last time I tried. Along with the fact that mutating
> this isolation_full set has to be done on offline CPUs to simplify the whole
> picture.
>
> So lemme try to summarize what needs to be done:
>
> 1) An all-isolation feature file (that is, all the HK_TYPE_* things) on/off for
>    now. And if it ever proves needed, provide a way later for more finegrained
>    tuning.
That is more or less the current plan. As detailed below, HK_TYPE_DOMAIN 
& HK_TYPE_WQ isolation are included in the isolated partitions by 
default. I am also thinking about including other relatively cheap 
isolation flags by default. The expensive ones will have to be enabled 
via isolation_full.
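
A minimal sketch of the plan as stated here, written as a userspace model rather than kernel code. It assumes only what this thread says: isolated partitions get DOMAIN and WQ isolation by default, and RCU_NOCB is added only when isolation_full is set and rcu_nocbs was given on the boot command line. The ISOL_* names and effective_isolation() are hypothetical, and which other flags count as "cheap" is left out because that is still undecided.

```c
#include <stdbool.h>

/* Hypothetical capability bits for this sketch only. */
#define ISOL_DOMAIN	(1u << 0)
#define ISOL_WQ		(1u << 1)
#define ISOL_RCU_NOCB	(1u << 2)

/*
 * Hypothetical helper: capabilities in effect for a set of CPUs.
 *   isolated_partition - the CPUs belong to an isolated cpuset partition
 *   isolation_full     - the root-only isolation_full flag is set
 *   rcu_nocbs_booted   - rcu_nocbs was present on the kernel command line
 */
static unsigned int effective_isolation(bool isolated_partition,
					bool isolation_full,
					bool rcu_nocbs_booted)
{
	unsigned int caps;

	/* isolation_full only applies to isolated partitions. */
	if (!isolated_partition)
		return 0;

	/* Included in isolated partitions by default. */
	caps = ISOL_DOMAIN | ISOL_WQ;

	/* The expensive part: needs both the flag and the boot option. */
	if (isolation_full && rcu_nocbs_booted)
		caps |= ISOL_RCU_NOCB;

	return caps;
}
```

Under these assumptions, setting isolation_full without rcu_nocbs on the command line yields the same result as not setting it, which is why reading the file back to see what is actually enabled matters.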
>
> 2) This file must only apply to offline CPUs because it avoids migrations and
>    stuff.
Well, taking the CPUs offline first is rather expensive. I wouldn't mind
doing some partial offlining based on the existing set of teardown and
bringup callbacks, but I would try to avoid fully offlining the CPUs
first.
>
> 3) I need to make RCU NOCB tunable only on offline CPUs, which isn't that much
>     changes.
>
> 4) HK_TYPE_TIMER:
>     * Wrt. timers in general, not much needs to be done, the CPUs are
>       offline. But:
>     * arch/x86/kvm/x86.c does something weird
>     * drivers/char/random.c might need some care
>     * watchdog needs to be (de-)activated
>     
> 5) HK_TYPE_DOMAIN:
>     * This one I fear is not mutable, this is isolcpus...

HK_TYPE_DOMAIN is already available via the current cpuset isolated
partition functionality. What I am currently doing is extending that to
the other HK_TYPE_* flags.


>
> 6) HK_TYPE_MANAGED_IRQ:
>     * I prefer not to think about it :-)
>
> 7) HK_TYPE_TICK:
>     * Maybe some tiny ticks internals to revisit, I'll check that.
>     * There is a remote tick to take into consideration, but again the
>       CPUs are offline so it shouldn't be too complicated.
>
> 8) HK_TYPE_WQ:
>     * Fortunately we already have all the mutable interface in place.
>       But we must make it live nicely with the sysfs workqueue affinity
>       files.

HK_TYPE_WQ is basically done and it is going to work properly with the
workqueue affinity sysfs files. From the workqueue point of view,
HK_TYPE_WQ is currently treated the same as HK_TYPE_DOMAIN.

>
> 9) HK_FLAG_SCHED:
>     * Oops, this one is ignored by nohz_full/isolcpus, isn't it?
>     Should be removed?
I don't think HK_FLAG_SCHED is being used at all. So I believe we should 
remove it to avoid confusion.
>
> 10) HK_TYPE_RCU:
>      * That's point 3) and also some kthreads to affine, which leads us
>       to the following in HK_TYPE_KTHREAD:
>
> 11) HK_FLAG_KTHREAD:
>      * I'm guessing it's fine as long as isolation_full is also an
>        isolated partition. Then unbound kthreads shouldn't run there.

Yes, isolation_full applies only to isolated partitions. It extends the
amount of CPU isolation by enabling all the other available CPU
isolation flags.

Cheers,
Longman



* Re: [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions
  2024-01-19 10:24 ` Paul E. McKenney
@ 2024-02-11  1:46   ` Waiman Long
  0 siblings, 0 replies; 20+ messages in thread
From: Waiman Long @ 2024-02-11  1:46 UTC (permalink / raw)
  To: paulmck
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Frederic Weisbecker,
	Jonathan Corbet, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Boqun Feng, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Davidlohr Bueso, Shuah Khan, cgroups, linux-doc,
	linux-kernel, rcu, linux-kselftest, Mrunal Patel, Ryan Phillips,
	Brent Rowsell, Peter Hunt, Cestmir Kalina,
	Nicolas Saenz Julienne, Alex Gladkov, Marcelo Tosatti, Phil Auld,
	Paul Gortmaker, Daniel Bristot de Oliveira, Juri Lelli,
	Peter Zijlstra, Costa Shulyupin

On 1/19/24 05:24, Paul E. McKenney wrote:
> On Wed, Jan 17, 2024 at 11:35:03AM -0500, Waiman Long wrote:
>> This patch series is based on the RFC patch from Frederic [1]. Instead
>> of offering RCU_NOCB as a separate option, it is now lumped into a
>> root-only cpuset.cpus.isolation_full flag that will enable all the
>> additional CPU isolation capabilities available for isolated partitions
>> if set. RCU_NOCB is just the first one to this party. Additional dynamic
>> CPU isolation capabilities will be added in the future.
>>
>> The first 2 patches are adopted from Frederic with minor twists to fix
>> merge conflicts and compilation issue. The rests are for implementing
>> the new cpuset.cpus.isolation_full interface which is essentially a flag
>> to globally enable or disable full CPU isolation on isolated partitions.
>> On read, it also shows the CPU isolation capabilities that are currently
>> enabled. RCU_NOCB requires that the rcu_nocbs option be present in
>> the kernel boot command line. Without that, the rcu_nocb functionality
>> cannot be enabled even if the isolation_full flag is set. So we allow
>> users to check the isolation_full file to verify that if the desired
>> CPU isolation capability is enabled or not.
>>
>> Only sanity checking has been done so far. More testing, especially on
>> the RCU side, will be needed.
> There has been some discussion of simplifying the (de-)offloading code
> to handle only offline CPUs.  Along with some discussion of eliminating
> the (de-)offloading capability altogether.
>
> We clearly should converge on the capability to be provided before
> exposing this to userspace.  ;-)

Would you mind giving me a pointer to the discussion of simplifying the
de-offloading code to handle only offline CPUs?

Thanks,
Longman



Thread overview: 20+ messages
2024-01-17 16:35 [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Waiman Long
2024-01-17 16:35 ` [RFC PATCH 1/8] rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload Waiman Long
2024-01-17 16:35 ` [RFC PATCH 2/8] rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected cpuset caller Waiman Long
2024-01-17 16:35 ` [RFC PATCH 3/8] rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state Waiman Long
2024-01-17 16:35 ` [RFC PATCH 4/8] cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs Waiman Long
2024-01-17 16:35 ` [RFC PATCH 5/8] cgroup/cpuset: Add cpuset.cpus.isolation_full Waiman Long
2024-01-17 16:35 ` [RFC PATCH 6/8] cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs Waiman Long
2024-01-17 16:35 ` [RFC PATCH 7/8] cgroup/cpuset: Document the new cpuset.cpus.isolation_full control file Waiman Long
2024-01-17 16:35 ` [RFC PATCH 8/8] cgroup/cpuset: Update test_cpuset_prs.sh to handle cpuset.cpus.isolation_full Waiman Long
2024-01-17 17:07 ` [RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions Tejun Heo
2024-01-17 17:15   ` Waiman Long
2024-02-06 12:56     ` Frederic Weisbecker
2024-02-06 19:15       ` Marcelo Tosatti
2024-02-07 14:47         ` Frederic Weisbecker
2024-02-07 14:59           ` Marcelo Tosatti
2024-02-10  4:19       ` Waiman Long
2024-01-19 10:24 ` Paul E. McKenney
2024-02-11  1:46   ` Waiman Long
2024-01-22 15:07 ` Michal Koutný
2024-01-23  5:50   ` Waiman Long
