* [PATCH -tip V3 0/8] workqueue: break affinity initiatively
@ 2020-12-26  2:51 Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 1/8] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity Lai Jiangshan
                   ` (8 more replies)
  0 siblings, 9 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Commit 06249738a41a ("workqueue: Manually break affinity on hotplug")
said that the scheduler will no longer force-break affinity for us.

But workqueue highly depends on the old behavior.  Many parts of the code
rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
alone is not enough to change that, and the commit has flaws of its own:

It doesn't handle worker detachment.
It doesn't handle worker attachment, especially worker creation,
  which is handled by Valentin Schneider's patch [1].
It doesn't handle unbound workers, which might possibly be
per-cpu kthreads.

We need to thoroughly update the way workqueue handles affinity
during cpu hot[un]plug.  That is what this patchset intends to do, and
it replaces Valentin Schneider's patch [1].  The equivalent patch in
this series is patch 8 (patch 10 in the V1/V2 numbering).
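
For readers not familiar with the term, "breaking affinity" here just means
resetting a worker task's allowed cpumask so it is no longer pinned to a
(possibly offline) CPU.  A minimal sketch of that core operation follows;
this is illustrative only and not code from the series (the helper name is
made up, and the real changes live in kernel/workqueue.c, mostly under
wq_pool_attach_mutex):

	#include <linux/cpumask.h>
	#include <linux/sched.h>

	/* Let @task run anywhere again, as the scheduler used to do for us. */
	static void break_affinity(struct task_struct *task)
	{
		/* cpu_possible_mask keeps CPUs that come online later usable. */
		WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpu_possible_mask) < 0);
	}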

The patchset is based on tip/master rather than the workqueue tree,
because it complements 06249738a41a ("workqueue: Manually break
affinity on hotplug"), which is only in tip/master for now.

And TJ acked routing the series through tip.

Changed from V2:
	Drop V2's patch 4, which caused a warning about setting an
	online&!active cpumask for a kthread, reported by several people:
		Dexuan Cui <decui@microsoft.com>
		kernel test robot <oliver.sang@intel.com>

	Drop V2's patch 1, which can also cause a warning about setting an
	online&!active cpumask for a kthread.  restore_unbound_workers_cpumask()
	is now changed when bringing a cpu online, which causes V2's patch 7
	(V3's patch 5) to be changed accordingly.

	Marked patch8 Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

Changed from V1:
	Add TJ's acked-by for the whole patchset

	Add more words to the comments and the changelog, mainly derived
	from discussion with Peter.

	Update the comments as TJ suggested.
	
	Update a line of code as Valentin suggested.

	Add Valentin's ack for patch 10 ("Seems alright to me.") and
	add Valentin's comments to the changelog in their entirety.

[1]: https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.camel@redhat.com
[V1 patchset]: https://lore.kernel.org/lkml/20201214155457.3430-1-jiangshanlai@gmail.com/
[V2 patchset]: https://lore.kernel.org/lkml/20201218170919.2950-1-jiangshanlai@gmail.com/

Lai Jiangshan (8):
  workqueue: use cpu_possible_mask instead of cpu_active_mask to break
    affinity
  workqueue: Manually break affinity on pool detachment
  workqueue: introduce wq_online_cpumask
  workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
  workqueue: Manually break affinity on hotplug for unbound pool
  workqueue: reorganize workqueue_online_cpu()
  workqueue: reorganize workqueue_offline_cpu() unbind_workers()
  workqueue: Fix affinity of kworkers when attaching into pool

 kernel/workqueue.c | 207 ++++++++++++++++++++++++++++-----------------
 1 file changed, 129 insertions(+), 78 deletions(-)

-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 1/8] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 2/8] workqueue: Manually break affinity on pool detachment Lai Jiangshan
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan,
	Daniel Bristot de Oliveira

From: Lai Jiangshan <laijs@linux.alibaba.com>

The scheduler won't break affinity for us any more, so we should
"emulate" the behavior it used to apply when breaking affinity for
us, which is changing the cpumask to cpu_possible_mask.

Also, other CPUs might come online later while the worker is still
running with pending work items.  The worker should be allowed to use
those later-online CPUs as before and process the work items ASAP.
Using cpu_active_mask here cannot achieve this goal, but using
cpu_possible_mask can.

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c71da2a59e12..f2b8f3d458d1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4910,7 +4910,7 @@ static void unbind_workers(int cpu)
 		raw_spin_unlock_irq(&pool->lock);
 
 		for_each_pool_worker(worker, pool)
-			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_active_mask) < 0);
+			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
 
 		mutex_unlock(&wq_pool_attach_mutex);
 
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 2/8] workqueue: Manually break affinity on pool detachment
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 1/8] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask Lai Jiangshan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan,
	Daniel Bristot de Oliveira

From: Lai Jiangshan <laijs@linux.alibaba.com>

The pool->attrs->cpumask might be a single CPU, and that CPU may go
down after detachment.  The scheduler won't force-break affinity for
us since the worker is a per-cpu kthread, so we have to do it on our
own and unbind this worker, which can't be unbound by
workqueue_offline_cpu() since it doesn't belong to any pool after
detachment.  Do it unconditionally: there is no harm in breaking
affinity for a non-per-cpu kthread, and we don't need to rely on the
scheduler's policy on when to break affinity.

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f2b8f3d458d1..ccbceacaea1b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1885,6 +1885,19 @@ static void worker_detach_from_pool(struct worker *worker)
 
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
+
+	/*
+	 * The pool->attrs->cpumask might be a single CPU and it may go
+	 * down after detachment, and the scheduler won't force to break
+	 * affinity for us since it is a per-cpu kthread.  So we have to
+	 * do it on our own and unbind this worker which can't be unbound
+	 * by workqueue_offline_cpu() since it doesn't belong to any pool
+	 * after detachment.  Do it unconditionally for there is no harm
+	 * to break affinity for non-per-cpu kthreads and we don't need to
+	 * rely on the scheduler's policy on when to break affinity.
+	 */
+	set_cpus_allowed_ptr(worker->task, cpu_possible_mask);
+
 	mutex_unlock(&wq_pool_attach_mutex);
 
 	/* clear leftover flags without pool->lock after it is detached */
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 1/8] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 2/8] workqueue: Manually break affinity on pool detachment Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
  2021-01-04 13:56   ` Peter Zijlstra
  2020-12-26  2:51 ` [PATCH -tip V3 4/8] workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask() Lai Jiangshan
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

wq_online_cpumask is the cached result of cpu_online_mask with the
going-down cpu cleared.  It is needed by later patches for setting the
correct cpumask for workers and for breaking affinity initiatively.

The first use of wq_online_cpumask is also in this patch:
wq_calc_node_cpumask() and wq_update_unbound_numa() can be simplified
a little.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ccbceacaea1b..6f75f7ebeb17 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -310,6 +310,9 @@ static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 /* PL: allowable cpus for unbound wqs and work items */
 static cpumask_var_t wq_unbound_cpumask;
 
+/* PL: online cpus (cpu_online_mask with the going-down cpu cleared) */
+static cpumask_var_t wq_online_cpumask;
+
 /* CPU where unbound work was last round robin scheduled from this CPU */
 static DEFINE_PER_CPU(int, wq_rr_cpu_last);
 
@@ -3825,12 +3828,10 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
  * wq_calc_node_cpumask - calculate a wq_attrs' cpumask for the specified node
  * @attrs: the wq_attrs of the default pwq of the target workqueue
  * @node: the target NUMA node
- * @cpu_going_down: if >= 0, the CPU to consider as offline
  * @cpumask: outarg, the resulting cpumask
  *
- * Calculate the cpumask a workqueue with @attrs should use on @node.  If
- * @cpu_going_down is >= 0, that cpu is considered offline during
- * calculation.  The result is stored in @cpumask.
+ * Calculate the cpumask a workqueue with @attrs should use on @node.
+ * The result is stored in @cpumask.
  *
  * If NUMA affinity is not enabled, @attrs->cpumask is always used.  If
  * enabled and @node has online CPUs requested by @attrs, the returned
@@ -3844,15 +3845,14 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
  * %false if equal.
  */
 static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
-				 int cpu_going_down, cpumask_t *cpumask)
+				 cpumask_t *cpumask)
 {
 	if (!wq_numa_enabled || attrs->no_numa)
 		goto use_dfl;
 
 	/* does @node have any online CPUs @attrs wants? */
 	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
-	if (cpu_going_down >= 0)
-		cpumask_clear_cpu(cpu_going_down, cpumask);
+	cpumask_and(cpumask, cpumask, wq_online_cpumask);
 
 	if (cpumask_empty(cpumask))
 		goto use_dfl;
@@ -3961,7 +3961,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 		goto out_free;
 
 	for_each_node(node) {
-		if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
+		if (wq_calc_node_cpumask(new_attrs, node, tmp_attrs->cpumask)) {
 			ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
 			if (!ctx->pwq_tbl[node])
 				goto out_free;
@@ -4086,7 +4086,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
  * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
  * @wq: the target workqueue
  * @cpu: the CPU coming up or going down
- * @online: whether @cpu is coming up or going down
  *
  * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
  * %CPU_DOWN_FAILED.  @cpu is being hot[un]plugged, update NUMA affinity of
@@ -4104,11 +4103,9 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
  * affinity, it's the user's responsibility to flush the work item from
  * CPU_DOWN_PREPARE.
  */
-static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
-				   bool online)
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu)
 {
 	int node = cpu_to_node(cpu);
-	int cpu_off = online ? -1 : cpu;
 	struct pool_workqueue *old_pwq = NULL, *pwq;
 	struct workqueue_attrs *target_attrs;
 	cpumask_t *cpumask;
@@ -4136,7 +4133,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
 	 * and create a new one if they don't match.  If the target cpumask
 	 * equals the default pwq's, the default pwq should be used.
 	 */
-	if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
+	if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpumask)) {
 		if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
 			return;
 	} else {
@@ -5069,6 +5066,7 @@ int workqueue_online_cpu(unsigned int cpu)
 	int pi;
 
 	mutex_lock(&wq_pool_mutex);
+	cpumask_set_cpu(cpu, wq_online_cpumask);
 
 	for_each_pool(pool, pi) {
 		mutex_lock(&wq_pool_attach_mutex);
@@ -5083,7 +5081,7 @@ int workqueue_online_cpu(unsigned int cpu)
 
 	/* update NUMA affinity of unbound workqueues */
 	list_for_each_entry(wq, &workqueues, list)
-		wq_update_unbound_numa(wq, cpu, true);
+		wq_update_unbound_numa(wq, cpu);
 
 	mutex_unlock(&wq_pool_mutex);
 	return 0;
@@ -5101,8 +5099,9 @@ int workqueue_offline_cpu(unsigned int cpu)
 
 	/* update NUMA affinity of unbound workqueues */
 	mutex_lock(&wq_pool_mutex);
+	cpumask_clear_cpu(cpu, wq_online_cpumask);
 	list_for_each_entry(wq, &workqueues, list)
-		wq_update_unbound_numa(wq, cpu, false);
+		wq_update_unbound_numa(wq, cpu);
 	mutex_unlock(&wq_pool_mutex);
 
 	return 0;
@@ -5939,6 +5938,9 @@ void __init workqueue_init_early(void)
 
 	BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
 
+	BUG_ON(!alloc_cpumask_var(&wq_online_cpumask, GFP_KERNEL));
+	cpumask_copy(wq_online_cpumask, cpu_online_mask);
+
 	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
 	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(hk_flags));
 
@@ -6035,7 +6037,7 @@ void __init workqueue_init(void)
 	}
 
 	list_for_each_entry(wq, &workqueues, list) {
-		wq_update_unbound_numa(wq, smp_processor_id(), true);
+		wq_update_unbound_numa(wq, smp_processor_id());
 		WARN(init_rescuer(wq),
 		     "workqueue: failed to create early rescuer for %s",
 		     wq->name);
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 4/8] workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
                   ` (2 preceding siblings ...)
  2020-12-26  2:51 ` [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 5/8] workqueue: Manually break affinity on hotplug for unbound pool Lai Jiangshan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

restore_unbound_workers_cpumask() is called at CPU_ONLINE time, where
wq_online_cpumask equals cpu_online_mask.  So no functionality is
changed.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6f75f7ebeb17..0a95ae14d46f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5033,13 +5033,14 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	static cpumask_t cpumask;
 	struct worker *worker;
 
+	lockdep_assert_held(&wq_pool_mutex);
 	lockdep_assert_held(&wq_pool_attach_mutex);
 
 	/* is @cpu allowed for @pool? */
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
 		return;
 
-	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
+	cpumask_and(&cpumask, pool->attrs->cpumask, wq_online_cpumask);
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 5/8] workqueue: Manually break affinity on hotplug for unbound pool
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
                   ` (3 preceding siblings ...)
  2020-12-26  2:51 ` [PATCH -tip V3 4/8] workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask() Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
       [not found]   ` <20201226101631.5448-1-hdanton@sina.com>
  2020-12-26  2:51 ` [PATCH -tip V3 6/8] workqueue: reorganize workqueue_online_cpu() Lai Jiangshan
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan,
	Daniel Bristot de Oliveira

From: Lai Jiangshan <laijs@linux.alibaba.com>

It is possible that a per-node pool/worker's affinity is a single
CPU.  This can happen when the workqueue user changes the cpumask of
the workqueue, or when wq_unbound_cpumask is changed by the system
admin via /sys/devices/virtual/workqueue/cpumask.  pool->attrs->cpumask
is workqueue's cpumask & wq_unbound_cpumask & possible_cpumask_of_the_node,
which can be a single CPU and makes the pool's workers "per cpu
kthreads".

It can also happen when the cpu is the first, and has been the only,
online cpu in pool->attrs->cpumask.  In this case, the worker task's
cpumask is a single cpu regardless of pool->attrs->cpumask, since
commit d945b5e9f0e3 ("workqueue: Fix setting affinity of unbound worker
threads").

And the scheduler won't break affinity on such "per cpu kthread" workers
when the CPU is going down, so we have to do it on our own.

We do it by reusing the existing restore_unbound_workers_cpumask() and
renaming it to update_unbound_workers_cpumask().  When the number of
online CPUs of the pool goes from 1 to 0, we break the affinity initiatively.

Note that we break the affinity even for non-per-cpu-kthread workers,
because, first, this code path is a slow path which is not worth much
optimization, and second, we don't want to rely on the code/conditions
under which the scheduler forces breaking affinity for us.

The way to break affinity is to set the workers' affinity to
cpu_possible_mask, so that we preserve the same behavior as when
the scheduler breaks affinity for us.
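
As a rough illustration of how such a single-CPU pool cpumask can arise
(the helper below is made up for this sketch and is not part of the patch;
the real computation lives in wq_calc_node_cpumask() and
apply_wqattrs_prepare()):

	#include <linux/cpumask.h>
	#include <linux/topology.h>

	/*
	 * The effective mask is the intersection of the workqueue's own
	 * cpumask, wq_unbound_cpumask and the node's CPUs; any of the
	 * three can narrow it down to a single CPU.
	 */
	static void example_pool_cpumask(struct cpumask *out,
					 const struct cpumask *wq_mask,
					 const struct cpumask *unbound_mask,
					 int node)
	{
		cpumask_and(out, wq_mask, unbound_mask);
		cpumask_and(out, out, cpumask_of_node(node));
	}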

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 48 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0a95ae14d46f..79cc87df0cda 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5019,16 +5019,18 @@ static void rebind_workers(struct worker_pool *pool)
 }
 
 /**
- * restore_unbound_workers_cpumask - restore cpumask of unbound workers
+ * update_unbound_workers_cpumask - update cpumask of unbound workers
  * @pool: unbound pool of interest
- * @cpu: the CPU which is coming up
+ * @online: whether @cpu is coming up or going down
+ * @cpu: the CPU which is coming up or going down
  *
  * An unbound pool may end up with a cpumask which doesn't have any online
- * CPUs.  When a worker of such pool get scheduled, the scheduler resets
- * its cpus_allowed.  If @cpu is in @pool's cpumask which didn't have any
- * online CPU before, cpus_allowed of all its workers should be restored.
+ * CPUs.  We have to reset workers' cpus_allowed of such pool.  And we
+ * restore the workers' cpus_allowed when the pool's cpumask has online
+ * CPU.
  */
-static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
+static void update_unbound_workers_cpumask(struct worker_pool *pool,
+					   bool online, int cpu)
 {
 	static cpumask_t cpumask;
 	struct worker *worker;
@@ -5042,6 +5044,23 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 
 	cpumask_and(&cpumask, pool->attrs->cpumask, wq_online_cpumask);
 
+	if (!online) {
+		if (cpumask_weight(&cpumask) > 0)
+			return;
+		/*
+		 * All unbound workers can be possibly "per cpu kthread"
+		 * if this is the only online CPU in pool->attrs->cpumask
+		 * from the last time it has been brought up until now.
+		 * And the scheduler won't break affinity on the "per cpu
+		 * kthread" workers when the CPU is going down, so we have
+		 * to do it by our own.
+		 */
+		for_each_pool_worker(worker, pool)
+			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
+
+		return;
+	}
+
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
@@ -5075,7 +5094,7 @@ int workqueue_online_cpu(unsigned int cpu)
 		if (pool->cpu == cpu)
 			rebind_workers(pool);
 		else if (pool->cpu < 0)
-			restore_unbound_workers_cpumask(pool, cpu);
+			update_unbound_workers_cpumask(pool, true, cpu);
 
 		mutex_unlock(&wq_pool_attach_mutex);
 	}
@@ -5090,7 +5109,9 @@ int workqueue_online_cpu(unsigned int cpu)
 
 int workqueue_offline_cpu(unsigned int cpu)
 {
+	struct worker_pool *pool;
 	struct workqueue_struct *wq;
+	int pi;
 
 	/* unbinding per-cpu workers should happen on the local CPU */
 	if (WARN_ON(cpu != smp_processor_id()))
@@ -5098,9 +5119,20 @@ int workqueue_offline_cpu(unsigned int cpu)
 
 	unbind_workers(cpu);
 
-	/* update NUMA affinity of unbound workqueues */
 	mutex_lock(&wq_pool_mutex);
 	cpumask_clear_cpu(cpu, wq_online_cpumask);
+
+	/* update CPU affinity of workers of unbound pools */
+	for_each_pool(pool, pi) {
+		mutex_lock(&wq_pool_attach_mutex);
+
+		if (pool->cpu < 0)
+			update_unbound_workers_cpumask(pool, false, cpu);
+
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
+
+	/* update NUMA affinity of unbound workqueues */
 	list_for_each_entry(wq, &workqueues, list)
 		wq_update_unbound_numa(wq, cpu);
 	mutex_unlock(&wq_pool_mutex);
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 6/8] workqueue: reorganize workqueue_online_cpu()
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
                   ` (4 preceding siblings ...)
  2020-12-26  2:51 ` [PATCH -tip V3 5/8] workqueue: Manually break affinity on hotplug for unbound pool Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 7/8] workqueue: reorganize workqueue_offline_cpu() unbind_workers() Lai Jiangshan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Just move code around; no functionality is changed.

It prepares for a later patch protecting wq_online_cpumask
with wq_pool_attach_mutex.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 79cc87df0cda..94545e6feda5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5088,12 +5088,17 @@ int workqueue_online_cpu(unsigned int cpu)
 	mutex_lock(&wq_pool_mutex);
 	cpumask_set_cpu(cpu, wq_online_cpumask);
 
+	for_each_cpu_worker_pool(pool, cpu) {
+		mutex_lock(&wq_pool_attach_mutex);
+		rebind_workers(pool);
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
+
+	/* update CPU affinity of workers of unbound pools */
 	for_each_pool(pool, pi) {
 		mutex_lock(&wq_pool_attach_mutex);
 
-		if (pool->cpu == cpu)
-			rebind_workers(pool);
-		else if (pool->cpu < 0)
+		if (pool->cpu < 0)
 			update_unbound_workers_cpumask(pool, true, cpu);
 
 		mutex_unlock(&wq_pool_attach_mutex);
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 7/8] workqueue: reorganize workqueue_offline_cpu() unbind_workers()
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
                   ` (5 preceding siblings ...)
  2020-12-26  2:51 ` [PATCH -tip V3 6/8] workqueue: reorganize workqueue_online_cpu() Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
  2020-12-26  2:51 ` [PATCH -tip V3 8/8] workqueue: Fix affinity of kworkers when attaching into pool Lai Jiangshan
  2021-01-08 11:46 ` [PATCH -tip V3 0/8] workqueue: break affinity initiatively Peter Zijlstra
  8 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

Just move code around; no functionality is changed.
Only the wq_pool_attach_mutex protected region becomes a little larger.

It prepares for a later patch protecting wq_online_cpumask
with wq_pool_attach_mutex.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 90 +++++++++++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 94545e6feda5..dd32398edf55 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4896,61 +4896,57 @@ void wq_worker_comm(char *buf, size_t size, struct task_struct *task)
  * cpu comes back online.
  */
 
-static void unbind_workers(int cpu)
+static void unbind_workers(struct worker_pool *pool)
 {
-	struct worker_pool *pool;
 	struct worker *worker;
 
-	for_each_cpu_worker_pool(pool, cpu) {
-		mutex_lock(&wq_pool_attach_mutex);
-		raw_spin_lock_irq(&pool->lock);
+	lockdep_assert_held(&wq_pool_attach_mutex);
 
-		/*
-		 * We've blocked all attach/detach operations. Make all workers
-		 * unbound and set DISASSOCIATED.  Before this, all workers
-		 * except for the ones which are still executing works from
-		 * before the last CPU down must be on the cpu.  After
-		 * this, they may become diasporas.
-		 */
-		for_each_pool_worker(worker, pool)
-			worker->flags |= WORKER_UNBOUND;
+	raw_spin_lock_irq(&pool->lock);
 
-		pool->flags |= POOL_DISASSOCIATED;
+	/*
+	 * We've blocked all attach/detach operations. Make all workers
+	 * unbound and set DISASSOCIATED.  Before this, all workers
+	 * except for the ones which are still executing works from
+	 * before the last CPU down must be on the cpu.  After
+	 * this, they may become diasporas.
+	 */
+	for_each_pool_worker(worker, pool)
+		worker->flags |= WORKER_UNBOUND;
 
-		raw_spin_unlock_irq(&pool->lock);
+	pool->flags |= POOL_DISASSOCIATED;
 
-		for_each_pool_worker(worker, pool)
-			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
+	raw_spin_unlock_irq(&pool->lock);
 
-		mutex_unlock(&wq_pool_attach_mutex);
+	for_each_pool_worker(worker, pool)
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
 
-		/*
-		 * Call schedule() so that we cross rq->lock and thus can
-		 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
-		 * This is necessary as scheduler callbacks may be invoked
-		 * from other cpus.
-		 */
-		schedule();
+	/*
+	 * Call schedule() so that we cross rq->lock and thus can
+	 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
+	 * This is necessary as scheduler callbacks may be invoked
+	 * from other cpus.
+	 */
+	schedule();
 
-		/*
-		 * Sched callbacks are disabled now.  Zap nr_running.
-		 * After this, nr_running stays zero and need_more_worker()
-		 * and keep_working() are always true as long as the
-		 * worklist is not empty.  This pool now behaves as an
-		 * unbound (in terms of concurrency management) pool which
-		 * are served by workers tied to the pool.
-		 */
-		atomic_set(&pool->nr_running, 0);
+	/*
+	 * Sched callbacks are disabled now.  Zap nr_running.
+	 * After this, nr_running stays zero and need_more_worker()
+	 * and keep_working() are always true as long as the
+	 * worklist is not empty.  This pool now behaves as an
+	 * unbound (in terms of concurrency management) pool which
+	 * are served by workers tied to the pool.
+	 */
+	atomic_set(&pool->nr_running, 0);
 
-		/*
-		 * With concurrency management just turned off, a busy
-		 * worker blocking could lead to lengthy stalls.  Kick off
-		 * unbound chain execution of currently pending work items.
-		 */
-		raw_spin_lock_irq(&pool->lock);
-		wake_up_worker(pool);
-		raw_spin_unlock_irq(&pool->lock);
-	}
+	/*
+	 * With concurrency management just turned off, a busy
+	 * worker blocking could lead to lengthy stalls.  Kick off
+	 * unbound chain execution of currently pending work items.
+	 */
+	raw_spin_lock_irq(&pool->lock);
+	wake_up_worker(pool);
+	raw_spin_unlock_irq(&pool->lock);
 }
 
 /**
@@ -5122,7 +5118,11 @@ int workqueue_offline_cpu(unsigned int cpu)
 	if (WARN_ON(cpu != smp_processor_id()))
 		return -1;
 
-	unbind_workers(cpu);
+	for_each_cpu_worker_pool(pool, cpu) {
+		mutex_lock(&wq_pool_attach_mutex);
+		unbind_workers(pool);
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
 
 	mutex_lock(&wq_pool_mutex);
 	cpumask_clear_cpu(cpu, wq_online_cpumask);
-- 
2.19.1.6.gb485710b



* [PATCH -tip V3 8/8] workqueue: Fix affinity of kworkers when attaching into pool
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
                   ` (6 preceding siblings ...)
  2020-12-26  2:51 ` [PATCH -tip V3 7/8] workqueue: reorganize workqueue_offline_cpu() unbind_workers() Lai Jiangshan
@ 2020-12-26  2:51 ` Lai Jiangshan
       [not found]   ` <20201229100639.2086-1-hdanton@sina.com>
  2021-01-08 11:46 ` [PATCH -tip V3 0/8] workqueue: break affinity initiatively Peter Zijlstra
  8 siblings, 1 reply; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-26  2:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo, Lai Jiangshan

From: Lai Jiangshan <laijs@linux.alibaba.com>

When worker_attach_to_pool() is called, we should not set the workers'
cpumask to pool->attrs->cpumask when no CPU in it is online.

We have to use wq_online_cpumask in worker_attach_to_pool() to check
whether pool->attrs->cpumask is valid, rather than cpu_online_mask or
cpu_active_mask, due to the gaps between stages in cpu hot[un]plug.

So for the late-spawned per-CPU kworker case: the outgoing CPU should have
already been cleared from wq_online_cpumask, so the worker gets its affinity
reset to the possible mask and the subsequent wakeup will ensure it's put
on an active CPU.

To use wq_online_cpumask in worker_attach_to_pool(), we need to protect
wq_online_cpumask with wq_pool_attach_mutex, so we modify
workqueue_online_cpu() and workqueue_offline_cpu() to enlarge the
wq_pool_attach_mutex protected region.  We also put the update of
wq_online_cpumask and [re|un]bind_workers() in the same
wq_pool_attach_mutex protected region to make the update for per-cpu
workqueues atomic.

Cc: Qian Cai <cai@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Donnefort <vincent.donnefort@arm.com>
Link: https://lore.kernel.org/lkml/20201210163830.21514-3-valentin.schneider@arm.com/
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---
 kernel/workqueue.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dd32398edf55..25d50050257c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -310,7 +310,7 @@ static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 /* PL: allowable cpus for unbound wqs and work items */
 static cpumask_var_t wq_unbound_cpumask;
 
-/* PL: online cpus (cpu_online_mask with the going-down cpu cleared) */
+/* PL&A: online cpus (cpu_online_mask with the going-down cpu cleared) */
 static cpumask_var_t wq_online_cpumask;
 
 /* CPU where unbound work was last round robin scheduled from this CPU */
@@ -1848,11 +1848,11 @@ static void worker_attach_to_pool(struct worker *worker,
 {
 	mutex_lock(&wq_pool_attach_mutex);
 
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	/* Is there any cpu in pool->attrs->cpumask online? */
+	if (cpumask_intersects(pool->attrs->cpumask, wq_online_cpumask))
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
+	else
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
 
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -5082,13 +5082,12 @@ int workqueue_online_cpu(unsigned int cpu)
 	int pi;
 
 	mutex_lock(&wq_pool_mutex);
-	cpumask_set_cpu(cpu, wq_online_cpumask);
 
-	for_each_cpu_worker_pool(pool, cpu) {
-		mutex_lock(&wq_pool_attach_mutex);
+	mutex_lock(&wq_pool_attach_mutex);
+	cpumask_set_cpu(cpu, wq_online_cpumask);
+	for_each_cpu_worker_pool(pool, cpu)
 		rebind_workers(pool);
-		mutex_unlock(&wq_pool_attach_mutex);
-	}
+	mutex_unlock(&wq_pool_attach_mutex);
 
 	/* update CPU affinity of workers of unbound pools */
 	for_each_pool(pool, pi) {
@@ -5118,14 +5117,13 @@ int workqueue_offline_cpu(unsigned int cpu)
 	if (WARN_ON(cpu != smp_processor_id()))
 		return -1;
 
-	for_each_cpu_worker_pool(pool, cpu) {
-		mutex_lock(&wq_pool_attach_mutex);
-		unbind_workers(pool);
-		mutex_unlock(&wq_pool_attach_mutex);
-	}
-
 	mutex_lock(&wq_pool_mutex);
+
+	mutex_lock(&wq_pool_attach_mutex);
 	cpumask_clear_cpu(cpu, wq_online_cpumask);
+	for_each_cpu_worker_pool(pool, cpu)
+		unbind_workers(pool);
+	mutex_unlock(&wq_pool_attach_mutex);
 
 	/* update CPU affinity of workers of unbound pools */
 	for_each_pool(pool, pi) {
-- 
2.19.1.6.gb485710b



* Re: [PATCH -tip V3 5/8] workqueue: Manually break affinity on hotplug for unbound pool
       [not found]   ` <20201226101631.5448-1-hdanton@sina.com>
@ 2020-12-27 14:04     ` Lai Jiangshan
  0 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-27 14:04 UTC (permalink / raw)
  To: Hillf Danton
  Cc: LKML, Valentin Schneider, Peter Zijlstra, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Tejun Heo,
	Daniel Bristot de Oliveira

On Sat, Dec 26, 2020 at 6:16 PM Hillf Danton <hdanton@sina.com> wrote:
>
> Sat, 26 Dec 2020 10:51:13 +0800
> > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >
> > There is possible that a per-node pool/woker's affinity is a single
> > CPU.  It can happen when the workqueue user changes the cpumask of the
> > workqueue or when wq_unbound_cpumask is changed by system adim via
> > /sys/devices/virtual/workqueue/cpumask.  And pool->attrs->cpumask
> > is workqueue's cpumask & wq_unbound_cpumask & possible_cpumask_of_the_node,
> > which can be a single CPU and makes the pool's workers to be "per cpu
> > kthread".
> >
> > And it can also happen when the cpu is the first online and has been
> > the only online cpu in pool->attrs->cpumask.  In this case, the worker
> > task cpumask is single cpu no matter what pool->attrs->cpumask since
> > commit d945b5e9f0e3 ("workqueue: Fix setting affinity of unbound worker
> > threads").
> >
> > And the scheduler won't break affinity on the "per cpu kthread" workers
> > when the CPU is going down, so we have to do it by our own.
> >
> > We do it by reusing existing restore_unbound_workers_cpumask() and rename
> > it to update_unbound_workers_cpumask().  When the number of the online
> > CPU of the pool goes from 1 to 0, we break the affinity initiatively.
> >
> > Note here, we even break the affinity for non-per-cpu-kthread workers,
> > because first, the code path is slow path which is not worth too much to
> > optimize, second, we don't need to rely on the code/conditions when the
> > scheduler forces breaking affinity for us.
> >
> > The way to break affinity is to set the workers' affinity to
> > cpu_possible_mask, so that we preserve the same behavisor when
> > the scheduler breaks affinity for us.
> >
> > Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > Acked-by: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> > ---
> >  kernel/workqueue.c | 48 ++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 40 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 0a95ae14d46f..79cc87df0cda 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -5019,16 +5019,18 @@ static void rebind_workers(struct worker_pool *pool)
> >  }
> >
> >  /**
> > - * restore_unbound_workers_cpumask - restore cpumask of unbound workers
> > + * update_unbound_workers_cpumask - update cpumask of unbound workers
> >   * @pool: unbound pool of interest
> > - * @cpu: the CPU which is coming up
> > + * @online: whether @cpu is coming up or going down
> > + * @cpu: the CPU which is coming up or going down
> >   *
> >   * An unbound pool may end up with a cpumask which doesn't have any online
> > - * CPUs.  When a worker of such pool get scheduled, the scheduler resets
> > - * its cpus_allowed.  If @cpu is in @pool's cpumask which didn't have any
> > - * online CPU before, cpus_allowed of all its workers should be restored.
> > + * CPUs.  We have to reset workers' cpus_allowed of such pool.  And we
> > + * restore the workers' cpus_allowed when the pool's cpumask has online
> > + * CPU.
> >   */
> > -static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
> > +static void update_unbound_workers_cpumask(struct worker_pool *pool,
> > +                                        bool online, int cpu)
> >  {
> >       static cpumask_t cpumask;
> >       struct worker *worker;
> > @@ -5042,6 +5044,23 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
> >
> >       cpumask_and(&cpumask, pool->attrs->cpumask, wq_online_cpumask);
> >
> > +     if (!online) {
> > +             if (cpumask_weight(&cpumask) > 0)
> > +                     return;
>
> We can apply the weight check also to the online case.
>
> > +             /*
> > +              * All unbound workers can be possibly "per cpu kthread"
> > +              * if this is the only online CPU in pool->attrs->cpumask
> > +              * from the last time it has been brought up until now.
> > +              * And the scheduler won't break affinity on the "per cpu
> > +              * kthread" workers when the CPU is going down, so we have
> > +              * to do it by our own.
> > +              */
> > +             for_each_pool_worker(worker, pool)
> > +                     WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> > +
> > +             return;
> > +     }
> > +
> >       /* as we're called from CPU_ONLINE, the following shouldn't fail */
> >       for_each_pool_worker(worker, pool)
> >               WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
>
> What is the reason that pool->attrs->cpumask is not restored if it is
> not a typo, given that restore appears in the change to the above doc?

Reason:

d945b5e9f0e3 ("workqueue: Fix setting affinity of unbound worker
threads").

I don't like this change either, but I don't want to touch it
now.  I will improve it later by moving the handling for unbound
wq/pool/worker to a work item (out of cpu hotplug processing) so that
we can restore pool->attrs->cpumask to the workers.

This is also the reason I dropped patch 1 of the V2 patchset.

Did you see any problem with d945b5e9f0e3 other than that it does not
update the comment and is not very efficient?

>
> BTW is there a git tree available with this patchset tucked in?
>
> > @@ -5075,7 +5094,7 @@ int workqueue_online_cpu(unsigned int cpu)
> >               if (pool->cpu == cpu)
> >                       rebind_workers(pool);
> >               else if (pool->cpu < 0)
> > -                     restore_unbound_workers_cpumask(pool, cpu);
> > +                     update_unbound_workers_cpumask(pool, true, cpu);
> >
> >               mutex_unlock(&wq_pool_attach_mutex);
> >       }
> > @@ -5090,7 +5109,9 @@ int workqueue_online_cpu(unsigned int cpu)
> >
> >  int workqueue_offline_cpu(unsigned int cpu)
> >  {
> > +     struct worker_pool *pool;
> >       struct workqueue_struct *wq;
> > +     int pi;
> >
> >       /* unbinding per-cpu workers should happen on the local CPU */
> >       if (WARN_ON(cpu != smp_processor_id()))
> > @@ -5098,9 +5119,20 @@ int workqueue_offline_cpu(unsigned int cpu)
> >
> >       unbind_workers(cpu);
> >
> > -     /* update NUMA affinity of unbound workqueues */
> >       mutex_lock(&wq_pool_mutex);
> >       cpumask_clear_cpu(cpu, wq_online_cpumask);
> > +
> > +     /* update CPU affinity of workers of unbound pools */
> > +     for_each_pool(pool, pi) {
> > +             mutex_lock(&wq_pool_attach_mutex);
> > +
> > +             if (pool->cpu < 0)
> > +                     update_unbound_workers_cpumask(pool, false, cpu);
> > +
> > +             mutex_unlock(&wq_pool_attach_mutex);
> > +     }
> > +
> > +     /* update NUMA affinity of unbound workqueues */
> >       list_for_each_entry(wq, &workqueues, list)
> >               wq_update_unbound_numa(wq, cpu);
> >       mutex_unlock(&wq_pool_mutex);
> > --
> > 2.19.1.6.gb485710b


* Re: [PATCH -tip V3 8/8] workqueue: Fix affinity of kworkers when attaching into pool
       [not found]   ` <20201229100639.2086-1-hdanton@sina.com>
@ 2020-12-29 10:13     ` Lai Jiangshan
  0 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2020-12-29 10:13 UTC (permalink / raw)
  To: Hillf Danton
  Cc: LKML, Valentin Schneider, Peter Zijlstra, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Dec 29, 2020 at 6:06 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Sat, 26 Dec 2020 10:51:16 +0800
> > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >
> > When worker_attach_to_pool() is called, we should not put the workers
> > to pool->attrs->cpumask when there is not CPU online in it.
> >
> > We have to use wq_online_cpumask in worker_attach_to_pool() to check
> > if pool->attrs->cpumask is valid rather than cpu_online_mask or
> > cpu_active_mask due to gaps between stages in cpu hot[un]plug.
>
> In 5/8 pool->attrs->cpumask is not restored to avoid triggering
> the warning added in e9d867a67fd03ccc ("sched: Allow
> per-cpu kernel threads to run on online && !active"), is it likely
> needed to repeat that trick here?
> Is the above gap no longer existing here at the presence of
> wq_online_cpumask?

It still exists.  When a CPU is coming online, wq_online_cpumask is
always equal to cpu_online_mask, so nothing changes there.

An alternative way is to move the code into a work item, which adds
the proper protection against cpu hotplug and does the work.

I don't want to add too much complexity in this patchset.
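
For the record, a very rough sketch of that work-item alternative, purely
hypothetical and not part of this series (the work function and the
restore_cpumask_work field are invented for illustration; it would have to
live in kernel/workqueue.c to use the pool internals):

	static void pool_restore_cpumask_workfn(struct work_struct *work)
	{
		struct worker_pool *pool =
			container_of(work, struct worker_pool, restore_cpumask_work);
		struct worker *worker;

		get_online_cpus();	/* serialize against cpu hotplug */
		mutex_lock(&wq_pool_attach_mutex);
		if (cpumask_intersects(pool->attrs->cpumask, cpu_online_mask)) {
			for_each_pool_worker(worker, pool)
				set_cpus_allowed_ptr(worker->task,
						     pool->attrs->cpumask);
		}
		mutex_unlock(&wq_pool_attach_mutex);
		put_online_cpus();
	}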


* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2020-12-26  2:51 ` [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask Lai Jiangshan
@ 2021-01-04 13:56   ` Peter Zijlstra
  2021-01-05  2:41     ` Lai Jiangshan
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-04 13:56 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> wq_online_cpumask is the cached result of cpu_online_mask with the
> going-down cpu cleared.

You can't use cpu_active_mask ?


* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-04 13:56   ` Peter Zijlstra
@ 2021-01-05  2:41     ` Lai Jiangshan
  2021-01-05  2:53       ` Lai Jiangshan
  2021-01-05  8:23       ` Lai Jiangshan
  0 siblings, 2 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-05  2:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >
> > wq_online_cpumask is the cached result of cpu_online_mask with the
> > going-down cpu cleared.
>
> You can't use cpu_active_mask ?


When a cpu is going down:
(cpu_active_mask is not protected by the workqueue mutexes.)

create_worker() for unbound pool  |  cpu offlining
check cpu_active_mask             |
                                  |  remove bit from cpu_active_mask
                                  |  no cpu in pool->attrs->cpumask is active
set pool->attrs->cpumask to worker|
and hit the warning


And when a cpu is coming online, there may be some workers which were just
created after the workqueue hotplug callback finished but before
cpu_active_mask was updated.  Workqueue has no callback after
cpu_active_mask is updated, so these workers' cpumask is not updated.

For percpu workers, these problems can be handled with the help of
POOL_DISASSOCIATED, which is protected by the workqueue mutexes, and with
the help of sched/core.c, which doesn't warn for per-cpu kthreads.

For unbound workers, the way to handle it without using wq_online_cpumask
is much more complex when a cpu is going down.


* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-05  2:41     ` Lai Jiangshan
@ 2021-01-05  2:53       ` Lai Jiangshan
  2021-01-05  8:23       ` Lai Jiangshan
  1 sibling, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-05  2:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > >
> > > wq_online_cpumask is the cached result of cpu_online_mask with the
> > > going-down cpu cleared.
> >
> > You can't use cpu_active_mask ?
>
>
> When a cpu is going out:
> (cpu_active_mask is not protected by workqueue mutexs.)
>
> create_worker() for unbound pool  |  cpu offlining
> check cpu_active_mask             |
>                                   |  remove bit from cpu_active_mask
>                                   |  no cpu in pool->attrs->cpumask is active
> set pool->attrs->cpumask to worker|
> and hit the warning
>
>
> And when a cpu is onlining, there may be some workers which were just created
> after the workqueue hotplug callback is finished but before cpu_active_mask
> was updated. workqueue has not call back after cpu_active_mask updated and
> these workers' cpumask is not updated.
>
> For percpu workers, these problems can be handled with the help of
> POOL_DISASSOCIATED which is protected by workqueue mutexs and the
> help of sched/core.c which doesn't warn when per-cpu-kthread.
>
> For unbound workers, the way to handle it without using wq_online_cpumask
> is much more complex when a cpu is going out.

To have replied too soon, let me think about it again.


* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-05  2:41     ` Lai Jiangshan
  2021-01-05  2:53       ` Lai Jiangshan
@ 2021-01-05  8:23       ` Lai Jiangshan
  2021-01-05 13:17         ` Peter Zijlstra
  2021-01-05 16:24         ` Peter Zijlstra
  1 sibling, 2 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-05  8:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > >
> > > wq_online_cpumask is the cached result of cpu_online_mask with the
> > > going-down cpu cleared.
> >
> > You can't use cpu_active_mask ?
>
>
> When a cpu is going out:
> (cpu_active_mask is not protected by workqueue mutexs.)
>
> create_worker() for unbound pool  |  cpu offlining
> check cpu_active_mask             |
check wq_online_cpumask
>                                   |  remove bit from cpu_active_mask
>                                   |  no cpu in pool->attrs->cpumask is active
> set pool->attrs->cpumask to worker|
> and hit the warning
                                    |  remove bit from wq_online_cpumask

Even with the help of wq_online_cpumask, the patchset can't silence
the warning in __set_cpus_allowed_ptr() in this case.  It is indeed
hard to suppress the warning for unbound pools.  Maybe we need something
like this (an outermost CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE callback,
so that workqueue can do preparation on offlining before AP_ACTIVE):

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 0042ef362511..ac2103deb20b 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -20,6 +20,9 @@
  *               |                               ^
  *               v                               |
  *              AP_ACTIVE                      AP_ACTIVE
+ *               |                               ^
+ *               v                               |
+ *              ONLINE                         ONLINE
  */

 enum cpuhp_state {
@@ -194,6 +197,7 @@ enum cpuhp_state {
        CPUHP_AP_X86_HPET_ONLINE,
        CPUHP_AP_X86_KVM_CLK_ONLINE,
        CPUHP_AP_ACTIVE,
+       CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
        CPUHP_ONLINE,
 };


Another way is to modify __set_cpus_allowed_ptr() so that it doesn't
warn for kworkers, and trust the workqueue to handle the cpumask
correctly.

And a third way is to use get_online_cpus() in worker_attach_to_pool(),
which might delay work items from being processed during cpu hotplug and
might be dangerous when someone calls flush_work() from a cpu hotplug
callback.

Any thoughts?

Thanks,
Lai

>
>
> And when a cpu is onlining, there may be some workers which were just created
> after the workqueue hotplug callback is finished but before cpu_active_mask
> was updated. workqueue has not call back after cpu_active_mask updated and
> these workers' cpumask is not updated.
>
> For percpu workers, these problems can be handled with the help of
> POOL_DISASSOCIATED which is protected by workqueue mutexs and the
> help of sched/core.c which doesn't warn when per-cpu-kthread.
>
> For unbound workers, the way to handle it without using wq_online_cpumask
> is much more complex when a cpu is going out.


* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-05  8:23       ` Lai Jiangshan
@ 2021-01-05 13:17         ` Peter Zijlstra
  2021-01-05 14:37           ` Lai Jiangshan
  2021-01-05 16:24         ` Peter Zijlstra
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-05 13:17 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Jan 05, 2021 at 04:23:44PM +0800, Lai Jiangshan wrote:
> On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> > On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > > >
> > > > wq_online_cpumask is the cached result of cpu_online_mask with the
> > > > going-down cpu cleared.
> > >
> > > You can't use cpu_active_mask ?
> >
> > When a cpu is going out:
> > (cpu_active_mask is not protected by workqueue mutexs.)

But it is protected by the hotplug lock, which is really all you need
afaict.

If the worker thread gets spawned before workqueue_offline_cpu(), said
function will observe it and adjust the mask, if it gets spawned after
it, it must observe a 'reduced' cpu_active_mask.

> >
> > create_worker() for unbound pool  |  cpu offlining
> > check cpu_active_mask             |
> check wq_online_cpumask
> >                                   |  remove bit from cpu_active_mask
> >                                   |  no cpu in pool->attrs->cpumask is active
> > set pool->attrs->cpumask to worker|
> > and hit the warning
>                                     |  remove bit from wq_online_cpumask
> 
> Even with the help of wq_online_cpumask, the patchset can't silence
> the warning in __set_cpus_allowed_ptr() in this case.  It is indeed
> hard to suppress the warning for unbound pools.  Maybe we need something
> like this (outmost callback of CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> so that workqueue can do preparation when offlining before AP_ACTIVE):
> 
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 0042ef362511..ac2103deb20b 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -20,6 +20,9 @@
>   *               |                               ^
>   *               v                               |
>   *              AP_ACTIVE                      AP_ACTIVE
> + *               |                               ^
> + *               v                               |
> + *              ONLINE                         ONLINE
>   */
> 
>  enum cpuhp_state {
> @@ -194,6 +197,7 @@ enum cpuhp_state {
>         CPUHP_AP_X86_HPET_ONLINE,
>         CPUHP_AP_X86_KVM_CLK_ONLINE,
>         CPUHP_AP_ACTIVE,
> +       CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
>         CPUHP_ONLINE,
>  };
> 

That's waay too late; by then userspace is long running and expecting
things to 'just-work'.

But afaict, things will mostly work for you when you use cpu_active_mask
on cpu-down and cpu_online_mask on cpu-up.

But I think I see the problem, it is spawning a new worker after
workqueue_online_cpu() but before sched_cpu_activate(), right? That
wants to have the wider mask set.

To solve that, the spawning of workers thing needs to know where we are
in the hotplug process, and it can track that using
workqueue_{on,off}line_cpu(). If it happens after offline, it needs to
use cpu_active_mask, if it happens after online cpu_online_mask is your
guy.

Does that make sense?


* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-05 13:17         ` Peter Zijlstra
@ 2021-01-05 14:37           ` Lai Jiangshan
  2021-01-05 14:40             ` Lai Jiangshan
  0 siblings, 1 reply; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-05 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Jan 5, 2021 at 9:17 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 05, 2021 at 04:23:44PM +0800, Lai Jiangshan wrote:
> > On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> > > On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > > > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > > > >
> > > > > wq_online_cpumask is the cached result of cpu_online_mask with the
> > > > > going-down cpu cleared.
> > > >
> > > > You can't use cpu_active_mask ?
> > >
> > > When a cpu is going out:
> > > (cpu_active_mask is not protected by workqueue mutexes.)
>
> But it is protected by the hotplug lock, which is really all you need
> afaict.
>
> If the worker thread gets spawned before workqueue_offline_cpu(), said
> function will observe it and adjust the mask; if it gets spawned after
> it, it must observe a 'reduced' cpu_active_mask.

Making the workqueue set workers' cpumask correctly is easy.
The hard part is how to suppress the warning.

It is true that said function will observe it and adjust the mask,
but the warning is already issued.

>
> > >
> > > create_worker() for unbound pool  |  cpu offlining
> > > check cpu_active_mask             |
> > check wq_online_cpumask
> > >                                   |  remove bit from cpu_active_mask
> > >                                   |  no cpu in pool->attrs->cpumask is active
> > > set pool->attrs->cpumask to worker|
> > > and hit the warning
> >                                     |  remove bit from wq_online_cpumask
> >
> > Even with the help of wq_online_cpumask, the patchset can't silence
> > the warning in __set_cpus_allowed_ptr() in this case.  It is indeed
> > hard to suppress the warning for unbound pools.  Maybe we need something
> > like this (outmost callback of CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> > so that workqueue can do preparation when offlining before AP_ACTIVE):
> >
> > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> > index 0042ef362511..ac2103deb20b 100644
> > --- a/include/linux/cpuhotplug.h
> > +++ b/include/linux/cpuhotplug.h
> > @@ -20,6 +20,9 @@
> >   *               |                               ^
> >   *               v                               |
> >   *              AP_ACTIVE                      AP_ACTIVE
> > + *               |                               ^
> > + *               v                               |
> > + *              ONLINE                         ONLINE
> >   */
> >
> >  enum cpuhp_state {
> > @@ -194,6 +197,7 @@ enum cpuhp_state {
> >         CPUHP_AP_X86_HPET_ONLINE,
> >         CPUHP_AP_X86_KVM_CLK_ONLINE,
> >         CPUHP_AP_ACTIVE,
> > +       CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> >         CPUHP_ONLINE,
> >  };
> >
>
> That's way too late; by then userspace is long running and expecting
> things to 'just-work'.

I don't like this way either; I just listed three ways I can think of.
I prefer the way that __set_cpus_allowed_ptr() doesn't warn
for kworkers.

>
> But afaict, things will mostly work for you when you use cpu_active_mask
> on cpu-down and cpu_online_mask on cpu-up.
>
> But I think I see the problem: it is spawning a new worker after
> workqueue_online_cpu() but before sched_cpu_activate(), right? That
> wants to have the wider mask set.
>
> To solve that, the spawning of workers thing needs to know where we are
> in the hotplug process, and it can track that using
> workqueue_{on,off}line_cpu(). If it happens after offline, it needs to
> use cpu_active_mask, if it happens after online cpu_online_mask is your
> guy.
>
> Does that make sense?

There are six stages we need to know when spawning a worker:

stageA -> ap_deactivate -> stageB -> workqueue_offline -> stageC   (cpu going down)
stageD -> workqueue_online -> stageE -> ap_active -> stageF   (cpu coming up)

I don't think create_worker()/worker_attach_to_pool() can know where
it is in the hotplug process unless it uses get_online_cpus() so that
it knows it is not in the hotplug process.  There is no way to maintain
needed information since there are no workqueue callbacks in the proper
stages in the hotplug process.

Again, making the workqueue set workers' cpumask correctly is easy.
But we can't distinguish stageA&B or stageE&F to suppress the warning
in __set_cpus_allowed_ptr() for new unbound workers when pool->attrs->cpumask
has only one cpu online&!active since there is no way to keep
cpu_active_mask stable except get_online_cpus().
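
For reference, a rough sketch of the get_online_cpus() approach mentioned
above (a hypothetical helper, not part of the patchset): holding the hotplug
read lock across both the check and the affinity update is what keeps
cpu_active_mask stable in between, at the cost of serializing worker
attachment against hotplug.

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/sched.h>

static int sketch_bind_new_worker(struct task_struct *task,
                                  const struct cpumask *pool_mask)
{
        int ret;

        cpus_read_lock();               /* a.k.a. get_online_cpus() */
        if (cpumask_intersects(pool_mask, cpu_active_mask))
                ret = set_cpus_allowed_ptr(task, pool_mask);
        else
                ret = set_cpus_allowed_ptr(task, cpu_possible_mask);
        cpus_read_unlock();

        return ret;
}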

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-05 14:37           ` Lai Jiangshan
@ 2021-01-05 14:40             ` Lai Jiangshan
  0 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-05 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Jan 5, 2021 at 10:37 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> On Tue, Jan 5, 2021 at 9:17 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Jan 05, 2021 at 04:23:44PM +0800, Lai Jiangshan wrote:
> > > On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> > > > On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote:
> > > > > > From: Lai Jiangshan <laijs@linux.alibaba.com>
> > > > > >
> > > > > > wq_online_cpumask is the cached result of cpu_online_mask with the
> > > > > > going-down cpu cleared.
> > > > >
> > > > > You can't use cpu_active_mask ?
> > > >
> > > > When a cpu is going out:
> > > > (cpu_active_mask is not protected by workqueue mutexes.)
> >
> > But it is protected by the hotplug lock, which is really all you need
> > afaict.
> >
> > If the worker thread gets spawned before workqueue_offline_cpu(), said
> > function will observe it and adjust the mask; if it gets spawned after
> > it, it must observe a 'reduced' cpu_active_mask.
>
> Making the workqueue set workers' cpumask correctly is easy.
> The hard part is how to suppress the warning.
>
> It is true that said function will observe it and adjust the mask,
> but the warning is already issued.
>
> >
> > > >
> > > > create_worker() for unbound pool  |  cpu offlining
> > > > check cpu_active_mask             |
> > > check wq_online_cpumask
> > > >                                   |  remove bit from cpu_active_mask
> > > >                                   |  no cpu in pool->attrs->cpumask is active
> > > > set pool->attrs->cpumask to worker|
> > > > and hit the warning
> > >                                     |  remove bit from wq_online_cpumask
> > >
> > > Even with the help of wq_online_cpumask, the patchset can't silence
> > > the warning in __set_cpus_allowed_ptr() in this case.  It is indeed
> > > hard to suppress the warning for unbound pools.  Maybe we need something
> > > like this (outmost callback of CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> > > so that workqueue can do preparation when offlining before AP_ACTIVE):
> > >
> > > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> > > index 0042ef362511..ac2103deb20b 100644
> > > --- a/include/linux/cpuhotplug.h
> > > +++ b/include/linux/cpuhotplug.h
> > > @@ -20,6 +20,9 @@
> > >   *               |                               ^
> > >   *               v                               |
> > >   *              AP_ACTIVE                      AP_ACTIVE
> > > + *               |                               ^
> > > + *               v                               |
> > > + *              ONLINE                         ONLINE
> > >   */
> > >
> > >  enum cpuhp_state {
> > > @@ -194,6 +197,7 @@ enum cpuhp_state {
> > >         CPUHP_AP_X86_HPET_ONLINE,
> > >         CPUHP_AP_X86_KVM_CLK_ONLINE,
> > >         CPUHP_AP_ACTIVE,
> > > +       CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> > >         CPUHP_ONLINE,
> > >  };
> > >
> >
> > That's way too late; by then userspace is long running and expecting
> > things to 'just-work'.
>
> I don't like this way either; I just listed three ways I can think of.
> I prefer the way that __set_cpus_allowed_ptr() doesn't warn
> for kworkers.
>
> >
> > But afaict, things will mostly work for you when you use cpu_active_mask
> > on cpu-down and cpu_online_mask on cpu-up.
> >
> > But I think I see the problem: it is spawning a new worker after
> > workqueue_online_cpu() but before sched_cpu_activate(), right? That
> > wants to have the wider mask set.
> >
> > To solve that, the spawning of workers thing needs to know where we are
> > in the hotplug process, and it can track that using
> > workqueue_{on,off}line_cpu(). If it happens after offline, it needs to
> > use cpu_active_mask, if it happens after online cpu_online_mask is your
> > guy.
> >
> > Does that make sense?
>
> There are six stages we need to know when spawning a worker:
>
> stageA -> ap_deactivate -> stageB -> workqueue_offline -> stageC   (cpu going down)
> stageD -> workqueue_online -> stageE -> ap_active -> stageF   (cpu coming up)
>
> I don't think create_worker()/worker_attach_to_pool() can know where
> it is in the hotplug process unless it uses get_online_cpus() so that
> it knows it is not in the hotplug process.  There is no way to maintain
> needed information since there are no workqueue callbacks in the proper
> stages in the hotplug process.
>
> Again, making the workqueue set workers' cpumask correctly is easy.
> But we can't distinguish stageA&B or stageE&F to suppress the warning
> in __set_cpus_allowed_ptr() for new unbound workers when pool->attrs->cpumask
> has only one cpu online&!active since there is no way to keep
> cpu_active_mask stable except get_online_cpus().

That should read: when pool->attrs->cpumask has multiple CPUs but only one of them is online&!active.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask
  2021-01-05  8:23       ` Lai Jiangshan
  2021-01-05 13:17         ` Peter Zijlstra
@ 2021-01-05 16:24         ` Peter Zijlstra
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-05 16:24 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: LKML, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Tejun Heo

On Tue, Jan 05, 2021 at 04:23:44PM +0800, Lai Jiangshan wrote:

> Even with the help of wq_online_cpumask, the patchset can't silence
> the warning in __set_cpus_allowed_ptr() in this case.  It is indeed
> hard to suppress the warning for unbound pools.  Maybe we need something
                                   ^^^^^^^

Argh.. I forgot about the distinction between bound and unbound again.

> like this (outmost callback of CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
> so that workqueue can do preparation when offlining before AP_ACTIVE):
> 
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 0042ef362511..ac2103deb20b 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -20,6 +20,9 @@
>   *               |                               ^
>   *               v                               |
>   *              AP_ACTIVE                      AP_ACTIVE
> + *               |                               ^
> + *               v                               |
> + *              ONLINE                         ONLINE
>   */
> 
>  enum cpuhp_state {
> @@ -194,6 +197,7 @@ enum cpuhp_state {
>         CPUHP_AP_X86_HPET_ONLINE,
>         CPUHP_AP_X86_KVM_CLK_ONLINE,
>         CPUHP_AP_ACTIVE,
> +       CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE,
>         CPUHP_ONLINE,
>  };

Yes, doing the unbound things late should not be a problem.

> The other way is to modify __set_cpus_allowed_ptr() to suppress the
> warning for kworkers and believe/let the workqueue handle cpumask correctly.

The thing is that we don't want 'random' work injected into the CPU during
the hotplug process since we're not exactly sure what state the thing is
in. Once we're active, we know all is well and we can run regular crud.

The thing with bound workers is that they'll only run work originating
on the CPU itself, and should thus be fine.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
                   ` (7 preceding siblings ...)
  2020-12-26  2:51 ` [PATCH -tip V3 8/8] workqueue: Fix affinity of kworkers when attaching into pool Lai Jiangshan
@ 2021-01-08 11:46 ` Peter Zijlstra
  2021-01-11 10:07   ` Thomas Gleixner
  8 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-08 11:46 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Paul McKenney, Thomas Gleixner,
	Vincent Guittot, Steven Rostedt

On Sat, Dec 26, 2020 at 10:51:08AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that scheduler will not force break affinity for us.

So I've been looking at this the past day or so, and the more I look,
the more I think commit:

  1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")

is a real problem and we need to revert it (at least for now).

Let me attempt a brain dump:

 - the assumption that per-cpu kernel threads are 'well behaved' on
   hot-plug has, I think, been proven incorrect, it's far worse than
   just bound workqueues. Therefore, it makes sense to provide the old
   semantics.

 - making the current code provide the old semantics (forcing affinity
   on per-cpu kernel threads) is tricky, but could probably be done:

    * we need to disallow new per-cpu kthreads while going down
    * we need to force push more aggressively; basically when
      rcuwait_active(rq->hotplug_wait) push everything except that task,
      irrespective of is_per_cpu_kthread()
    * we need to disallow wakeups of anything not the hotplug thread or
      stop-machine from happening from the rcuwait_wait_event()

   and I have patches for most of that... except they're adding more
   complexity than 1cf12e08bc4d ever deleted.

However, even with all that, there's a further problem...

Fundamentally, waiting for !rq_has_pinned_tasks() so late in
hot-un-plug is wrong, I think. It means that migrate_disable() code
might encounter a mostly torn down CPU. This is OK-ish for per-cpu
kernel threads [*], but is now exposed to any random odd kernel code
that does migrate_disable().

[*] arguably running 'work' this late is similarly problematic.

Let me go do lunch and ponder this further..

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-08 11:46 ` [PATCH -tip V3 0/8] workqueue: break affinity initiatively Peter Zijlstra
@ 2021-01-11 10:07   ` Thomas Gleixner
  2021-01-11 11:01     ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2021-01-11 10:07 UTC (permalink / raw)
  To: Peter Zijlstra, Lai Jiangshan
  Cc: linux-kernel, Valentin Schneider, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Paul McKenney, Vincent Guittot,
	Steven Rostedt

On Fri, Jan 08 2021 at 12:46, Peter Zijlstra wrote:
> On Sat, Dec 26, 2020 at 10:51:08AM +0800, Lai Jiangshan wrote:
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>> 
>> 06249738a41a ("workqueue: Manually break affinity on hotplug")
>> said that scheduler will not force break affinity for us.
>
> So I've been looking at this the past day or so, and the more I look,
> the more I think commit:
>
>   1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
>
> is a real problem and we need to revert it (at least for now).
>
> Let me attempt a brain dump:
>
>  - the assumption that per-cpu kernel threads are 'well behaved' on
>    hot-plug has, I think, been proven incorrect, it's far worse than
>    just bound workqueues. Therefore, it makes sense to provide the old
>    semantics.

I disagree. Per-cpu kernel threads which are magically stopped during
hotplug and then migrated to a random other CPU are just wrong.

We really need to fix that and not proliferate the sloppy and ill
defined behaviour.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 10:07   ` Thomas Gleixner
@ 2021-01-11 11:01     ` Peter Zijlstra
  2021-01-11 15:00       ` Paul E. McKenney
  2021-01-11 17:16       ` Peter Zijlstra
  0 siblings, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-11 11:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Lai Jiangshan, linux-kernel, Valentin Schneider, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt

On Mon, Jan 11, 2021 at 11:07:34AM +0100, Thomas Gleixner wrote:
> On Fri, Jan 08 2021 at 12:46, Peter Zijlstra wrote:
> > On Sat, Dec 26, 2020 at 10:51:08AM +0800, Lai Jiangshan wrote:
> >> From: Lai Jiangshan <laijs@linux.alibaba.com>
> >> 
> >> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> >> said that scheduler will not force break affinity for us.
> >
> > So I've been looking at this the past day or so, and the more I look,
> > the more I think commit:
> >
> >   1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
> >
> > is a real problem and we need to revert it (at least for now).
> >
> > Let me attempt a brain dump:
> >
> >  - the assumption that per-cpu kernel threads are 'well behaved' on
> >    hot-plug has, I think, been proven incorrect, it's far worse than
> >    just bound workqueues. Therefore, it makes sense to provide the old
> >    semantics.
> 
> I disagree. Per-cpu kernel threads which are magically stopped during
> hotplug and then migrated to a random other CPU are just wrong.
> 
> We really need to fix that and not proliferate the sloppy and ill
> defined behaviour.

Well yes, but afaict the workqueue stuff hasn't been settled yet, and
the rcutorture patch Paul did was just plain racy and who knows what
other daft kthread users are out there. That and we're at -rc3.

So I'm really tempted to revert for now and try again later.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 11:01     ` Peter Zijlstra
@ 2021-01-11 15:00       ` Paul E. McKenney
  2021-01-11 17:16       ` Peter Zijlstra
  1 sibling, 0 replies; 43+ messages in thread
From: Paul E. McKenney @ 2021-01-11 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Mon, Jan 11, 2021 at 12:01:03PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 11:07:34AM +0100, Thomas Gleixner wrote:
> > On Fri, Jan 08 2021 at 12:46, Peter Zijlstra wrote:
> > > On Sat, Dec 26, 2020 at 10:51:08AM +0800, Lai Jiangshan wrote:
> > >> From: Lai Jiangshan <laijs@linux.alibaba.com>
> > >> 
> > >> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > >> said that scheduler will not force break affinity for us.
> > >
> > > So I've been looking at this the past day or so, and the more I look,
> > > the more I think commit:
> > >
> > >   1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
> > >
> > > is a real problem and we need to revert it (at least for now).
> > >
> > > Let me attempt a brain dump:
> > >
> > >  - the assumption that per-cpu kernel threads are 'well behaved' on
> > >    hot-plug has, I think, been proven incorrect, it's far worse than
> > >    just bound workqueues. Therefore, it makes sense to provide the old
> > >    semantics.
> > 
> > I disagree. Per-cpu kernel threads which are magically stopped during
> > hotplug and then migrated to a random other CPU are just wrong.
> > 
> > We really need to fix that and not proliferate the sloppy and ill
> > defined behaviour.
> 
> Well yes, but afaict the workqueue stuff hasn't been settled yet, and
> the rcutorture patch Paul did was just plain racy and who knows what
> other daft kthread users are out there. That and we're at -rc3.

Here is what I currently have, which passed 40 hours of each flavor of
rcutorture last night (with Lai Jiangshan's patches).

Except that I cannot explain why the "onoff_task && num_online_cpus"
is needed, just that it doesn't pass otherwise.  So either I am blind
or something else is also messed up.  (Why did I think to add it?
Because experimenting with suppressing individual rcutorture options
pointed in that direction.)

Also, adding that condition to that "if" statement is removing CPU
shuffling entirely from two-CPU runs, which is at best suboptimal.

> So I'm really tempted to revert for now and try again later.

I am not feeling at all good about this being in mainline quite yet.  I do
agree that the semantic might be in some sense cleaner, but let's face it,
rcutorture is a very focused test.  There are large parts of the kernel
that it does not stress at all.  There could be any number of other bugs
involving other parts of the kernel.  It would be good to have a rationale
for why this is safe beforehand rather than random experiment-driven
changes like that added "if" condition in the patch below.

							Thanx, Paul

------------------------------------------------------------------------
commit dd0d40aecf506ed6982d7c98f48b4327d7d59485
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed Dec 23 11:23:48 2020 -0800

    torture: Break affinity of kthreads last running on outgoing CPU
    
    The advent of commit 06249738a41a ("workqueue: Manually break affinity
    on hotplug") means that the scheduler no longer silently breaks affinity
    for kthreads pinned to the outgoing CPU.  This can happen for many of
    rcutorture's kthreads due to shuffling, which periodically affinities
    these ktheads away from a randomly chosen CPU.  This usually works fine
    because these kthreads are allowed to run on any other CPU and because
    shuffling is a no-op any time there is but one online CPU.
    
    However, consider the following sequence of events:
    
    1.      CPUs 0 and 1 are initially online.
    
    2.      The torture_shuffle_tasks() function affinities all the tasks
            away from CPU 0.
    
    3.      CPU 1 goes offline.
    
    4.      All the tasks are now affinitied to an offline CPU, triggering
            the warning added by the commit noted above.
    
    This can trigger the following in sched_cpu_dying() in kernel/sched/core.c:
    
            BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))
    
    This commit therefore adds a new torture_shuffle_tasks_offline() function
    that is invoked from torture_offline() prior to offlining a CPU.  This new
    function scans the list of shuffled kthreads and for any thread that
    last ran (or is set to run) on the outgoing CPU, sets its affinity to
    all online CPUs.  Thus there will never be a kthread that is affinitied
    only to the outgoing CPU.
    
    Of course, if the sysadm manually applies affinity to any of these
    kthreads, all bets are off.  However, such a sysadm must be fast because
    the torture_shuffle_tasks_offline() function is invoked immediately before
    offlining the outgoing CPU.  Therefore, let it be known that with great
    speed and great power comes great responsibility.
    
    Fixes: 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
    [ paulmck: Applied synchronization feedback from Peter Zijlstra. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/torture.c b/kernel/torture.c
index 01e336f..9208a03 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -65,6 +65,7 @@ static int fullstop = FULLSTOP_RMMOD;
 static DEFINE_MUTEX(fullstop_mutex);
 
 static atomic_t verbose_sleep_counter;
+static DEFINE_MUTEX(shuffle_task_mutex);
 
 /*
  * Sleep if needed from VERBOSE_TOROUT*().
@@ -153,14 +154,17 @@ int torture_hrtimeout_s(u32 baset_s, u32 fuzzt_ms, struct torture_random_state *
 }
 EXPORT_SYMBOL_GPL(torture_hrtimeout_s);
 
+static struct task_struct *onoff_task;
+
 #ifdef CONFIG_HOTPLUG_CPU
 
+static void torture_shuffle_tasks_offline(int cpu);
+
 /*
  * Variables for online-offline handling.  Only present if CPU hotplug
  * is enabled, otherwise does nothing.
  */
 
-static struct task_struct *onoff_task;
 static long onoff_holdoff;
 static long onoff_interval;
 static torture_ofl_func *onoff_f;
@@ -212,6 +216,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
 			 torture_type, cpu);
 	starttime = jiffies;
 	(*n_offl_attempts)++;
+	mutex_lock(&shuffle_task_mutex);
+	torture_shuffle_tasks_offline(cpu);
 	ret = remove_cpu(cpu);
 	if (ret) {
 		s = "";
@@ -245,6 +251,7 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
 		WRITE_ONCE(torture_online_cpus, torture_online_cpus - 1);
 		WARN_ON_ONCE(torture_online_cpus <= 0);
 	}
+	mutex_unlock(&shuffle_task_mutex);
 
 	return true;
 }
@@ -474,7 +481,6 @@ static struct task_struct *shuffler_task;
 static cpumask_var_t shuffle_tmp_mask;
 static int shuffle_idle_cpu;	/* Force all torture tasks off this CPU */
 static struct list_head shuffle_task_list = LIST_HEAD_INIT(shuffle_task_list);
-static DEFINE_MUTEX(shuffle_task_mutex);
 
 /*
  * Register a task to be shuffled.  If there is no memory, just splat
@@ -512,6 +518,19 @@ static void torture_shuffle_task_unregister_all(void)
 	mutex_unlock(&shuffle_task_mutex);
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+// Unbind all tasks from a CPU that is to be taken offline.
+static void torture_shuffle_tasks_offline(int cpu)
+{
+	struct shuffle_task *stp;
+
+	lockdep_assert_held(&shuffle_task_mutex);
+	list_for_each_entry(stp, &shuffle_task_list, st_l)
+		if (task_cpu(stp->st_t) == cpu)
+			set_cpus_allowed_ptr(stp->st_t, cpu_present_mask);
+}
+#endif // #ifdef CONFIG_HOTPLUG_CPU
+
 /* Shuffle tasks such that we allow shuffle_idle_cpu to become idle.
  * A special case is when shuffle_idle_cpu = -1, in which case we allow
  * the tasks to run on all CPUs.
@@ -521,11 +540,13 @@ static void torture_shuffle_tasks(void)
 	struct shuffle_task *stp;
 
 	cpumask_setall(shuffle_tmp_mask);
-	get_online_cpus();
+	mutex_lock(&shuffle_task_mutex);
+	cpus_read_lock();
 
 	/* No point in shuffling if there is only one online CPU (ex: UP) */
-	if (num_online_cpus() == 1) {
-		put_online_cpus();
+	if (num_online_cpus() == 1 || (onoff_task && num_online_cpus() <= 2)) {
+		cpus_read_unlock();
+		mutex_unlock(&shuffle_task_mutex);
 		return;
 	}
 
@@ -536,12 +557,11 @@ static void torture_shuffle_tasks(void)
 	else
 		cpumask_clear_cpu(shuffle_idle_cpu, shuffle_tmp_mask);
 
-	mutex_lock(&shuffle_task_mutex);
 	list_for_each_entry(stp, &shuffle_task_list, st_l)
 		set_cpus_allowed_ptr(stp->st_t, shuffle_tmp_mask);
-	mutex_unlock(&shuffle_task_mutex);
 
-	put_online_cpus();
+	cpus_read_unlock();
+	mutex_unlock(&shuffle_task_mutex);
 }
 
 /* Shuffle tasks across CPUs, with the intent of allowing each CPU in the

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 11:01     ` Peter Zijlstra
  2021-01-11 15:00       ` Paul E. McKenney
@ 2021-01-11 17:16       ` Peter Zijlstra
  2021-01-11 18:09         ` Paul E. McKenney
  2021-01-11 19:21         ` Valentin Schneider
  1 sibling, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-11 17:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Lai Jiangshan, linux-kernel, Valentin Schneider, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt


While thinking more about this, I'm thinking a big part of the problem
is that we're not distinguishing between genuine per-cpu kthreads and
kthreads that just happen to be per-cpu.

Genuine per-cpu kthreads are kthread_bind()'ed and have PF_NO_SETAFFINITY,
but sadly a lot of non-per-cpu kthreads that might happen to still be
per-cpu also have that -- again, workqueue does that even to its unbound
workers :-(

Now, anything created by smpboot, is created through
kthread_create_on_cpu() and that additionally sets to_kthread(p)->flags
KTHREAD_IS_PER_CPU.

And I'm thinking that might be sufficient, if we modify
is_per_cpu_kthread() to check that, then we only match smpboot threads
(which include the hotplug and stopper threads, but notably not the idle
thread)

Sadly it appears like io_uring() uses kthread_create_on_cpu() without
then having any hotplug crud on, so that needs additional frobbing.

Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
I suppose bound workqueues don't go through this either.

Let me rummage around a bit...

This seems to not insta-explode... opinions?

---
 include/linux/kthread.h |  3 +++
 kernel/kthread.c        | 25 ++++++++++++++++++++++++-
 kernel/sched/core.c     |  2 +-
 kernel/sched/sched.h    |  4 ++--
 kernel/smpboot.c        |  1 +
 kernel/workqueue.c      | 12 +++++++++---
 6 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 65b81e0c494d..fdd5a52e35d8 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -33,6 +33,9 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 					  unsigned int cpu,
 					  const char *namefmt);
 
+void kthread_set_per_cpu(struct task_struct *k, bool set);
+bool kthread_is_per_cpu(struct task_struct *k);
+
 /**
  * kthread_run - create and wake a thread.
  * @threadfn: the function to run until signal_pending(current).
diff --git a/kernel/kthread.c b/kernel/kthread.c
index a5eceecd4513..7f081530e459 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -493,11 +493,34 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 		return p;
 	kthread_bind(p, cpu);
 	/* CPU hotplug need to bind once again when unparking the thread. */
-	set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
 	to_kthread(p)->cpu = cpu;
 	return p;
 }
 
+void kthread_set_per_cpu(struct task_struct *k, bool set)
+{
+	struct kthread *kthread = to_kthread(k);
+	if (!kthread)
+		return;
+
+	if (set) {
+		WARN_ON_ONCE(!(k->flags & PF_NO_SETAFFINITY));
+		WARN_ON_ONCE(k->nr_cpus_allowed != 1);
+		set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
+	} else {
+		clear_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
+	}
+}
+
+bool kthread_is_per_cpu(struct task_struct *k)
+{
+	struct kthread *kthread = to_kthread(k);
+	if (!kthread)
+		return false;
+
+	return test_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
+}
+
 /**
  * kthread_unpark - unpark a thread created by kthread_create().
  * @k:		thread created by kthread_create().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15d2562118d1..e71f9e44789e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7277,7 +7277,7 @@ static void balance_push(struct rq *rq)
 	 * Both the cpu-hotplug and stop task are in this case and are
 	 * required to complete the hotplug process.
 	 */
-	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
+	if (rq->idle == push_task || is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
 		/*
 		 * If this is the idle task on the outgoing CPU try to wake
 		 * up the hotplug control thread which might wait for the
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 12ada79d40f3..3679f63e0aa2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2697,10 +2697,10 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 	if (!(p->flags & PF_KTHREAD))
 		return false;
 
-	if (p->nr_cpus_allowed != 1)
+	if (!(p->flags & PF_NO_SETAFFINITY))
 		return false;
 
-	return true;
+	return kthread_is_per_cpu(p);
 }
 #endif
 
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 2efe1e206167..b0abe575a524 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -188,6 +188,7 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
 		kfree(td);
 		return PTR_ERR(tsk);
 	}
+	kthread_set_per_cpu(tsk, true);
 	/*
 	 * Park the thread so that it could start right on the CPU
 	 * when it is available.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9880b6c0e272..824276e4fb2e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1861,6 +1861,8 @@ static void worker_attach_to_pool(struct worker *worker,
 	 */
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;
+	else
+		kthread_set_per_cpu(worker->task, true);
 
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;
@@ -4919,8 +4921,10 @@ static void unbind_workers(int cpu)
 
 		raw_spin_unlock_irq(&pool->lock);
 
-		for_each_pool_worker(worker, pool)
-			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_active_mask) < 0);
+		for_each_pool_worker(worker, pool) {
+			kthread_set_per_cpu(worker->task, false);
+			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
+		}
 
 		mutex_unlock(&wq_pool_attach_mutex);
 
@@ -4972,9 +4976,11 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, pool)
+	for_each_pool_worker(worker, pool) {
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
+		kthread_set_per_cpu(worker->task, true);
+	}
 
 	raw_spin_lock_irq(&pool->lock);
 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 17:16       ` Peter Zijlstra
@ 2021-01-11 18:09         ` Paul E. McKenney
  2021-01-11 21:50           ` Paul E. McKenney
  2021-01-11 19:21         ` Valentin Schneider
  1 sibling, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2021-01-11 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Mon, Jan 11, 2021 at 06:16:39PM +0100, Peter Zijlstra wrote:
> 
> While thinking more about this, I'm thinking a big part of the problem
> is that we're not distinguishing between genuine per-cpu kthreads and
> kthreads that just happen to be per-cpu.
> 
> Genuine per-cpu kthreads are kthread_bind()'ed and have PF_NO_SETAFFINITY,
> but sadly a lot of non-per-cpu kthreads that might happen to still be
> per-cpu also have that -- again, workqueue does that even to its unbound
> workers :-(
> 
> Now, anything created by smpboot, is created through
> kthread_create_on_cpu() and that additionally sets to_kthread(p)->flags
> KTHREAD_IS_PER_CPU.
> 
> And I'm thinking that might be sufficient, if we modify
> is_per_cpu_kthread() to check that, then we only match smpboot threads
> (which include the hotplug and stopper threads, but notably not the idle
> thread)
> 
> Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> then having any hotplug crud on, so that needs additional frobbing.
> 
> Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
> I suppose bound workqueues don't go through this either.
> 
> Let me rummage around a bit...
> 
> This seems to not insta-explode... opinions?

It passes quick tests on -rcu both with and without the rcutorture fixes,
which is encouraging.  I will start a more vigorous test in about an hour.

							Thanx, Paul

> ---
>  include/linux/kthread.h |  3 +++
>  kernel/kthread.c        | 25 ++++++++++++++++++++++++-
>  kernel/sched/core.c     |  2 +-
>  kernel/sched/sched.h    |  4 ++--
>  kernel/smpboot.c        |  1 +
>  kernel/workqueue.c      | 12 +++++++++---
>  6 files changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 65b81e0c494d..fdd5a52e35d8 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -33,6 +33,9 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
>  					  unsigned int cpu,
>  					  const char *namefmt);
>  
> +void kthread_set_per_cpu(struct task_struct *k, bool set);
> +bool kthread_is_per_cpu(struct task_struct *k);
> +
>  /**
>   * kthread_run - create and wake a thread.
>   * @threadfn: the function to run until signal_pending(current).
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index a5eceecd4513..7f081530e459 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -493,11 +493,34 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
>  		return p;
>  	kthread_bind(p, cpu);
>  	/* CPU hotplug need to bind once again when unparking the thread. */
> -	set_bit(KTHREAD_IS_PER_CPU, &to_kthread(p)->flags);
>  	to_kthread(p)->cpu = cpu;
>  	return p;
>  }
>  
> +void kthread_set_per_cpu(struct task_struct *k, bool set)
> +{
> +	struct kthread *kthread = to_kthread(k);
> +	if (!kthread)
> +		return;
> +
> +	if (set) {
> +		WARN_ON_ONCE(!(k->flags & PF_NO_SETAFFINITY));
> +		WARN_ON_ONCE(k->nr_cpus_allowed != 1);
> +		set_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
> +	} else {
> +		clear_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
> +	}
> +}
> +
> +bool kthread_is_per_cpu(struct task_struct *k)
> +{
> +	struct kthread *kthread = to_kthread(k);
> +	if (!kthread)
> +		return false;
> +
> +	return test_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
> +}
> +
>  /**
>   * kthread_unpark - unpark a thread created by kthread_create().
>   * @k:		thread created by kthread_create().
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 15d2562118d1..e71f9e44789e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7277,7 +7277,7 @@ static void balance_push(struct rq *rq)
>  	 * Both the cpu-hotplug and stop task are in this case and are
>  	 * required to complete the hotplug process.
>  	 */
> -	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
> +	if (rq->idle == push_task || is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
>  		/*
>  		 * If this is the idle task on the outgoing CPU try to wake
>  		 * up the hotplug control thread which might wait for the
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 12ada79d40f3..3679f63e0aa2 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2697,10 +2697,10 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
>  	if (!(p->flags & PF_KTHREAD))
>  		return false;
>  
> -	if (p->nr_cpus_allowed != 1)
> +	if (!(p->flags & PF_NO_SETAFFINITY))
>  		return false;
>  
> -	return true;
> +	return kthread_is_per_cpu(p);
>  }
>  #endif
>  
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 2efe1e206167..b0abe575a524 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -188,6 +188,7 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
>  		kfree(td);
>  		return PTR_ERR(tsk);
>  	}
> +	kthread_set_per_cpu(tsk, true);
>  	/*
>  	 * Park the thread so that it could start right on the CPU
>  	 * when it is available.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9880b6c0e272..824276e4fb2e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1861,6 +1861,8 @@ static void worker_attach_to_pool(struct worker *worker,
>  	 */
>  	if (pool->flags & POOL_DISASSOCIATED)
>  		worker->flags |= WORKER_UNBOUND;
> +	else
> +		kthread_set_per_cpu(worker->task, true);
>  
>  	list_add_tail(&worker->node, &pool->workers);
>  	worker->pool = pool;
> @@ -4919,8 +4921,10 @@ static void unbind_workers(int cpu)
>  
>  		raw_spin_unlock_irq(&pool->lock);
>  
> -		for_each_pool_worker(worker, pool)
> -			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_active_mask) < 0);
> +		for_each_pool_worker(worker, pool) {
> +			kthread_set_per_cpu(worker->task, false);
> +			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> +		}
>  
>  		mutex_unlock(&wq_pool_attach_mutex);
>  
> @@ -4972,9 +4976,11 @@ static void rebind_workers(struct worker_pool *pool)
>  	 * of all workers first and then clear UNBOUND.  As we're called
>  	 * from CPU_ONLINE, the following shouldn't fail.
>  	 */
> -	for_each_pool_worker(worker, pool)
> +	for_each_pool_worker(worker, pool) {
>  		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
>  						  pool->attrs->cpumask) < 0);
> +		kthread_set_per_cpu(worker->task, true);
> +	}
>  
>  	raw_spin_lock_irq(&pool->lock);
>  

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 17:16       ` Peter Zijlstra
  2021-01-11 18:09         ` Paul E. McKenney
@ 2021-01-11 19:21         ` Valentin Schneider
  2021-01-11 20:23           ` Peter Zijlstra
  2021-01-12 14:57           ` Jens Axboe
  1 sibling, 2 replies; 43+ messages in thread
From: Valentin Schneider @ 2021-01-11 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: Lai Jiangshan, linux-kernel, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Paul McKenney, Vincent Guittot,
	Steven Rostedt

On 11/01/21 18:16, Peter Zijlstra wrote:
> Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> then having any hotplug crud on, so that needs additional frobbing.
>

I noticed that as well sometime ago, and I believed then (still do) this
usage is broken. I don't think usage of kthread_create_on_cpu() outside
of smpboot makes sense, because without any hotplug step to park the
thread, its affinity can end up being reset after its dedicated CPU gets
offlined.

I'm clueless about io_uring, but if it *actually* has a good reason to
use some pcpu kthreads (it seems it doesn't have to be on all CPUs?),
then it needs to register some hotplug step to park them / do something
sensible on hotplug.
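
For illustration, a sketch of that "register some hotplug step" idea
(hypothetical driver code, not io_uring's actual fix): the per-CPU kthreads
are assumed to have been created with kthread_create_on_cpu() and parked,
and the hotplug callbacks park/unpark them so their affinity is never
silently broken.

#include <linux/cpuhotplug.h>
#include <linux/kthread.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(struct task_struct *, sketch_pcpu_thread);

static int sketch_cpu_online(unsigned int cpu)
{
        struct task_struct *tsk = per_cpu(sketch_pcpu_thread, cpu);

        if (tsk)
                kthread_unpark(tsk);    /* re-binds the thread to its CPU */
        return 0;
}

static int sketch_cpu_offline(unsigned int cpu)
{
        struct task_struct *tsk = per_cpu(sketch_pcpu_thread, cpu);

        if (tsk)
                kthread_park(tsk);      /* get it off the CPU before it goes away */
        return 0;
}

static int __init sketch_register_hotplug(void)
{
        int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "sketch/pcpu:online",
                                    sketch_cpu_online, sketch_cpu_offline);

        return ret < 0 ? ret : 0;
}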

> Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
> I suppose bound workqueues don't go through this either.
>
> Let me rummage around a bit...
>
> This seems to not insta-explode... opinions?
>

I like having a proper distinction between 'intended' and 'accidental'
pcpu kthreads.

I'm less fond of the workqueue pcpu flag toggling, but it gets us what
we want: allow those threads to run on !active CPUs during online, but
move them away before !online during offline.

Before I get ahead of myself, do we *actually* require that first part
for workqueue kthreads? I'm thinking (raise alarm) we could try another
approach of making them pcpu kthreads that don't abide by the !active &&
online rule.

> ---
>  include/linux/kthread.h |  3 +++
>  kernel/kthread.c        | 25 ++++++++++++++++++++++++-
>  kernel/sched/core.c     |  2 +-
>  kernel/sched/sched.h    |  4 ++--
>  kernel/smpboot.c        |  1 +
>  kernel/workqueue.c      | 12 +++++++++---
>  6 files changed, 40 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 15d2562118d1..e71f9e44789e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7277,7 +7277,7 @@ static void balance_push(struct rq *rq)
>        * Both the cpu-hotplug and stop task are in this case and are
>        * required to complete the hotplug process.
>        */
> -	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
> +	if (rq->idle == push_task || is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {

I take it the p->set_child_tid thing you were complaining about on IRC
is what prevents us from having the idle task seen as a pcpu kthread?

>               /*
>                * If this is the idle task on the outgoing CPU try to wake
>                * up the hotplug control thread which might wait for the
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 12ada79d40f3..3679f63e0aa2 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2697,10 +2697,10 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
>       if (!(p->flags & PF_KTHREAD))
>               return false;
>
> -	if (p->nr_cpus_allowed != 1)
> +	if (!(p->flags & PF_NO_SETAFFINITY))
>               return false;
>
> -	return true;
> +	return kthread_is_per_cpu(p);
>  }
>  #endif
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 2efe1e206167..b0abe575a524 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -188,6 +188,7 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
>               kfree(td);
>               return PTR_ERR(tsk);
>       }
> +	kthread_set_per_cpu(tsk, true);
>       /*
>        * Park the thread so that it could start right on the CPU
>        * when it is available.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9880b6c0e272..824276e4fb2e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1861,6 +1861,8 @@ static void worker_attach_to_pool(struct worker *worker,
>        */
>       if (pool->flags & POOL_DISASSOCIATED)
>               worker->flags |= WORKER_UNBOUND;
> +	else
> +		kthread_set_per_cpu(worker->task, true);
>

I thought only pcpu pools would get the POOL_DISASSOCIATED flag on
offline, but it seems unbound pools also get it at init time. Did I get
that right?

Also, shouldn't this be done before the previous set_cpus_allowed_ptr()
call (in the same function)? That is, if we patch
__set_cpus_allowed_ptr() to also use kthread_is_per_cpu().

>       list_add_tail(&worker->node, &pool->workers);
>       worker->pool = pool;

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 19:21         ` Valentin Schneider
@ 2021-01-11 20:23           ` Peter Zijlstra
  2021-01-11 22:47             ` Valentin Schneider
  2021-01-12  4:33             ` Lai Jiangshan
  2021-01-12 14:57           ` Jens Axboe
  1 sibling, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-11 20:23 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

On Mon, Jan 11, 2021 at 07:21:06PM +0000, Valentin Schneider wrote:
> On 11/01/21 18:16, Peter Zijlstra wrote:
> > Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> > then having any hotplug crud on, so that needs additional frobbing.
> >
> 
> I noticed that as well sometime ago, and I believed then (still do) this
> usage is broken. I don't think usage of kthread_create_on_cpu() outside
> of smpboot makes sense, because without any hotplug step to park the
> thread, its affinity can end up being reset after its dedicated CPU gets
> offlined.
> 
> I'm clueless about io_uring, but if it *actually* has a good reason to
> use some pcpu kthreads (it seems it doesn't have to be on all CPUs?),
> then it needs to register some hotplug step to park them / do something
> sensible on hotplug.

+Jens, could you perhaps elucidate us as to what io_uring wants from
that kthread?

> > Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
> > I suppose bound workqueues don't go through this either.
> >
> > Let me rummage around a bit...
> >
> > This seems to not insta-explode... opinions?
> >
> 
> I like having a proper distinction between 'intended' and 'accidental'
> pcpu kthreads.
> 
> I'm less fond of the workqueue pcpu flag toggling, but it gets us what
> we want: allow those threads to run on !active CPUs during online, but
> move them away before !online during offline.
> 
> Before I get ahead of myself, do we *actually* require that first part
> for workqueue kthreads? I'm thinking (raise alarm) we could try another
> approach of making them pcpu kthreads that don't abide by the !active &&
> online rule.

There is code that really requires percpu workqueues to be percpu. Such
code will flush the percpu workqueue on hotplug and never hit the unbind
scenario.

Other code uses those same percpu workqueues and only uses it as a
performance enhancer, it likes things to stay local, but if not, meh..
And these users are what got us the weird ass semantics of workqueue.

Sadly workqueue itself can't tell them apart.
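
For illustration, a sketch of that first class of users (a hypothetical
example, not taken from any particular subsystem): the work must run on its
own CPU, and a hotplug callback drains it before the CPU is torn down, so
the unbind path is never hit for it.

#include <linux/cpuhotplug.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

static DEFINE_PER_CPU(struct work_struct, sketch_percpu_work);

static void sketch_work_fn(struct work_struct *work)
{
        /* touches strictly per-CPU state; must not run anywhere else */
}

static void sketch_kick(int cpu)
{
        /* queue on that CPU's bound worker pool */
        queue_work_on(cpu, system_wq, per_cpu_ptr(&sketch_percpu_work, cpu));
}

static int sketch_cpu_going_down(unsigned int cpu)
{
        /* run any pending instance while the CPU is still online */
        flush_work(per_cpu_ptr(&sketch_percpu_work, cpu));
        return 0;
}

static int __init sketch_init(void)
{
        int cpu;

        for_each_possible_cpu(cpu)
                INIT_WORK(per_cpu_ptr(&sketch_percpu_work, cpu), sketch_work_fn);
        return 0;
}

The second class of users queues on the same per-CPU workqueues purely for
locality and can live with the work running elsewhere after unbind, and it
is that second class which gives workqueue its weird semantics.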

> > ---
> >  include/linux/kthread.h |  3 +++
> >  kernel/kthread.c        | 25 ++++++++++++++++++++++++-
> >  kernel/sched/core.c     |  2 +-
> >  kernel/sched/sched.h    |  4 ++--
> >  kernel/smpboot.c        |  1 +
> >  kernel/workqueue.c      | 12 +++++++++---
> >  6 files changed, 40 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 15d2562118d1..e71f9e44789e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7277,7 +7277,7 @@ static void balance_push(struct rq *rq)
> >        * Both the cpu-hotplug and stop task are in this case and are
> >        * required to complete the hotplug process.
> >        */
> > -	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
> > +	if (rq->idle == push_task || is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
> 
> I take it the p->set_child_tid thing you were complaining about on IRC
> is what prevents us from having the idle task seen as a pcpu kthread?

Yes, to_kthread() only tests PF_KTHREAD and then assumes
p->set_child_tid points to struct kthread; _however_, init_task has
PF_KTHREAD set, but a NULL ->set_child_tid.

This then means the idle thread for the boot CPU will malfunction with
to_kthread() and will certainly not have KTHREAD_IS_PER_CPU set. Per
construction (fork_idle()) none of the other idle threads will have that
cured either.

For fun and giggles, init (pid-1) will have PF_KTHREAD set for a while
as well, until we exec /sbin/init.

Anyway, idle will fail kthread_is_per_cpu(), and hence without the
above, we'll try and push the idle task away, which results in much
fail.
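
For context, roughly how to_kthread() looked at the time (paraphrased from
kernel/kthread.c of that era, so treat the details as approximate):

static inline struct kthread *to_kthread(struct task_struct *k)
{
        WARN_ON(!(k->flags & PF_KTHREAD));
        /*
         * Kthreads reuse ->set_child_tid to point at their struct kthread.
         * init_task (and, via fork_idle(), every idle task) has PF_KTHREAD
         * set but a NULL ->set_child_tid, so this yields NULL for the idle
         * tasks and kthread_is_per_cpu() then returns false for them --
         * hence the explicit rq->idle check added to balance_push() above.
         */
        return (__force void *)k->set_child_tid;
}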

> > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > index 2efe1e206167..b0abe575a524 100644
> > --- a/kernel/smpboot.c
> > +++ b/kernel/smpboot.c
> > @@ -188,6 +188,7 @@ __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
> >               kfree(td);
> >               return PTR_ERR(tsk);
> >       }
> > +	kthread_set_per_cpu(tsk, true);
> >       /*
> >        * Park the thread so that it could start right on the CPU
> >        * when it is available.
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 9880b6c0e272..824276e4fb2e 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1861,6 +1861,8 @@ static void worker_attach_to_pool(struct worker *worker,
> >        */
> >       if (pool->flags & POOL_DISASSOCIATED)
> >               worker->flags |= WORKER_UNBOUND;
> > +	else
> > +		kthread_set_per_cpu(worker->task, true);
> >
> 
> I thought only pcpu pools would get the POOL_DISASSOCIATED flag on
> offline, but it seems unbound pools also get it at init time. Did I get
> that right?

That's how I read it too...

> Also, shouldn't this be done before the previous set_cpus_allowed_ptr()
> call (in the same function)? 

Don't see why; we need nr_cpus_allowed == 1, so best do it after, right?

> That is, if we patch
> __set_cpus_allowed_ptr() to also use kthread_is_per_cpu().

That seems wrong.

> >       list_add_tail(&worker->node, &pool->workers);
> >       worker->pool = pool;

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 18:09         ` Paul E. McKenney
@ 2021-01-11 21:50           ` Paul E. McKenney
  2021-01-12 17:14             ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2021-01-11 21:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Mon, Jan 11, 2021 at 10:09:07AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 11, 2021 at 06:16:39PM +0100, Peter Zijlstra wrote:
> > 
> > While thinking more about this, I'm thinking a big part of the problem
> > is that we're not distinguishing between genuine per-cpu kthreads and
> > kthreads that just happen to be per-cpu.
> > 
> > Genuine per-cpu kthreads are kthread_bind()'ed and have PF_NO_SETAFFINITY,
> > but sadly a lot of non-per-cpu kthreads that might happen to still be
> > per-cpu also have that -- again, workqueue does that even to its unbound
> > workers :-(
> > 
> > Now, anything created by smpboot, is created through
> > kthread_create_on_cpu() and that additionally sets to_kthread(p)->flags
> > KTHREAD_IS_PER_CPU.
> > 
> > And I'm thinking that might be sufficient, if we modify
> > is_per_cpu_kthread() to check that, then we only match smpboot threads
> > (which include the hotplug and stopper threads, but notably not the idle
> > thread)
> > 
> > Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> > then having any hotplug crud on, so that needs additional frobbing.
> > 
> > Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
> > I suppose bound workqueues don't go through this either.
> > 
> > Let me rummage around a bit...
> > 
> > This seems to not insta-explode... opinions?
> 
> It passes quick tests on -rcu both with and without the rcutorture fixes,
> which is encouraging.  I will start a more vigorous test in about an hour.

And 672 ten-minute instances of RUDE01 passed with this patch applied
and with my rcutorture patch reverted.  So looking good, thank you!!!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 20:23           ` Peter Zijlstra
@ 2021-01-11 22:47             ` Valentin Schneider
  2021-01-12  4:33             ` Lai Jiangshan
  1 sibling, 0 replies; 43+ messages in thread
From: Valentin Schneider @ 2021-01-11 22:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

On 11/01/21 21:23, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 07:21:06PM +0000, Valentin Schneider wrote:
>> I'm less fond of the workqueue pcpu flag toggling, but it gets us what
>> we want: allow those threads to run on !active CPUs during online, but
>> move them away before !online during offline.
>>
>> Before I get ahead of myself, do we *actually* require that first part
>> for workqueue kthreads? I'm thinking (raise alarm) we could try another
>> approach of making them pcpu kthreads that don't abide by the !active &&
>> online rule.
>
> There is code that really requires percpu workqueues to be percpu. Such
> code will flush the percpu workqueue on hotplug and never hit the unbind
> scenario.
>
> Other code uses those same percpu workqueues and only uses it as a
> performance enhancer, it likes things to stay local, but if not, meh..
> And these users are what got us the weird ass semantics of workqueue.
>
> Sadly workqueue itself can't tell them apart.
>

Oh well...

FWIW now that I've unconfused myself, that does look okay.

>> > ---
>> >  include/linux/kthread.h |  3 +++
>> >  kernel/kthread.c        | 25 ++++++++++++++++++++++++-
>> >  kernel/sched/core.c     |  2 +-
>> >  kernel/sched/sched.h    |  4 ++--
>> >  kernel/smpboot.c        |  1 +
>> >  kernel/workqueue.c      | 12 +++++++++---
>> >  6 files changed, 40 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> > index 15d2562118d1..e71f9e44789e 100644
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -7277,7 +7277,7 @@ static void balance_push(struct rq *rq)
>> >        * Both the cpu-hotplug and stop task are in this case and are
>> >        * required to complete the hotplug process.
>> >        */
>> > -	if (is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
>> > +	if (rq->idle == push_task || is_per_cpu_kthread(push_task) || is_migration_disabled(push_task)) {
>>
>> I take it the p->set_child_tid thing you were complaining about on IRC
>> is what prevents us from having the idle task seen as a pcpu kthread?
>
> Yes, to_kthread() only tests PF_KTHREAD and then assumes
> p->set_child_tid points to struct kthread; _however_, init_task has
> PF_KTHREAD set, but a NULL ->set_child_tid.
>
> This then means the idle thread for the boot CPU will malfunction with
> to_kthread() and will certainly not have KTHREAD_IS_PER_CPU set. Per
> construction (fork_idle()) none of the other idle threads will have that
> cured either.
>
> For fun and giggles, init (pid-1) will have PF_KTHREAD set for a while
> as well, until we exec /sbin/init.
>
> Anyway, idle will fail kthread_is_per_cpu(), and hence without the
> above, we'll try and push the idle task away, which results in much
> fail.
>

Quite!
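
For reference, roughly what the helpers under discussion look like,
paraphrased from kernel/kthread.c and the patch above (a sketch, not a
verbatim copy; details may differ):

/* to_kthread() blindly trusts PF_KTHREAD; init_task breaks that assumption */
static inline struct kthread *to_kthread(struct task_struct *k)
{
	WARN_ON(!(k->flags & PF_KTHREAD));
	return (__force void *)k->set_child_tid;	/* NULL for init_task */
}

bool kthread_is_per_cpu(struct task_struct *k)
{
	struct kthread *kthread = to_kthread(k);

	if (!kthread)
		return false;	/* e.g. the boot CPU's idle thread */

	return test_bit(KTHREAD_IS_PER_CPU, &kthread->flags);
}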

>> Also, shouldn't this be done before the previous set_cpus_allowed_ptr()
>> call (in the same function)?
>
> Don't see why; we need nr_cpus_allowed == 1, so best do it after, right?
>

Duh, yes.

>> That is, if we patch
>> __set_cpus_allowed_ptr() to also use kthread_is_per_cpu().
>
> That seems wrong.
>

It is, apologies.

>> >       list_add_tail(&worker->node, &pool->workers);
>> >       worker->pool = pool;

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 20:23           ` Peter Zijlstra
  2021-01-11 22:47             ` Valentin Schneider
@ 2021-01-12  4:33             ` Lai Jiangshan
  2021-01-12 14:53               ` Peter Zijlstra
  2021-01-12 17:52               ` Valentin Schneider
  1 sibling, 2 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-12  4:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

> Well yes, but afaict the workqueue stuff hasn't been settled yet, and
> the rcutorture patch Paul did was just plain racy and who knows what
> other daft kthread users are out there. That and we're at -rc3.

I just sent the V4 patchset for the workqueue.  Please take a look.

> @@ -1861,6 +1861,8 @@ static void worker_attach_to_pool(struct worker *worker,
>         */
>         if (pool->flags & POOL_DISASSOCIATED)
>                 worker->flags |= WORKER_UNBOUND;
> +       else
> +               kthread_set_per_cpu(worker->task, true);

I think kthread_set_per_cpu(worker->task, false) is also needed for
worker_detach_from_pool(), or at least for rescuer_thread(), which doesn't
go on to die after detaching from the pool.
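
The kind of change being suggested would be something like the following
untested sketch, using the bool-taking kthread_set_per_cpu() helper from
the patch above (hunk context is approximate):

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ static void worker_detach_from_pool(struct worker *worker)
 	mutex_lock(&wq_pool_attach_mutex);
 
+	/* undo the per-cpu marking applied in worker_attach_to_pool() */
+	kthread_set_per_cpu(worker->task, false);
+
 	list_del(&worker->node);
 	worker->pool = NULL;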

> I thought only pcpu pools would get the POOL_DISASSOCIATED flag on
> offline, but it seems unbound pools also get it at init time. Did I get
> that right?

You are right.

The POOL_DISASSOCIATED flag indicates whether the pool is concurrency
managed or not (stated negatively: POOL_DISASSOCIATED means "not
concurrency managed"), so it applies to all unbound pools.

!POOL_DISASSOCIATED means it is a percpu pool whose pool->cpu is online
and whose offline callback has not been called yet, even if pool->cpu is
about to go offline.  So !POOL_DISASSOCIATED is used a lot in the code.
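
Roughly how the flag moves around in the current code, paraphrased from
kernel/workqueue.c from memory (a sketch, not exact hunks):

	/* init_worker_pool(): every pool starts out disassociated */
	pool->flags |= POOL_DISASSOCIATED;

	/* rebind_workers(), from workqueue_online_cpu(): percpu pools only */
	pool->flags &= ~POOL_DISASSOCIATED;

	/* unbind_workers(), from workqueue_offline_cpu(): percpu pools only */
	pool->flags |= POOL_DISASSOCIATED;

Unbound pools never take the online/offline path, so they keep the flag set
for their whole lifetime.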

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12  4:33             ` Lai Jiangshan
@ 2021-01-12 14:53               ` Peter Zijlstra
  2021-01-12 15:38                 ` Lai Jiangshan
  2021-01-12 17:52               ` Valentin Schneider
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-12 14:53 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

On Tue, Jan 12, 2021 at 12:33:03PM +0800, Lai Jiangshan wrote:
> > Well yes, but afaict the workqueue stuff hasn't been settled yet, and
> > the rcutorture patch Paul did was just plain racy and who knows what
> > other daft kthread users are out there. That and we're at -rc3.
> 
> I just sent the V4 patchset for the workqueue.  Please take a look.

Yes, I've seen it. But I think this approach is smaller and simpler.

By distinguishing between genuine per-cpu kthreads and kthreads that
merely happen to have a single-CPU affinity, things become much simpler
and more robust.
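
A paraphrased sketch of the distinction being drawn, assuming the
kthread_is_per_cpu()/KTHREAD_IS_PER_CPU helpers from the patch above (not a
verbatim hunk; the exact code may differ):

/* roughly: what should count as a "genuine" per-cpu kthread */
static inline bool is_per_cpu_kthread(struct task_struct *p)
{
	if (!(p->flags & PF_KTHREAD))
		return false;

	/*
	 * The old rule (nr_cpus_allowed == 1) also caught kthreads that
	 * merely happen to be affine to one CPU; only threads explicitly
	 * marked via kthread_set_per_cpu() qualify now.
	 */
	return kthread_is_per_cpu(p);
}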

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 19:21         ` Valentin Schneider
  2021-01-11 20:23           ` Peter Zijlstra
@ 2021-01-12 14:57           ` Jens Axboe
  2021-01-12 15:51             ` Peter Zijlstra
  1 sibling, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2021-01-12 14:57 UTC (permalink / raw)
  To: Valentin Schneider, Peter Zijlstra, Thomas Gleixner
  Cc: Lai Jiangshan, linux-kernel, Qian Cai, Vincent Donnefort,
	Dexuan Cui, Lai Jiangshan, Paul McKenney, Vincent Guittot,
	Steven Rostedt

On 1/11/21 12:21 PM, Valentin Schneider wrote:
> On 11/01/21 18:16, Peter Zijlstra wrote:
>> Sadly it appears like io_uring() uses kthread_create_on_cpu() without
>> then having any hotplug crud on, so that needs additional frobbing.
>>
> 
> I noticed that as well some time ago, and I believed then (still do) this
> usage is broken. I don't think usage of kthread_create_on_cpu() outside
> of smpboot makes sense, because without any hotplug step to park the
> thread, its affinity can end up being reset after its dedicated CPU gets
> offlined.
> 
> I'm clueless about io_uring, but if it *actually* has a good reason to
> use some pcpu kthreads (it seems it doesn't have to be on all CPUs?),
> then it needs to register some hotplug step to park them / do something
> sensible on hotplug.

For io_uring, it's purely used by the SQPOLL mode, which sets aside a
kernel thread for submissions so the application doesn't have to do
anything but write new SQE entries to submit. The thread then notices
these, and submits them. There's an option to affinitize that thread
to a single CPU, which often makes sense for setups like that. Think
of it like a strong hint.

Things aren't going to break if this CPU goes away and we end up being
affinitized to some other CPU, though it is suboptimal. So I guess we
might need some notifiers to ensure that we reset the CPU back again
if it's gone offline+online again? I can take a look at that.
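
For illustration only, an untested sketch of the kind of notifier being
described here.  The sqpoll_* names are hypothetical stand-ins, not real
io_uring state; cpuhp_setup_state_nocalls(), set_cpus_allowed_ptr() and
cpumask_of() are the kernel interfaces a scheme like this would likely use:

#include <linux/cpuhotplug.h>
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/sched.h>

static struct task_struct *sqpoll_task;	/* the pinned SQPOLL thread */
static int sqpoll_cpu = -1;		/* CPU it was affinitized to */

static int sqpoll_cpu_online(unsigned int cpu)
{
	/* re-apply the preferred affinity once that CPU comes back online */
	if (sqpoll_task && cpu == sqpoll_cpu)
		set_cpus_allowed_ptr(sqpoll_task, cpumask_of(cpu));
	return 0;
}

static int __init sqpoll_hotplug_init(void)
{
	int ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
					    "io_uring/sqpoll:online",
					    sqpoll_cpu_online, NULL);
	return ret < 0 ? ret : 0;
}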

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12 14:53               ` Peter Zijlstra
@ 2021-01-12 15:38                 ` Lai Jiangshan
  2021-01-13 11:10                   ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-12 15:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

On Tue, Jan 12, 2021 at 10:53 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 12, 2021 at 12:33:03PM +0800, Lai Jiangshan wrote:
> > > Well yes, but afaict the workqueue stuff hasn't been settled yet, and
> > > the rcutorture patch Paul did was just plain racy and who knows what
> > > other daft kthread users are out there. That and we're at -rc3.
> >
> > I just sent the V4 patchset for the workqueue.  Please take a look.
>
> Yes, I've seen it. But I think this approach is smaller and simpler.
>
> By distinguishing between genuine per-cpu kthreads and kthreads that
> merely happen to have a single-CPU affinity, things become much simpler
> and more robust.

Again, fixing the problem of tasks running on a dying cpu is easy.
(In other words, adjusting affinity correctly is easy, no matter whether the
affinity is adjusted initiatively like here or passively via the new
!KTHREAD_IS_PER_CPU check.)

For workqueue, Valentin Schneider's patch can almost complete it; only
a small additional fix for percpu unbound workers is needed.

And all four versions of my patchset can complete it.

But the hard problem is "how to suppress the warning of
online&!active in __set_cpus_allowed_ptr()" for late spawned
unbound workers during hotplug.

It is not handled completely until the fourth version of my patchset.

And your patchset ignores this theoretical case; I agree that is
okay since no one has reported the warning in practice so far.

So something like CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE is still needed
after your patchset is merged.

Or you could move the warning in __set_cpus_allowed_ptr() into deeper
places where the problems really happen.
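
(The warning referred to above is, roughly, this check in
__set_cpus_allowed_ptr() in kernel/sched/core.c; paraphrased from memory,
the exact form in -tip may differ:)

	if (p->flags & PF_KTHREAD) {
		/*
		 * Kernel threads may run on online && !active CPUs, but
		 * only if they are strict per-CPU threads.
		 */
		WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
			!cpumask_intersects(new_mask, cpu_active_mask) &&
			p->nr_cpus_allowed != 1);
	}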

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12 14:57           ` Jens Axboe
@ 2021-01-12 15:51             ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-12 15:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Valentin Schneider, Thomas Gleixner, Lai Jiangshan, linux-kernel,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Paul McKenney, Vincent Guittot, Steven Rostedt

On Tue, Jan 12, 2021 at 07:57:26AM -0700, Jens Axboe wrote:
> On 1/11/21 12:21 PM, Valentin Schneider wrote:
> > On 11/01/21 18:16, Peter Zijlstra wrote:
> >> Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> >> then having any hotplug crud on, so that needs additional frobbing.
> >>
> > 
> > I noticed that as well some time ago, and I believed then (still do) this
> > usage is broken. I don't think usage of kthread_create_on_cpu() outside
> > of smpboot makes sense, because without any hotplug step to park the
> > thread, its affinity can end up being reset after its dedicated CPU gets
> > offlined.
> > 
> > I'm clueless about io_uring, but if it *actually* has a good reason to
> > use some pcpu kthreads (it seems it doesn't have to be on all CPUs?),
> > then it needs to register some hotplug step to park them / do something
> > sensible on hotplug.
> 
> For io_uring, it's purely used by the SQPOLL mode, which sets aside a
> kernel thread for submissions so the application doesn't have to do
> anything but write new SQE entries to submit. The thread then notices
> these, and submits them. There's an option to affinitize that thread
> to a single CPU, which often makes sense for setups like that. Think
> of it like a strong hint.

OK, that matches what I could make of it earlier today.

> Things aren't going to break if this CPU goes away and we end up being
> affinitized to some other CPU, though it is suboptimal. So I guess we
> might need some notifiers to ensure that we reset the CPU back again
> if it's gone offline+online again? I can take a look at that.

Indeed, that would make sense.

Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-11 21:50           ` Paul E. McKenney
@ 2021-01-12 17:14             ` Paul E. McKenney
  2021-01-12 23:53               ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2021-01-12 17:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Mon, Jan 11, 2021 at 01:50:52PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 11, 2021 at 10:09:07AM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 11, 2021 at 06:16:39PM +0100, Peter Zijlstra wrote:
> > > 
> > > While thinking more about this, I'm thinking a big part of the problem
> > > is that we're not distinguishing between genuine per-cpu kthreads and
> > > kthreads that just happen to be per-cpu.
> > > 
> > > Genuine per-cpu kthreads are kthread_bind() and have PF_NO_SETAFFINITY,
> > > but sadly a lot of non-per-cpu kthreads, that might happen to still be
> > > per-cpu also have that -- again workqueue does that even to its unbound
> > > workers :-(
> > > 
> > > Now, anything created by smpboot, is created through
> > > kthread_create_on_cpu() and that additionally sets to_kthread(p)->flags
> > > KTHREAD_IS_PER_CPU.
> > > 
> > > And I'm thinking that might be sufficient, if we modify
> > > is_per_cpu_kthread() to check that, then we only match smpboot threads
> > > (which include the hotplug and stopper threads, but notably not the idle
> > > thread)
> > > 
> > > Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> > > then having any hotplug crud on, so that needs additional frobbing.
> > > 
> > > Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
> > > I suppose bound workqueues don't go through this either.
> > > 
> > > Let me rummage around a bit...
> > > 
> > > This seems to not insta-explode... opinions?
> > 
> > It passes quick tests on -rcu both with and without the rcutorture fixes,
> > which is encouraging.  I will start a more vigorous test in about an hour.
> 
> And 672 ten-minute instances of RUDE01 passed with this patch applied
> and with my rcutorture patch reverted.  So looking good, thank you!!!

Still on yesterday's patch, an overnight 12-hour run hit workqueue
warnings in three of four instances of the SRCU-P scenario, two
at not quite three hours in and the third at about ten hours in.
All runs were otherwise successful.  One of the runs also had "BUG:
using __this_cpu_read() in preemptible" as well, so that is the warning
shown below.  There was a series of these BUGs, then things settled down.

This is the warning at the end of process_one_work() that is complaining
about being on the wrong CPU.
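
(For reference, the check at kernel/workqueue.c:2193 that fires here is,
roughly, the following sanity test in process_one_work(); paraphrased, not
exact:)

	/*
	 * A kworker bound to a percpu pool must run on pool->cpu unless
	 * the pool has been disassociated by CPU hotplug.
	 */
	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
		     raw_smp_processor_id() != pool->cpu);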

I will fire up some tests on the new series.

							Thanx, Paul

------------------------------------------------------------------------

WARNING: CPU: 0 PID: 413 at kernel/workqueue.c:2193 process_one_work+0x8c/0x5f0
Modules linked in:
CPU: 0 PID: 413 Comm: kworker/3:3 Not tainted 5.11.0-rc3+ #1104
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
Workqueue:  0x0 (events)
RIP: 0010:process_one_work+0x8c/0x5f0
Code: 48 8b 46 38 41 83 e6 20 48 89 45 c0 48 8b 46 40 48 89 45 c8 41 f6 44 24 4c 04 75 10 65 8b 05 eb 5d 78 59 41 39 44 24 40 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d3 48 c1 ea 3a 49 8b
RSP: 0018:ffffb5a540847e70 EFLAGS: 00010006
RAX: 0000000000000000 RBX: ffff8fcc5f4f27e0 RCX: 2b970af959bb2a7d
RDX: ffff8fcc5f4f27e8 RSI: ffff8fcc5f4f27e0 RDI: ffff8fcc4306e3c0
RBP: ffffb5a540847ed0 R08: 0000000000000001 R09: ffff8fcc425e4680
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fcc5f4eadc0
R13: ffff8fcc5f4ef700 R14: 0000000000000000 R15: ffff8fcc4306e3c0
FS:  0000000000000000(0000) GS:ffff8fcc5f400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004001e1 CR3: 0000000003084000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? process_one_work+0x5f0/0x5f0
 worker_thread+0x28/0x3c0
 ? process_one_work+0x5f0/0x5f0
 kthread+0x13b/0x160
 ? kthread_insert_work_sanity_check+0x50/0x50
 ret_from_fork+0x22/0x30
irq event stamp: 138141554
hardirqs last  enabled at (138141553): [<ffffffffa74a928f>] _raw_spin_unlock_irq+0x1f/0x40
hardirqs last disabled at (138141554): [<ffffffffa74a9071>] _raw_spin_lock_irq+0x41/0x50
softirqs last  enabled at (138140828): [<ffffffffa68ece37>] srcu_invoke_callbacks+0xe7/0x1a0
softirqs last disabled at (138140824): [<ffffffffa68ece37>] srcu_invoke_callbacks+0xe7/0x1a0
---[ end trace e31d6dded2c52564 ]---
kvm-guest: stealtime: cpu 3, msr 1f4d7b00
BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/3:3/413
caller is refresh_cpu_vm_stats+0x1a6/0x320
CPU: 5 PID: 413 Comm: kworker/3:3 Tainted: G        W         5.11.0-rc3+ #1104
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
 dump_stack+0x77/0x97
 check_preemption_disabled+0xb6/0xd0
 refresh_cpu_vm_stats+0x1a6/0x320
 vmstat_update+0xe/0x60
 process_one_work+0x2a0/0x5f0
 ? process_one_work+0x5f0/0x5f0
 worker_thread+0x28/0x3c0
 ? process_one_work+0x5f0/0x5f0
 kthread+0x13b/0x160
 ? kthread_insert_work_sanity_check+0x50/0x50
 ret_from_fork+0x22/0x30

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12  4:33             ` Lai Jiangshan
  2021-01-12 14:53               ` Peter Zijlstra
@ 2021-01-12 17:52               ` Valentin Schneider
  1 sibling, 0 replies; 43+ messages in thread
From: Valentin Schneider @ 2021-01-12 17:52 UTC (permalink / raw)
  To: Lai Jiangshan, Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Qian Cai, Vincent Donnefort, Dexuan Cui,
	Lai Jiangshan, Paul McKenney, Vincent Guittot, Steven Rostedt,
	Jens Axboe

On 12/01/21 12:33, Lai Jiangshan wrote:
>> I thought only pcpu pools would get the POOL_DISASSOCIATED flag on
>> offline, but it seems unbound pools also get it at init time. Did I get
>> that right?
>
> You are right.
>
> The POOL_DISASSOCIATED flag indicates whether the pool is concurrency
> managed or not (stated negatively: POOL_DISASSOCIATED means "not
> concurrency managed"), so it applies to all unbound pools.
>
> !POOL_DISASSOCIATED means it is a percpu pool whose pool->cpu is online
> and whose offline callback has not been called yet, even if pool->cpu is
> about to go offline.  So !POOL_DISASSOCIATED is used a lot in the code.

Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12 17:14             ` Paul E. McKenney
@ 2021-01-12 23:53               ` Paul E. McKenney
  2021-01-15  9:11                 ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2021-01-12 23:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Tue, Jan 12, 2021 at 09:14:11AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 11, 2021 at 01:50:52PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 11, 2021 at 10:09:07AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 11, 2021 at 06:16:39PM +0100, Peter Zijlstra wrote:
> > > > 
> > > > While thinking more about this, I'm thinking a big part of the problem
> > > > is that we're not distinguishing between genuine per-cpu kthreads and
> > > > kthreads that just happen to be per-cpu.
> > > > 
> > > > Genuine per-cpu kthreads are kthread_bind() and have PF_NO_SETAFFINITY,
> > > > but sadly a lot of non-per-cpu kthreads, that might happen to still be
> > > > per-cpu also have that -- again workqueue does that even to its unbound
> > > > workers :-(
> > > > 
> > > > Now, anything created by smpboot, is created through
> > > > kthread_create_on_cpu() and that additionally sets to_kthread(p)->flags
> > > > KTHREAD_IS_PER_CPU.
> > > > 
> > > > And I'm thinking that might be sufficient, if we modify
> > > > is_per_cpu_kthread() to check that, then we only match smpboot threads
> > > > (which include the hotplug and stopper threads, but notably not the idle
> > > > thread)
> > > > 
> > > > Sadly it appears like io_uring() uses kthread_create_on_cpu() without
> > > > then having any hotplug crud on, so that needs additional frobbing.
> > > > 
> > > > Also, init_task is PF_KTHREAD but doesn't have a struct kthread on.. and
> > > > I suppose bound workqueues don't go through this either.
> > > > 
> > > > Let me rummage around a bit...
> > > > 
> > > > This seems to not insta-explode... opinions?
> > > 
> > > It passes quick tests on -rcu both with and without the rcutorture fixes,
> > > which is encouraging.  I will start a more vigorous test in about an hour.
> > 
> > And 672 ten-minute instances of RUDE01 passed with this patch applied
> > and with my rcutorture patch reverted.  So looking good, thank you!!!
> 
> Still on yesterday's patch, an overnight 12-hour run hit workqueue
> warnings in three of four instances of the SRCU-P scenario, two
> at not quite three hours in and the third at about ten hours in.
> All runs were otherwise successful.  One of the runs also had "BUG:
> using __this_cpu_read() in preemptible" as well, so that is the warning
> shown below.  There was a series of these BUGs, then things settled down.
> 
> This is the warning at the end of process_one_work() that is complaining
> about being on the wrong CPU.
> 
> I will fire up some tests on the new series.

An SRCU-P run on the new series reproduced the warning below.  Repeat-by:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 --configs "112*SRCU-P" --bootargs "rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutree.softirq=0" --trust-make

A RUDE01 run on the new series completed without incident.  Repeat-by:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 --configs "672*RUDE01" --bootargs "rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutree.softirq=0" --trust-make

I will be doing an overnight (PST) run having more variety and longer durations.

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> WARNING: CPU: 0 PID: 413 at kernel/workqueue.c:2193 process_one_work+0x8c/0x5f0
> Modules linked in:
> CPU: 0 PID: 413 Comm: kworker/3:3 Not tainted 5.11.0-rc3+ #1104
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> Workqueue:  0x0 (events)
> RIP: 0010:process_one_work+0x8c/0x5f0
> Code: 48 8b 46 38 41 83 e6 20 48 89 45 c0 48 8b 46 40 48 89 45 c8 41 f6 44 24 4c 04 75 10 65 8b 05 eb 5d 78 59 41 39 44 24 40 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d3 48 c1 ea 3a 49 8b
> RSP: 0018:ffffb5a540847e70 EFLAGS: 00010006
> RAX: 0000000000000000 RBX: ffff8fcc5f4f27e0 RCX: 2b970af959bb2a7d
> RDX: ffff8fcc5f4f27e8 RSI: ffff8fcc5f4f27e0 RDI: ffff8fcc4306e3c0
> RBP: ffffb5a540847ed0 R08: 0000000000000001 R09: ffff8fcc425e4680
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fcc5f4eadc0
> R13: ffff8fcc5f4ef700 R14: 0000000000000000 R15: ffff8fcc4306e3c0
> FS:  0000000000000000(0000) GS:ffff8fcc5f400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000004001e1 CR3: 0000000003084000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  ? process_one_work+0x5f0/0x5f0
>  worker_thread+0x28/0x3c0
>  ? process_one_work+0x5f0/0x5f0
>  kthread+0x13b/0x160
>  ? kthread_insert_work_sanity_check+0x50/0x50
>  ret_from_fork+0x22/0x30
> irq event stamp: 138141554
> hardirqs last  enabled at (138141553): [<ffffffffa74a928f>] _raw_spin_unlock_irq+0x1f/0x40
> hardirqs last disabled at (138141554): [<ffffffffa74a9071>] _raw_spin_lock_irq+0x41/0x50
> softirqs last  enabled at (138140828): [<ffffffffa68ece37>] srcu_invoke_callbacks+0xe7/0x1a0
> softirqs last disabled at (138140824): [<ffffffffa68ece37>] srcu_invoke_callbacks+0xe7/0x1a0
> ---[ end trace e31d6dded2c52564 ]---
> kvm-guest: stealtime: cpu 3, msr 1f4d7b00
> BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/3:3/413
> caller is refresh_cpu_vm_stats+0x1a6/0x320
> CPU: 5 PID: 413 Comm: kworker/3:3 Tainted: G        W         5.11.0-rc3+ #1104
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> Workqueue: mm_percpu_wq vmstat_update
> Call Trace:
>  dump_stack+0x77/0x97
>  check_preemption_disabled+0xb6/0xd0
>  refresh_cpu_vm_stats+0x1a6/0x320
>  vmstat_update+0xe/0x60
>  process_one_work+0x2a0/0x5f0
>  ? process_one_work+0x5f0/0x5f0
>  worker_thread+0x28/0x3c0
>  ? process_one_work+0x5f0/0x5f0
>  kthread+0x13b/0x160
>  ? kthread_insert_work_sanity_check+0x50/0x50
>  ret_from_fork+0x22/0x30

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12 15:38                 ` Lai Jiangshan
@ 2021-01-13 11:10                   ` Peter Zijlstra
  2021-01-13 12:00                     ` Lai Jiangshan
  2021-01-13 12:57                     ` Lai Jiangshan
  0 siblings, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-13 11:10 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:

> But the hard problem is "how to suppress the warning of
> online&!active in __set_cpus_allowed_ptr()" for late spawned
> unbound workers during hotplug.

I cannot see create_worker() go bad like that.

The thing is, it uses:

  kthread_bind_mask(, pool->attr->cpumask)
  worker_attach_to_pool()
    set_cpus_allowed_ptr(, pool->attr->cpumask)

which means set_cpus_allowed_ptr() must be a NOP, because the affinity
is already set by kthread_bind_mask(). Further, the first wakeup of that
worker will then hit:

  select_task_rq()
    is_cpu_allowed()
      is_per_cpu_kthread() -- false
    select_fallback_rq()


So normally that really isn't a problem. I can only see a tiny hole
there, where someone changes the cpumask between kthread_bind_mask() and
set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:

 - add wq_pool_mutex around things in create_worker(), or
 - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
   into rescuer_thread().

Which then brings us to rescuer_thread...  If we manage to trigger the
rescuer during hotplug, then yes, I think that can go wobbly.

Let me consider that a bit more while I try and make sense of that splat
Paul reported.

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ec0771e4a3fb..fe05308dc472 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1844,15 +1844,19 @@ static struct worker *alloc_worker(int node)
  * cpu-[un]hotplugs.
  */
 static void worker_attach_to_pool(struct worker *worker,
-				   struct worker_pool *pool)
+				  struct worker_pool *pool,
+				  bool set_affinity)
 {
 	mutex_lock(&wq_pool_attach_mutex);
 
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	if (set_affinity) {
+		/*
+		 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have
+		 * any online CPUs.  It'll be re-applied when any of the CPUs
+		 * come up.
+		 */
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	}
 
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -1944,7 +1948,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	kthread_bind_mask(worker->task, pool->attrs->cpumask);
 
 	/* successful, attach the worker to the pool */
-	worker_attach_to_pool(worker, pool);
+	worker_attach_to_pool(worker, pool, false);
 
 	/* start the newly created worker */
 	raw_spin_lock_irq(&pool->lock);
@@ -2509,7 +2513,11 @@ static int rescuer_thread(void *__rescuer)
 
 		raw_spin_unlock_irq(&wq_mayday_lock);
 
-		worker_attach_to_pool(rescuer, pool);
+		/*
+		 * XXX can go splat when running during hot-un-plug and
+		 * the pool affinity is wobbly.
+		 */
+		worker_attach_to_pool(rescuer, pool, true);
 
 		raw_spin_lock_irq(&pool->lock);
 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-13 11:10                   ` Peter Zijlstra
@ 2021-01-13 12:00                     ` Lai Jiangshan
  2021-01-13 12:57                     ` Lai Jiangshan
  1 sibling, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-13 12:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Lai Jiangshan, Paul McKenney,
	Vincent Guittot, Steven Rostedt, Jens Axboe

On Wed, Jan 13, 2021 at 7:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:
>
> > But the hard problem is "how to suppress the warning of
> > online&!active in __set_cpus_allowed_ptr()" for late spawned
> > unbound workers during hotplug.
>
> I cannot see create_worker() go bad like that.
>
> The thing is, it uses:
>
>   kthread_bind_mask(, pool->attr->cpumask)
>   worker_attach_to_pool()
>     set_cpus_allowed_ptr(, pool->attr->cpumask)
>
> which means set_cpus_allowed_ptr() must be a NOP, because the affinity
> is already set by kthread_bind_mask(). Further, the first wakeup of that
> worker will then hit:
>
>   select_task_rq()
>     is_cpu_allowed()
>       is_per_cpu_kthread() -- false
>     select_fallback_rq()
>
>
> So normally that really isn't a problem. I can only see a tiny hole
> there, where someone changes the cpumask between kthread_bind_mask() and
> set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:
>
>  - add wq_pool_mutex around things in create_worker(), or
>  - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
>    into rescuer_thread().
>
> Which then brings us to rescuer_thread...  If we manage to trigger the
> rescuer during hotplug, then yes, I think that can go wobbly.

Oh, I forgot that set_cpus_allowed_ptr() is a NOP when combined with
kthread_bind_mask() (as in create_worker()).

So the problem becomes "how to suppress the warning of online&!active in
__set_cpus_allowed_ptr()" for late *attached unbound rescuer* workers
during hotplug.


>
> Let me consider that a bit more while I try and make sense of that splat
> Paul reported.
>
> ---
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index ec0771e4a3fb..fe05308dc472 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1844,15 +1844,19 @@ static struct worker *alloc_worker(int node)
>   * cpu-[un]hotplugs.
>   */
>  static void worker_attach_to_pool(struct worker *worker,
> -                                  struct worker_pool *pool)
> +                                 struct worker_pool *pool,
> +                                 bool set_affinity)
>  {
>         mutex_lock(&wq_pool_attach_mutex);
>
> -       /*
> -        * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> -        * online CPUs.  It'll be re-applied when any of the CPUs come up.
> -        */
> -       set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +       if (set_affinity) {
> +               /*
> +                * set_cpus_allowed_ptr() will fail if the cpumask doesn't have
> +                * any online CPUs.  It'll be re-applied when any of the CPUs
> +                * come up.
> +                */
> +               set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +       }
>
>         /*
>          * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> @@ -1944,7 +1948,7 @@ static struct worker *create_worker(struct worker_pool *pool)
>         kthread_bind_mask(worker->task, pool->attrs->cpumask);
>
>         /* successful, attach the worker to the pool */
> -       worker_attach_to_pool(worker, pool);
> +       worker_attach_to_pool(worker, pool, false);
>
>         /* start the newly created worker */
>         raw_spin_lock_irq(&pool->lock);
> @@ -2509,7 +2513,11 @@ static int rescuer_thread(void *__rescuer)
>
>                 raw_spin_unlock_irq(&wq_mayday_lock);
>
> -               worker_attach_to_pool(rescuer, pool);
> +               /*
> +                * XXX can go splat when running during hot-un-plug and
> +                * the pool affinity is wobbly.
> +                */
> +               worker_attach_to_pool(rescuer, pool, true);
>
>                 raw_spin_lock_irq(&pool->lock);
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-13 11:10                   ` Peter Zijlstra
  2021-01-13 12:00                     ` Lai Jiangshan
@ 2021-01-13 12:57                     ` Lai Jiangshan
  1 sibling, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-13 12:57 UTC (permalink / raw)
  To: Peter Zijlstra, Lai Jiangshan, Tejun Heo
  Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai,
	Vincent Donnefort, Dexuan Cui, Paul McKenney, Vincent Guittot,
	Steven Rostedt, Jens Axboe


On 2021/1/13 19:10, Peter Zijlstra wrote:
> On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:
> 
>> But the hard problem is "how to suppress the warning of
>> online&!active in __set_cpus_allowed_ptr()" for late spawned
>> unbound workers during hotplug.
> 
> I cannot see create_worker() go bad like that.
> 
> The thing is, it uses:
> 
>    kthread_bind_mask(, pool->attr->cpumask)
>    worker_attach_to_pool()
>      set_cpus_allowed_ptr(, pool->attr->cpumask)
> 
> which means set_cpus_allowed_ptr() must be a NOP, because the affinity
> is already set by kthread_bind_mask(). Further, the first wakeup of that
> worker will then hit:
> 
>    select_task_rq()
>      is_cpu_allowed()
>        is_per_cpu_kthread() -- false
>      select_fallback_rq()
> 
> 
> So normally that really isn't a problem. I can only see a tiny hole
> there, where someone changes the cpumask between kthread_bind_mask() and
> set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:
> 
>   - add wq_pool_mutex around things in create_worker(), or
>   - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
>     into rescuer_thread().
> 
> Which then brings us to rescuer_thread...  If we manage to trigger the
> rescuer during hotplug, then yes, I think that can go wobbly.
> 

How about the following idea (not compiled, not tested)?
It does not call set_cpus_allowed_ptr() for just-created workers.
It does not change the cpumask for the rescuer except when it attaches to a
per-cpu pool.

The only problem is that an unbound rescuer worker complies with neither
wq_unbound_cpumask nor wq->unbound_attrs->cpumask.  Another 50 lines
of code could make it comply, but I don't want to type that in an email
and complicate the idea.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9880b6c0e272..df2082283c1e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1849,10 +1849,30 @@ static void worker_attach_to_pool(struct worker *worker,
  	mutex_lock(&wq_pool_attach_mutex);

  	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
+	 * If we are called from create_worker(), we don't need to call
+	 * set_cpus_allowed_ptr() since we just kthread_bind_mask()'ed it.
+	 *
+	 * The only other path that gets us here is rescuer_thread().
+	 *
+	 * When !(pool->flags & POOL_DISASSOCIATED), it is a per-cpu pool
+	 * and we should rebind the rescuer worker to the target CPU.
+	 *
+	 * When it is a rescuer worker attaching to an unbound pool, we keep
+	 * the affinity of the rescuer worker as cpu_possible_mask.
+	 *
+	 * Note: an unbound rescuer worker complies with neither wq_unbound_cpumask
+	 * nor wq->unbound_attrs->cpumask.  The optimal choice would be to keep
+	 * the rescuer worker's affinity as
+	 *	wq_unbound_cpumask & wq->unbound_attrs->cpumask
+	 * but there is no reliable way to set it back via
+	 * set_cpus_allowed_ptr() when its affinity is changed by scheduler
+	 * due to CPU hotplug, so we just use cpu_possible_mask for the rescuer.
+	 *
+	 * set_cpus_allowed_ptr() will not fail since
+	 * !(pool->flags & POOL_DISASSOCIATED)
  	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	if (worker->rescue_wq && !(pool->flags & POOL_DISASSOCIATED))
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);

  	/*
  	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -5043,7 +5063,8 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)

  	/* as we're called from CPU_ONLINE, the following shouldn't fail */
  	for_each_pool_worker(worker, pool)
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
+		if (!worker->rescue_wq)
+			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
  }

  int workqueue_prepare_cpu(unsigned int cpu)





^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-12 23:53               ` Paul E. McKenney
@ 2021-01-15  9:11                 ` Peter Zijlstra
  2021-01-15 13:04                   ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-15  9:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Tue, Jan 12, 2021 at 03:53:24PM -0800, Paul E. McKenney wrote:
> An SRCU-P run on the new series reproduced the warning below.  Repeat-by:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 --configs "112*SRCU-P" --bootargs "rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutree.softirq=0" --trust-make

Lemme go wake up an EX ;-)

Anyway, I've folded my patch:

  https://lkml.kernel.org/r/YABknAqDe4h35+GY@hirez.programming.kicks-ass.net

back into the series, so be found here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent

and will start testing on that while I continue to ponder the rescuer
mysteries.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-15  9:11                 ` Peter Zijlstra
@ 2021-01-15 13:04                   ` Peter Zijlstra
  2021-01-16  6:00                     ` Lai Jiangshan
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-01-15 13:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Lai Jiangshan, linux-kernel, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Fri, Jan 15, 2021 at 10:11:51AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2021 at 03:53:24PM -0800, Paul E. McKenney wrote:
> > An SRCU-P run on the new series reproduced the warning below.  Repeat-by:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 --configs "112*SRCU-P" --bootargs "rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutree.softirq=0" --trust-make
> 
> Lemme go wake up an EX ;-)

Whee, rescuer thread goes wobbly... took a few hours but there you have
it.

All I've got so far is fugly, gotta think harder.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
  2021-01-15 13:04                   ` Peter Zijlstra
@ 2021-01-16  6:00                     ` Lai Jiangshan
  0 siblings, 0 replies; 43+ messages in thread
From: Lai Jiangshan @ 2021-01-16  6:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Thomas Gleixner, LKML, Valentin Schneider,
	Qian Cai, Vincent Donnefort, Dexuan Cui, Lai Jiangshan,
	Vincent Guittot, Steven Rostedt

On Fri, Jan 15, 2021 at 9:05 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jan 15, 2021 at 10:11:51AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 12, 2021 at 03:53:24PM -0800, Paul E. McKenney wrote:
> > > An SRCU-P run on the new series reproduced the warning below.  Repeat-by:
> > >
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 --configs "112*SRCU-P" --bootargs "rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutree.softirq=0" --trust-make
> >
> > Lemme go wake up an EX ;-)
>
> Whee, rescuer thread goes wobbly... took a few hours but there you have
> it.
>
> All I've got so far is fugly, gotta think harder.

Hello,

With the help of this patch, you can solve the problem.

https://lore.kernel.org/lkml/20210116065753.2163-1-jiangshanlai@gmail.com/

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2021-01-16  6:02 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-26  2:51 [PATCH -tip V3 0/8] workqueue: break affinity initiatively Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 1/8] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 2/8] workqueue: Manually break affinity on pool detachment Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 3/8] workqueue: introduce wq_online_cpumask Lai Jiangshan
2021-01-04 13:56   ` Peter Zijlstra
2021-01-05  2:41     ` Lai Jiangshan
2021-01-05  2:53       ` Lai Jiangshan
2021-01-05  8:23       ` Lai Jiangshan
2021-01-05 13:17         ` Peter Zijlstra
2021-01-05 14:37           ` Lai Jiangshan
2021-01-05 14:40             ` Lai Jiangshan
2021-01-05 16:24         ` Peter Zijlstra
2020-12-26  2:51 ` [PATCH -tip V3 4/8] workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask() Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 5/8] workqueue: Manually break affinity on hotplug for unbound pool Lai Jiangshan
     [not found]   ` <20201226101631.5448-1-hdanton@sina.com>
2020-12-27 14:04     ` Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 6/8] workqueue: reorganize workqueue_online_cpu() Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 7/8] workqueue: reorganize workqueue_offline_cpu() unbind_workers() Lai Jiangshan
2020-12-26  2:51 ` [PATCH -tip V3 8/8] workqueue: Fix affinity of kworkers when attaching into pool Lai Jiangshan
     [not found]   ` <20201229100639.2086-1-hdanton@sina.com>
2020-12-29 10:13     ` Lai Jiangshan
2021-01-08 11:46 ` [PATCH -tip V3 0/8] workqueue: break affinity initiatively Peter Zijlstra
2021-01-11 10:07   ` Thomas Gleixner
2021-01-11 11:01     ` Peter Zijlstra
2021-01-11 15:00       ` Paul E. McKenney
2021-01-11 17:16       ` Peter Zijlstra
2021-01-11 18:09         ` Paul E. McKenney
2021-01-11 21:50           ` Paul E. McKenney
2021-01-12 17:14             ` Paul E. McKenney
2021-01-12 23:53               ` Paul E. McKenney
2021-01-15  9:11                 ` Peter Zijlstra
2021-01-15 13:04                   ` Peter Zijlstra
2021-01-16  6:00                     ` Lai Jiangshan
2021-01-11 19:21         ` Valentin Schneider
2021-01-11 20:23           ` Peter Zijlstra
2021-01-11 22:47             ` Valentin Schneider
2021-01-12  4:33             ` Lai Jiangshan
2021-01-12 14:53               ` Peter Zijlstra
2021-01-12 15:38                 ` Lai Jiangshan
2021-01-13 11:10                   ` Peter Zijlstra
2021-01-13 12:00                     ` Lai Jiangshan
2021-01-13 12:57                     ` Lai Jiangshan
2021-01-12 17:52               ` Valentin Schneider
2021-01-12 14:57           ` Jens Axboe
2021-01-12 15:51             ` Peter Zijlstra
