All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED
@ 2022-07-14  3:16 Schspa Shi
  2022-07-14  8:34 ` Lai Jiangshan
  0 siblings, 1 reply; 3+ messages in thread
From: Schspa Shi @ 2022-07-14  3:16 UTC (permalink / raw)
  To: jiangshanlai, tj; +Cc: linux-kernel, zhaohui.shi, Schspa Shi

When CPU-[un]hotplugs, all workers will be bound to active CPU via
unbind_workers().

But the unbound worker still has a chance to create a new worker, which
has bound the newly created task to pool->attrs->cpumask. But the CPU has
been unplugged.

Please refer to the following scenarios.

           CPU0                                  CPU1
------------------------------------------------------------------
sched_cpu_deactivate(cpu_active_mask clear)
workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
  -- all worker will migrate to another cpu --
                                    worker_thread
                                    -- will create new worker if
                                       pool->worklist is not empty
                                       create_worker()
                                     -- new kworker will bound to CPU0
                               (pool->attrs->cpumask will be mask of CPU0).
      kworker/0:x will running on rq

sched_cpu_dying
  if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
    WARN(true, "Dying CPU not properly vacated!");
      ---------OOPS-------------

And the crash log as fellowing.

The crash log is as follows:
[ 1622.829074] ------------[ cut here ]------------
[ 1622.829081] Dying CPU not properly vacated!
[ 1622.829091] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:7756 sched_cpu_dying+0x74/0x204
[ 1622.829374] CPU: 3 PID: 31 Comm: migration/3 Tainted: P           O      5.10.59-rt52 #2
[ 1622.829386] Stopper: multi_cpu_stop+0x0/0x160 <- 0x0
[ 1622.829400] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
[ 1622.829408] pc : sched_cpu_dying+0x74/0x204
[ 1622.829415] lr : sched_cpu_dying+0x74/0x204
[ 1622.829421] sp : ffff800012933c70
[ 1622.829424] pmr_save: 00000060
[ 1622.829426] x29: ffff800012933c70 x28: 0000000000000000
[ 1622.829435] x27: 0000000000000000 x26: 0000000000000001
[ 1622.829444] x25: 0000000000000000 x24: ffff800018353c2c
[ 1622.829452] x23: 0000000000000003 x22: 0000000000000003
[ 1622.829460] x21: 0000000000000059 x20: 0000000000000000
[ 1622.829468] x19: ffff00027ee68a80 x18: 0000000000000000
[ 1622.829477] x17: 0000000000000000 x16: 0000000000000000
[ 1622.829485] x15: ffffffffffffffff x14: ffff80001169ae30
[ 1622.829493] x13: ffffffffffc38c07 x12: ffffffffffffffff
[ 1622.829501] x11: ffffffffffe00000 x10: ffff80001169ae58
[ 1622.829510] x9 : 000000000000001e x8 : ffff80001169ae30
[ 1622.829518] x7 : ffff800012933ab0 x6 : 00000000ffff0e20
[ 1622.829526] x5 : ffff00027ee62988 x4 : 00000000ffff0e20
[ 1622.829535] x3 : ffff800011e30180 x2 : 0000000100000002
[ 1622.829543] x1 : 0000000000000000 x0 : 0000000000000000
[ 1622.829552] Call trace:
[ 1622.829555]  sched_cpu_dying+0x74/0x204
[ 1622.829562]  cpuhp_invoke_callback+0xc0/0x1b0
[ 1622.829571]  take_cpu_down+0xbc/0xd4
[ 1622.829577]  multi_cpu_stop+0x138/0x160
[ 1622.829584]  cpu_stopper_thread+0x9c/0x118
[ 1622.829591]  smpboot_thread_fn+0x1e8/0x1ec
[ 1622.829598]  kthread+0x114/0x124
[ 1622.829606]  ret_from_fork+0x10/0x30
[ 1622.829615] ---[ end trace 0000000000000002 ]---
[ 1623.830273] CPU3 enqueued tasks (2 total):
[ 1623.830291] 	pid: 31, name: migration/3
[ 1623.830440] 	pid: 25654, name: kworker/3:0
[ 1623.830444] task:kworker/3:0     state:R  running task     stack:    0 pid:25654 ppid:     2 flags:0x00000028
[ 1623.830458] Call trace:
[ 1623.830460]  __switch_to+0x164/0x17c
[ 1623.830472]  __schedule+0x4cc/0x5c0
[ 1623.830483]  schedule+0x7c/0xcc
[ 1623.830491]  schedule_preempt_disabled+0x14/0x24
[ 1623.830500]  kthread+0xd8/0x124
[ 1623.830509]  ret_from_fork+0x10/0x30

In the crash log, the error enqueued kworker(pid: 25654, name: kworker/3:0)
is a newly created thread, and have bind to a unpluged CPU 3.

Can crash dump can also verify this.
crash> task -R nr_cpus_allowed,cpus_mask 25654
PID: 25654  TASK: ffff000181ff0000  CPU: 3   COMMAND: "kworker/3:0"
  nr_cpus_allowed = 1,
  cpus_mask = {
    bits = {8, 0}
  },

crash> struct worker_pool.cpu,nr_workers,attrs 0xffff00027ee68380
  cpu = 3,
  nr_workers = 3,
  attrs = 0xffff000180004480,
crash> struct workqueue_attrs 0xffff000180004480
struct workqueue_attrs {
  nice = 0,
  cpumask = {{
      bits = {8, 0}
    }},
  no_numa = false
}

The stack trace of the bad running task was dumped via the following patch:
Link: https://lore.kernel.org/all/20220519161125.41144-1-schspa@gmail.com/
And I think this debug patch needs to be added to the mainline,
it can help us to debug this kind of problem

To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.

Signed-off-by: Schspa Shi <schspa@gmail.com>

--

Changelog:
v1 -> v2:
        - Move worker task bind to worker_attach_to_pool, remove extra
        wq_pool_attach_mutex added.
        - Add a timing diagram to make this question clearer.
v2 -> v3:
        - Add missing PF_NO_SETAFFINITY, use cpumask_intersects to
        avoid setting bad mask for unbound work pool as Lai Jiangshan
        advised.
        - Call kthread_set_pre_cpu correctly for unbound worker.
---
 kernel/workqueue.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..b3e9289d9640 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1860,8 +1860,16 @@ static struct worker *alloc_worker(int node)
 static void worker_attach_to_pool(struct worker *worker,
 				   struct worker_pool *pool)
 {
+	const struct cpumask *cpu_mask;
+
 	mutex_lock(&wq_pool_attach_mutex);
 
+	if (cpumask_intersects(pool->attrs->cpumask, cpu_active_mask))
+		cpu_mask = pool->attrs->cpumask;
+	else
+		cpu_mask = wq_unbound_cpumask;
+
+	set_cpus_allowed_ptr(worker->task, cpu_mask);
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
 	 * stable across this function.  See the comments above the flag
@@ -1870,10 +1878,8 @@ static void worker_attach_to_pool(struct worker *worker,
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;
 	else
-		kthread_set_per_cpu(worker->task, pool->cpu);
-
-	if (worker->rescue_wq)
-		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+		kthread_set_per_cpu(worker->task,
+				cpu_mask == wq_unbound_cpumask ? -1 : pool->cpu);
 
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;
@@ -1952,8 +1958,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 		goto fail;
 
 	set_user_nice(worker->task, pool->attrs->nice);
-	kthread_bind_mask(worker->task, pool->attrs->cpumask);
 
+	worker->task->flags |= PF_NO_SETAFFINITY;
 	/* successful, attach the worker to the pool */
 	worker_attach_to_pool(worker, pool);
 
-- 
2.29.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED
  2022-07-14  3:16 [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED Schspa Shi
@ 2022-07-14  8:34 ` Lai Jiangshan
  2022-07-14  8:58   ` Zhaohui Shi
  0 siblings, 1 reply; 3+ messages in thread
From: Lai Jiangshan @ 2022-07-14  8:34 UTC (permalink / raw)
  To: Schspa Shi; +Cc: Tejun Heo, LKML, zhaohui.shi, Peter Zijlstra

On Thu, Jul 14, 2022 at 11:16 AM Schspa Shi <schspa@gmail.com> wrote:
>
> When CPU-[un]hotplugs, all workers will be bound to active CPU via
> unbind_workers().
>
> But the unbound worker still has a chance to create a new worker, which
> has bound the newly created task to pool->attrs->cpumask. But the CPU has
> been unplugged.
>
> Please refer to the following scenarios.
>
>            CPU0                                  CPU1
> ------------------------------------------------------------------
> sched_cpu_deactivate(cpu_active_mask clear)
> workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
>   -- all worker will migrate to another cpu --
>                                     worker_thread
>                                     -- will create new worker if
>                                        pool->worklist is not empty
>                                        create_worker()
>                                      -- new kworker will bound to CPU0

How will the new kworker bound to CPU0?  Could you give more details?

Since the pool is POOL_DISASSOCIATED and kthread_is_per_cpu() will
be false for the new worker. ttwu() will put it on a fallback CPU IIUC
(see select_task_rq()).

>                                (pool->attrs->cpumask will be mask of CPU0).
>       kworker/0:x will running on rq
>
> sched_cpu_dying
>   if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
>     WARN(true, "Dying CPU not properly vacated!");
>       ---------OOPS-------------
>


> The stack trace of the bad running task was dumped via the following patch:
> Link: https://lore.kernel.org/all/20220519161125.41144-1-schspa@gmail.com/
> And I think this debug patch needs to be added to the mainline,
> it can help us to debug this kind of problem
>
> To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.

use wq_unbound_cpumask.

>
> Signed-off-by: Schspa Shi <schspa@gmail.com>

Please solo CC Peter, as:

CC: Peter Zijlstra <peterz@infradead.org>

>
> --
>
> Changelog:
> v1 -> v2:
>         - Move worker task bind to worker_attach_to_pool, remove extra
>         wq_pool_attach_mutex added.
>         - Add a timing diagram to make this question clearer.
> v2 -> v3:
>         - Add missing PF_NO_SETAFFINITY, use cpumask_intersects to
>         avoid setting bad mask for unbound work pool as Lai Jiangshan
>         advised.
>         - Call kthread_set_pre_cpu correctly for unbound worker.
> ---
>  kernel/workqueue.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..b3e9289d9640 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1860,8 +1860,16 @@ static struct worker *alloc_worker(int node)
>  static void worker_attach_to_pool(struct worker *worker,
>                                    struct worker_pool *pool)
>  {
> +       const struct cpumask *cpu_mask;
> +
>         mutex_lock(&wq_pool_attach_mutex);
>
> +       if (cpumask_intersects(pool->attrs->cpumask, cpu_active_mask))
> +               cpu_mask = pool->attrs->cpumask;
> +       else
> +               cpu_mask = wq_unbound_cpumask;
> +
> +       set_cpus_allowed_ptr(worker->task, cpu_mask);
>         /*
>          * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>          * stable across this function.  See the comments above the flag
> @@ -1870,10 +1878,8 @@ static void worker_attach_to_pool(struct worker *worker,
>         if (pool->flags & POOL_DISASSOCIATED)
>                 worker->flags |= WORKER_UNBOUND;
>         else
> -               kthread_set_per_cpu(worker->task, pool->cpu);
> -
> -       if (worker->rescue_wq)
> -               set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +               kthread_set_per_cpu(worker->task,
> +                               cpu_mask == wq_unbound_cpumask ? -1 : pool->cpu);

Only workers for percpu pool need to set kthread_set_per_cpu().
So it is already handled in the above code, the branch is unneeded.

>
>         list_add_tail(&worker->node, &pool->workers);
>         worker->pool = pool;
> @@ -1952,8 +1958,8 @@ static struct worker *create_worker(struct worker_pool *pool)
>                 goto fail;
>
>         set_user_nice(worker->task, pool->attrs->nice);
> -       kthread_bind_mask(worker->task, pool->attrs->cpumask);
>
> +       worker->task->flags |= PF_NO_SETAFFINITY;
>         /* successful, attach the worker to the pool */
>         worker_attach_to_pool(worker, pool);
>
> --
> 2.29.0
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED
  2022-07-14  8:34 ` Lai Jiangshan
@ 2022-07-14  8:58   ` Zhaohui Shi
  0 siblings, 0 replies; 3+ messages in thread
From: Zhaohui Shi @ 2022-07-14  8:58 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Tejun Heo, LKML, Peter Zijlstra


Lai Jiangshan <jiangshanlai@gmail.com> writes:

> On Thu, Jul 14, 2022 at 11:16 AM Schspa Shi <schspa@gmail.com> wrote:
>>
>> When CPU-[un]hotplugs, all workers will be bound to active CPU via
>> unbind_workers().
>>
>> But the unbound worker still has a chance to create a new worker, which
>> has bound the newly created task to pool->attrs->cpumask. But the CPU has
>> been unplugged.
>>
>> Please refer to the following scenarios.
>>
>>            CPU0                                  CPU1
>> ------------------------------------------------------------------
>> sched_cpu_deactivate(cpu_active_mask clear)
>> workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
>>   -- all worker will migrate to another cpu --
>>                                     worker_thread
>>                                     -- will create new worker if
>>                                        pool->worklist is not empty
>>                                        create_worker()
>>                                      -- new kworker will bound to CPU0
>
> How will the new kworker bound to CPU0?  Could you give more details?
>

It's because we use pool->attrs->cpumask for the newly created worker
to bind it to CPU0.

Please note that the CPU0 marked here does not correspond to the
fault log below, maybe I should change this CPU0 to CPU3, then it
can be correspond to the exception log below.


> Since the pool is POOL_DISASSOCIATED and kthread_is_per_cpu() will
> be false for the new worker. ttwu() will put it on a fallback CPU IIUC
> (see select_task_rq()).
>

It won't put it on a fallback CPU, is_cpu_allowed() will use
cpu_online_mask to check if a kthread can running on this CPU, we
need cpu_active_mask in this case.

see:
static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
{
	...
	/* KTHREAD_IS_PER_CPU is always allowed. */
	if (kthread_is_per_cpu(p))
		return cpu_online(cpu);
	...
}

>>                                (pool->attrs->cpumask will be mask of CPU0).
>>       kworker/0:x will running on rq
>>
>> sched_cpu_dying
>>   if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
>>     WARN(true, "Dying CPU not properly vacated!");
>>       ---------OOPS-------------
>>
>
>
>> The stack trace of the bad running task was dumped via the following patch:
>> Link: https://lore.kernel.org/all/20220519161125.41144-1-schspa@gmail.com/
>> And I think this debug patch needs to be added to the mainline,
>> it can help us to debug this kind of problem
>>
>> To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.
>
> use wq_unbound_cpumask.
>

Yes, I forgot to change this.

>>
>> Signed-off-by: Schspa Shi <schspa@gmail.com>
>
> Please solo CC Peter, as:
>
> CC: Peter Zijlstra <peterz@infradead.org>
>

OK, thanks for reminding me.

>>
>> --
>>
>> Changelog:
>> v1 -> v2:
>>         - Move worker task bind to worker_attach_to_pool, remove extra
>>         wq_pool_attach_mutex added.
>>         - Add a timing diagram to make this question clearer.
>> v2 -> v3:
>>         - Add missing PF_NO_SETAFFINITY, use cpumask_intersects to
>>         avoid setting bad mask for unbound work pool as Lai Jiangshan
>>         advised.
>>         - Call kthread_set_pre_cpu correctly for unbound worker.
>> ---
>>  kernel/workqueue.c | 16 +++++++++++-----
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 1ea50f6be843..b3e9289d9640 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -1860,8 +1860,16 @@ static struct worker *alloc_worker(int node)
>>  static void worker_attach_to_pool(struct worker *worker,
>>                                    struct worker_pool *pool)
>>  {
>> +       const struct cpumask *cpu_mask;
>> +
>>         mutex_lock(&wq_pool_attach_mutex);
>>
>> +       if (cpumask_intersects(pool->attrs->cpumask, cpu_active_mask))
>> +               cpu_mask = pool->attrs->cpumask;
>> +       else
>> +               cpu_mask = wq_unbound_cpumask;
>> +
>> +       set_cpus_allowed_ptr(worker->task, cpu_mask);
>>         /*
>>          * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>>          * stable across this function.  See the comments above the flag
>> @@ -1870,10 +1878,8 @@ static void worker_attach_to_pool(struct worker *worker,
>>         if (pool->flags & POOL_DISASSOCIATED)
>>                 worker->flags |= WORKER_UNBOUND;
>>         else
>> -               kthread_set_per_cpu(worker->task, pool->cpu);
>> -
>> -       if (worker->rescue_wq)
>> -               set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
>> +               kthread_set_per_cpu(worker->task,
>> +                               cpu_mask == wq_unbound_cpumask ? -1 : pool->cpu);
>
> Only workers for percpu pool need to set kthread_set_per_cpu().
> So it is already handled in the above code, the branch is unneeded.
>
Change this to something like fellowing.

	if (pool->flags & POOL_DISASSOCIATED)
		worker->flags |= WORKER_UNBOUND;
	else if (cpu_mask == pool->attrs->cpumask)
    	kthread_set_per_cpu(worker->task, pool->cpu);

I add this because we will set it = -1 on unbind_workers too.
static void unbind_workers(int cpu)
{
	...
			for_each_pool_worker(worker, pool) {
				kthread_set_per_cpu(worker->task, -1);
				WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
			}
    ...
}

>>
>>         list_add_tail(&worker->node, &pool->workers);
>>         worker->pool = pool;
>> @@ -1952,8 +1958,8 @@ static struct worker *create_worker(struct worker_pool *pool)
>>                 goto fail;
>>
>>         set_user_nice(worker->task, pool->attrs->nice);
>> -       kthread_bind_mask(worker->task, pool->attrs->cpumask);
>>
>> +       worker->task->flags |= PF_NO_SETAFFINITY;
>>         /* successful, attach the worker to the pool */
>>         worker_attach_to_pool(worker, pool);
>>
>> --
>> 2.29.0
>>


-- 
BRs
Zhaohui Shi

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-07-14  9:36 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14  3:16 [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED Schspa Shi
2022-07-14  8:34 ` Lai Jiangshan
2022-07-14  8:58   ` Zhaohui Shi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.