All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lai Jiangshan <jiangshanlai@gmail.com>
To: Schspa Shi <schspa@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	zhaohui.shi@horizon.ai, Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED
Date: Thu, 14 Jul 2022 16:34:22 +0800	[thread overview]
Message-ID: <CAJhGHyD=7t+-Env=Wim-3atq=qJg1j5EKiTvsbqhX1xCdi27Wg@mail.gmail.com> (raw)
In-Reply-To: <20220714031645.28004-1-schspa@gmail.com>

On Thu, Jul 14, 2022 at 11:16 AM Schspa Shi <schspa@gmail.com> wrote:
>
> When CPU-[un]hotplugs, all workers will be bound to active CPU via
> unbind_workers().
>
> But the unbound worker still has a chance to create a new worker, which
> has bound the newly created task to pool->attrs->cpumask. But the CPU has
> been unplugged.
>
> Please refer to the following scenarios.
>
>            CPU0                                  CPU1
> ------------------------------------------------------------------
> sched_cpu_deactivate(cpu_active_mask clear)
> workqueue_offline_cpu(work pool POOL_DISASSOCIATED)
>   -- all worker will migrate to another cpu --
>                                     worker_thread
>                                     -- will create new worker if
>                                        pool->worklist is not empty
>                                        create_worker()
>                                      -- new kworker will bound to CPU0

How will the new kworker bound to CPU0?  Could you give more details?

Since the pool is POOL_DISASSOCIATED and kthread_is_per_cpu() will
be false for the new worker. ttwu() will put it on a fallback CPU IIUC
(see select_task_rq()).

>                                (pool->attrs->cpumask will be mask of CPU0).
>       kworker/0:x will running on rq
>
> sched_cpu_dying
>   if (rq->nr_running != 1 || rq_has_pinned_tasks(rq))
>     WARN(true, "Dying CPU not properly vacated!");
>       ---------OOPS-------------
>


> The stack trace of the bad running task was dumped via the following patch:
> Link: https://lore.kernel.org/all/20220519161125.41144-1-schspa@gmail.com/
> And I think this debug patch needs to be added to the mainline,
> it can help us to debug this kind of problem
>
> To fix it, we can use cpu_active_mask when work pool is DISASSOCIATED.

use wq_unbound_cpumask.

>
> Signed-off-by: Schspa Shi <schspa@gmail.com>

Please solo CC Peter, as:

CC: Peter Zijlstra <peterz@infradead.org>

>
> --
>
> Changelog:
> v1 -> v2:
>         - Move worker task bind to worker_attach_to_pool, remove extra
>         wq_pool_attach_mutex added.
>         - Add a timing diagram to make this question clearer.
> v2 -> v3:
>         - Add missing PF_NO_SETAFFINITY, use cpumask_intersects to
>         avoid setting bad mask for unbound work pool as Lai Jiangshan
>         advised.
>         - Call kthread_set_pre_cpu correctly for unbound worker.
> ---
>  kernel/workqueue.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..b3e9289d9640 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1860,8 +1860,16 @@ static struct worker *alloc_worker(int node)
>  static void worker_attach_to_pool(struct worker *worker,
>                                    struct worker_pool *pool)
>  {
> +       const struct cpumask *cpu_mask;
> +
>         mutex_lock(&wq_pool_attach_mutex);
>
> +       if (cpumask_intersects(pool->attrs->cpumask, cpu_active_mask))
> +               cpu_mask = pool->attrs->cpumask;
> +       else
> +               cpu_mask = wq_unbound_cpumask;
> +
> +       set_cpus_allowed_ptr(worker->task, cpu_mask);
>         /*
>          * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>          * stable across this function.  See the comments above the flag
> @@ -1870,10 +1878,8 @@ static void worker_attach_to_pool(struct worker *worker,
>         if (pool->flags & POOL_DISASSOCIATED)
>                 worker->flags |= WORKER_UNBOUND;
>         else
> -               kthread_set_per_cpu(worker->task, pool->cpu);
> -
> -       if (worker->rescue_wq)
> -               set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +               kthread_set_per_cpu(worker->task,
> +                               cpu_mask == wq_unbound_cpumask ? -1 : pool->cpu);

Only workers for percpu pool need to set kthread_set_per_cpu().
So it is already handled in the above code, the branch is unneeded.

>
>         list_add_tail(&worker->node, &pool->workers);
>         worker->pool = pool;
> @@ -1952,8 +1958,8 @@ static struct worker *create_worker(struct worker_pool *pool)
>                 goto fail;
>
>         set_user_nice(worker->task, pool->attrs->nice);
> -       kthread_bind_mask(worker->task, pool->attrs->cpumask);
>
> +       worker->task->flags |= PF_NO_SETAFFINITY;
>         /* successful, attach the worker to the pool */
>         worker_attach_to_pool(worker, pool);
>
> --
> 2.29.0
>

  reply	other threads:[~2022-07-14  8:34 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-14  3:16 [PATCH v3] workqueue: Use active mask for new worker when pool is DISASSOCIATED Schspa Shi
2022-07-14  8:34 ` Lai Jiangshan [this message]
2022-07-14  8:58   ` Zhaohui Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJhGHyD=7t+-Env=Wim-3atq=qJg1j5EKiTvsbqhX1xCdi27Wg@mail.gmail.com' \
    --to=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=schspa@gmail.com \
    --cc=tj@kernel.org \
    --cc=zhaohui.shi@horizon.ai \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.