linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] workqueue: Make create_worker() safe against spurious wakeups
Date: Thu, 23 Jun 2022 09:14:50 +0200	[thread overview]
Message-ID: <YrQS6tiSIOyK2Gi9@dhcp22.suse.cz> (raw)
In-Reply-To: <YrQPfBYBZ3wM7GjR@alley>

On Thu 23-06-22 09:00:12, Petr Mladek wrote:
> On Wed 2022-06-22 16:08:53, Petr Mladek wrote:
> > A system crashed with the following BUG() report:
> > 
> >   [115147.050484] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >   [115147.050488] #PF: supervisor write access in kernel mode
> >   [115147.050489] #PF: error_code(0x0002) - not-present page
> >   [115147.050490] PGD 0 P4D 0
> >   [115147.050494] Oops: 0002 [#1] PREEMPT_RT SMP NOPTI
> >   [115147.050498] CPU: 1 PID: 16213 Comm: kthreadd Kdump: loaded Tainted: G           O   X    5.3.18-2-rt #1 SLE15-SP2 (unreleased)
> >   [115147.050510] RIP: 0010:_raw_spin_lock_irq+0x14/0x30
> >   [115147.050513] Code: 89 c6 e8 5f 7a 9b ff 66 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 fa 65 ff 05 fb 53 6c 55 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 2e 7a 9b ff 66 90 c3 90 90 90 90 90
> >   [115147.050514] RSP: 0018:ffffb0f68822fed8 EFLAGS: 00010046
> >   [115147.050515] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> >   [115147.050516] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000000
> >   [115147.050517] RBP: ffff9ca73af40a40 R08: 0000000000000001 R09: 0000000000027340
> >   [115147.050519] R10: ffffb0f68822fe70 R11: 00000000000000a9 R12: ffffb0f688067dc0
> >   [115147.050520] R13: ffff9ca77e9a8000 R14: ffff9ca7634ca780 R15: ffff9ca7634ca780
> >   [115147.050521] FS:  0000000000000000(0000) GS:ffff9ca77fb00000(0000) knlGS:0000000000000000
> >   [115147.050523] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   [115147.050524] CR2: 00000000000000b8 CR3: 000000004472e000 CR4: 00000000003406e0
> >   [115147.050524] Call Trace:
> >   [115147.050533]  worker_thread+0xb4/0x3c0
> >   [115147.050538]  ? process_one_work+0x4a0/0x4a0
> >   [115147.050540]  kthread+0x152/0x170
> >   [115147.050542]  ? kthread_park+0xa0/0xa0
> >   [115147.050544]  ret_from_fork+0x35/0x40
> > 
> > Further debugging shown that the worker thread was woken
> > before worker_attach_to_pool() finished in create_worker().
> > 
> > Any kthread is supposed to stay in TASK_UNINTERRUPTIBLE sleep
> > until it is explicitly woken. But a spurious wakeup might
> > break this expectation.
> > 
> > As a result, worker_thread() might read worker->pool before
> > it was set in worker create_worker() by worker_attach_to_pool().
> > Also manage_workers() might want to create yet another worker
> > before worker->pool->nr_workers is updated. It is a kind off
> > a chicken & egg problem.
> > 
> > Synchronize these operations using a completion API.
> > 
> > Note that worker->pool might be then read without wq_pool_attach_mutex.
> > Normal worker always belongs to the same pool.
> > 
> > Also note that rescuer_thread() does not need this because all
> > needed values are set before the kthreads is created. It is
> > connected with a particular workqueue. It is attached to different
> > pools when needed.
> > 
> > Signed-off-by: Petr Mladek <pmladek@suse.com>
> > ---
> >  kernel/workqueue.c          | 13 ++++++++++++-
> >  kernel/workqueue_internal.h |  1 +
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 1ea50f6be843..9586b0797145 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -2378,10 +2380,19 @@ static void set_pf_worker(bool val)
> >  static int worker_thread(void *__worker)
> >  {
> >  	struct worker *worker = __worker;
> > -	struct worker_pool *pool = worker->pool;
> > +	struct worker_pool *pool;
> >  
> >  	/* tell the scheduler that this is a workqueue worker */
> >  	set_pf_worker(true);
> > +
> > +	/*
> > +	 * Wait until the worker is ready to get started. It must be attached
> > +	 * to a pool first. This is needed because of spurious wakeups.
> > +	 */
> > +	while (wait_for_completion_interruptible(&worker->ready_to_start))
> 
> I have realized that this is wrong. I used _interruptible() variant
> because we do not know how long it would need to sleep. And long
> sleeping in TASK_UNINTERRUPTIBLE might trigger hung task warning.
> 
> But kthread_bind_mask() requires that the kthread will be in
> TASK_UNINTERRUPTIBLE state.
> 
> Note that even the freshly created kthread is sleeping in
> TASK_UNTERRUPTIBLE, see kthread() in kernel/kthread.c. But
> it does not trigger the hung task warning because there is
> a special check for this case, see check_hung_task():
> 
> 	/*
> 	 * When a freshly created task is scheduled once, changes its state to
> 	 * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
> 	 * musn't be checked.
> 	 */
> 	if (unlikely(!switch_count))
> 		return;
> 
> The following seems to work well:
> 
> 	while (!wait_for_completion_timeout(&worker->ready_to_start, 5 * HZ))
> 		;

This is really ugly and I do not really see why this is needed. Should
there be such an overloaded system that create_worker cannot finish the
initialization before khungtask triggers then we should be notified. I
do not see how papering this over would be helpful.

That being said, I really thing that this should be plain
wait_for_completion()

With that changed, feel free to add
Reviewed-by: Michal Hocko <mhocko@suse.com>

and thanks helping to debug this and prepare the fix. I was not really
aware of the spurious wake up scenario. Btw. it would be great to
describe that in a more detail as well for future reference.

Thanks!
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2022-06-23  7:14 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 14:08 [PATCH] workqueue: Make create_worker() safe against spurious wakeups Petr Mladek
2022-06-23  7:00 ` Petr Mladek
2022-06-23  7:14   ` Michal Hocko [this message]
2022-06-25  5:00 ` re. Spurious wakeup on a newly created kthread Tejun Heo
2022-06-25 17:01   ` Linus Torvalds
2022-06-25 17:36     ` Eric W. Biederman
2022-06-25 18:25       ` Linus Torvalds
2022-06-25 18:43         ` Linus Torvalds
2022-06-25 23:28           ` Eric W. Biederman
2022-06-25 23:41             ` Eric W. Biederman
2022-06-25 23:43             ` Linus Torvalds
2022-06-25 23:48               ` Linus Torvalds
2022-06-26  0:19                 ` Eric W. Biederman
2022-06-27  0:01                   ` Wedson Almeida Filho
2022-06-27  7:11                     ` Peter Zijlstra
2022-06-27 18:23                       ` Wedson Almeida Filho
2022-06-27 18:45                         ` Linus Torvalds
2022-06-26 19:14                 ` [PATCH 0/3] kthread: Stop using TASK_UNINTERRUPTIBLE Eric W. Biederman
2022-06-26 19:15                   ` [PATCH 1/3] kthread: Remove the flags argument from kernel_thread Eric W. Biederman
2022-06-26 21:20                     ` Linus Torvalds
2022-06-26 19:16                   ` [PATCH 2/3] kthread: Replace kernel_thread with new_kthread Eric W. Biederman
2022-06-26 19:16                   ` [PATCH 3/3] kthread: Stop abusing TASK_UNINTERRUPTIBLE (INCOMPLETE) Eric W. Biederman
2022-06-26 19:59                     ` Linus Torvalds
2022-06-26 20:23                       ` Tejun Heo
2022-06-26 20:55                         ` Linus Torvalds
2022-06-27  7:22                         ` Peter Zijlstra
2022-06-27  8:11                           ` Tejun Heo
2022-06-27 18:04                             ` Wedson Almeida Filho
2022-06-27 22:06                               ` Peter Zijlstra
2022-06-27 22:34                                 ` Linus Torvalds
2022-06-27 22:45                                 ` Wedson Almeida Filho
2022-06-28  0:32                                 ` Wedson Almeida Filho
2022-06-28  7:58                                   ` Peter Zijlstra
2022-06-30  0:57                                     ` Wedson Almeida Filho
2022-06-26 22:14                     ` kernel test robot
2022-06-26 22:34                     ` kernel test robot
2022-06-26  0:21               ` re. Spurious wakeup on a newly created kthread Eric W. Biederman
2022-06-28 14:16           ` Christian Brauner
2022-06-26  0:26         ` Eric W. Biederman
2022-06-26  1:58     ` Tejun Heo
2022-06-26  2:53       ` Linus Torvalds
2022-06-26  6:09         ` Tejun Heo
2022-06-27 12:04         ` Michal Hocko
2022-06-28  9:51     ` Petr Mladek
2022-06-28 10:07       ` Tejun Heo
2022-06-27  8:07   ` Michal Hocko
2022-06-27  8:21     ` Tejun Heo
2022-06-27 10:18       ` Michal Hocko
2022-06-28 15:08     ` Petr Mladek
2022-08-04  8:57 ` [PATCH] workqueue: Make create_worker() safe against spurious wakeups Lai Jiangshan
2022-08-04 10:19   ` Lai Jiangshan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YrQS6tiSIOyK2Gi9@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).