All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Liu <liumartin@google.com>
To: Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Petr Mladek <pmladek@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>
Cc: minchan@google.com, www@google.com, davidchao@google.com,
	jenhaochen@google.com, linux-kernel@vger.kernel.org,
	clang-built-linux@googlegroups.com
Subject: Re: [PATCH] kthread: Fix kthread_mod_delayed_work vs kthread_cancel_delayed_work_sync race
Date: Thu, 20 May 2021 00:11:47 +0800	[thread overview]
Message-ID: <YKU4w9pDWn3lj1V+@google.com> (raw)
In-Reply-To: <20210513065458.941403-1-liumartin@google.com>

Hi Folks,

Could I get some help for reviewing this patch? Thank you.

On Thu, May 13, 2021 at 02:54:57PM +0800, Martin Liu wrote:
> We encountered a system hang issue while doing the tests. The callstack
> is as following
> 
> 	schedule+0x80/0x100
> 	schedule_timeout+0x48/0x138
> 	wait_for_common+0xa4/0x134
> 	wait_for_completion+0x1c/0x2c
> 	kthread_flush_work+0x114/0x1cc
> 	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
> 	kthread_cancel_delayed_work_sync+0x18/0x2c
> 	xxxx_pm_notify+0xb0/0xd8
> 	blocking_notifier_call_chain_robust+0x80/0x194
> 	pm_notifier_call_chain_robust+0x28/0x4c
> 	suspend_prepare+0x40/0x260
> 	enter_state+0x80/0x3f4
> 	pm_suspend+0x60/0xdc
> 	state_store+0x108/0x144
> 	kobj_attr_store+0x38/0x88
> 	sysfs_kf_write+0x64/0xc0
> 	kernfs_fop_write_iter+0x108/0x1d0
> 	vfs_write+0x2f4/0x368
> 	ksys_write+0x7c/0xec
> 
> When we started investigating, we found race between
> kthread_mod_delayed_work vs kthread_cancel_delayed_work_sync. The race's
> result could be simply reproduced as a kthread_mod_delayed_work with
> a following kthread_flush_work call.
> 
> Thing is we release kthread_mod_delayed_work kspin_lock in
> __kthread_cancel_work so it opens a race window for
> kthread_cancel_delayed_work_sync to change the canceling count used to
> prevent dwork from being requeued before calling kthread_flush_work.
> However, we don't check the canceling count after returning from
> __kthread_cancel_work and then insert the dwork to the worker. It
> results the following kthread_flush_work inserts flush work to dwork's
> tail which is at worker's dealyed_work_list. Therefore, flush work will
> never get moved to the worker's work_list to be executed. Finally,
> kthread_cancel_delayed_work_sync will NOT be able to get completed and
> wait forever. The code sequence diagram is as following
> 
> Thread A                Thread B
> kthread_mod_delayed_work
>   spin_lock
>    __kthread_cancel_work
>     canceling = 1
>     spin_unlock
>                         kthread_cancel_delayed_work_sync
>                           spin_lock
>                             kthread_cancel_work
>                           canceling = 2
>                           spin_unlock
>     del_timer_sync
>     spin_lock
>     canceling = 1 // canceling count gets update in ThreadB before
>   queue_delayed_work // dwork is put into the woker’s dealyed_work_list
>                         without checking the canceling count
>  spin_unlock
>                           kthread_flush_work
>                             spin_lock
>                             Insert flush work // at the tail of the
> 			                         dwork which is at
> 						 the worker’s
> 						 dealyed_work_list
>                             spin_unlock
>                             wait_for_completion // Thread B stuck here as
> 			                           flush work will never
> 						   get executed
> 
> The canceling count could change in __kthread_cancel_work as the spinlock
> get released and regained in between, let's check the count again before
> we queue the delayed work to avoid the race.
> 
> Fixes: 37be45d49dec2 ("kthread: allow to cancel kthread work")
> Tested-by: David Chao <davidchao@google.com>
> Signed-off-by: Martin Liu <liumartin@google.com>
> ---
>  kernel/kthread.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index fe3f2a40d61e..064eae335c1f 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1181,6 +1181,19 @@ bool kthread_mod_delayed_work(struct kthread_worker *worker,
>  		goto out;
>  
>  	ret = __kthread_cancel_work(work, true, &flags);
> +
> +	/*
> +	 * Canceling could run in parallel from kthread_cancel_delayed_work_sync
> +	 * and change work's canceling count as the spinlock is released and regain
> +	 * in __kthread_cancel_work so we need to check the count again. Otherwise,
> +	 * we might incorrectly queue the dwork and further cause
> +	 * cancel_delayed_work_sync thread waiting for flush dwork endlessly.
> +	 */
> +	if (work->canceling) {
> +		ret = false;
> +		goto out;
> +	}
> +
>  fast_queue:
>  	__kthread_queue_delayed_work(worker, dwork, delay);
>  out:
> -- 
> 2.31.1.607.g51e8a6a459-goog
> 

  reply	other threads:[~2021-05-19 16:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-13  6:54 [PATCH] kthread: Fix kthread_mod_delayed_work vs kthread_cancel_delayed_work_sync race Martin Liu
2021-05-19 16:11 ` Martin Liu [this message]
2021-05-20 11:26 ` Petr Mladek
2021-05-20 11:44 ` Petr Mladek
2021-05-20 21:48 ` Andrew Morton
2021-05-21  7:12   ` Petr Mladek
2021-05-21  8:50     ` Martin Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YKU4w9pDWn3lj1V+@google.com \
    --to=liumartin@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=clang-built-linux@googlegroups.com \
    --cc=davidchao@google.com \
    --cc=jenhaochen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minchan@google.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=pmladek@suse.com \
    --cc=tj@kernel.org \
    --cc=www@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.