linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Petr Mladek <pmladek@suse.com>,
	Martin Liu <liumartin@google.com>,
	jenhaochen@google.com, Minchan Kim <minchan@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Oleg Nesterov <oleg@redhat.com>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.14 24/25] kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Date: Fri,  9 Jul 2021 15:18:55 +0200	[thread overview]
Message-ID: <20210709131642.430593730@linuxfoundation.org> (raw)
In-Reply-To: <20210709131627.928131764@linuxfoundation.org>

From: Petr Mladek <pmladek@suse.com>

commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.

The system might hang with the following backtrace:

	schedule+0x80/0x100
	schedule_timeout+0x48/0x138
	wait_for_common+0xa4/0x134
	wait_for_completion+0x1c/0x2c
	kthread_flush_work+0x114/0x1cc
	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
	kthread_cancel_delayed_work_sync+0x18/0x2c
	xxxx_pm_notify+0xb0/0xd8
	blocking_notifier_call_chain_robust+0x80/0x194
	pm_notifier_call_chain_robust+0x28/0x4c
	suspend_prepare+0x40/0x260
	enter_state+0x80/0x3f4
	pm_suspend+0x60/0xdc
	state_store+0x108/0x144
	kobj_attr_store+0x38/0x88
	sysfs_kf_write+0x64/0xc0
	kernfs_fop_write_iter+0x108/0x1d0
	vfs_write+0x2f4/0x368
	ksys_write+0x7c/0xec

It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():

CPU0				CPU1

Context: Thread A		Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
     spin_unlock()
     del_timer_sync()
				kthread_cancel_delayed_work_sync()
				  spin_lock()
				  __kthread_cancel_work()
				    spin_unlock()
				    del_timer_sync()
				    spin_lock()

				  work->canceling++
				  spin_unlock
     spin_lock()
   queue_delayed_work()
     // dwork is put into the worker->delayed_work_list

   spin_unlock()

				  kthread_flush_work()
     // flush_work is put at the tail of the dwork

				    wait_for_completion()

Context: IRQ

  kthread_delayed_work_timer_fn()
    spin_lock()
    list_del_init(&work->node);
    spin_unlock()

BANG: flush_work is not longer linked and will never get proceed.

The problem is that kthread_mod_delayed_work() checks work->canceling
flag before canceling the timer.

A simple solution is to (re)check work->canceling after
__kthread_cancel_work().  But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.

The return value might be used for reference counting.  The caller has
to know whether a new work has been queued or an existing one was
replaced.

The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set.  The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.

Note that kthread_mod_delayed_work() could remove the timer and then
bail out.  It is fine.  The other canceling caller needs to cancel the
timer as well.  The important thing is that the queue (list)
manipulation is done atomically under worker->lock.

Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes: 9a6b06c8d9a220860468a ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reported-by: Martin Liu <liumartin@google.com>
Cc: <jenhaochen@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/kthread.c |   35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1006,8 +1006,11 @@ static void kthread_cancel_delayed_work_
 }
 
 /*
- * This function removes the work from the worker queue. Also it makes sure
- * that it won't get queued later via the delayed work's timer.
+ * This function removes the work from the worker queue.
+ *
+ * It is called under worker->lock. The caller must make sure that
+ * the timer used by delayed work is not running, e.g. by calling
+ * kthread_cancel_delayed_work_timer().
  *
  * The work might still be in use when this function finishes. See the
  * current_work proceed by the worker.
@@ -1015,13 +1018,8 @@ static void kthread_cancel_delayed_work_
  * Return: %true if @work was pending and successfully canceled,
  *	%false if @work was not pending
  */
-static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
-				  unsigned long *flags)
+static bool __kthread_cancel_work(struct kthread_work *work)
 {
-	/* Try to cancel the timer if exists. */
-	if (is_dwork)
-		kthread_cancel_delayed_work_timer(work, flags);
-
 	/*
 	 * Try to remove the work from a worker list. It might either
 	 * be from worker->work_list or from worker->delayed_work_list.
@@ -1074,11 +1072,23 @@ bool kthread_mod_delayed_work(struct kth
 	/* Work must not be used with >1 worker, see kthread_queue_work() */
 	WARN_ON_ONCE(work->worker != worker);
 
-	/* Do not fight with another command that is canceling this work. */
+	/*
+	 * Temporary cancel the work but do not fight with another command
+	 * that is canceling the work as well.
+	 *
+	 * It is a bit tricky because of possible races with another
+	 * mod_delayed_work() and cancel_delayed_work() callers.
+	 *
+	 * The timer must be canceled first because worker->lock is released
+	 * when doing so. But the work can be removed from the queue (list)
+	 * only when it can be queued again so that the return value can
+	 * be used for reference counting.
+	 */
+	kthread_cancel_delayed_work_timer(work, &flags);
 	if (work->canceling)
 		goto out;
+	ret = __kthread_cancel_work(work);
 
-	ret = __kthread_cancel_work(work, true, &flags);
 fast_queue:
 	__kthread_queue_delayed_work(worker, dwork, delay);
 out:
@@ -1100,7 +1110,10 @@ static bool __kthread_cancel_work_sync(s
 	/* Work must not be used with >1 worker, see kthread_queue_work(). */
 	WARN_ON_ONCE(work->worker != worker);
 
-	ret = __kthread_cancel_work(work, is_dwork, &flags);
+	if (is_dwork)
+		kthread_cancel_delayed_work_timer(work, &flags);
+
+	ret = __kthread_cancel_work(work);
 
 	if (worker->current_work != work)
 		goto out_fast;



  parent reply	other threads:[~2021-07-09 13:20 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-09 13:18 [PATCH 4.14 00/25] 4.14.239-rc1 review Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 01/25] include/linux/mmdebug.h: make VM_WARN* non-rvals Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 02/25] mm: add VM_WARN_ON_ONCE_PAGE() macro Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 03/25] mm/rmap: remove unneeded semicolon in page_not_mapped() Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 04/25] mm/rmap: use page_not_mapped in try_to_unmap() Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 05/25] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 06/25] mm/thp: fix vma_address() if virtual address below file offset Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 07/25] mm/thp: fix page_address_in_vma() on file THP tails Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 08/25] mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 09/25] mm: page_vma_mapped_walk(): use page for pvmw->page Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 10/25] mm: page_vma_mapped_walk(): settle PageHuge on entry Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 11/25] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 12/25] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 13/25] mm: page_vma_mapped_walk(): crossing page table boundary Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 14/25] mm: page_vma_mapped_walk(): add a level of indentation Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 15/25] mm: page_vma_mapped_walk(): use goto instead of while (1) Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 16/25] mm: page_vma_mapped_walk(): get vma_address_end() earlier Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 17/25] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 18/25] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 19/25] mm, futex: fix shared futex pgoff on shmem huge page Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 20/25] scsi: sr: Return appropriate error code when disk is ejected Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 21/25] drm/nouveau: fix dma_address check for CPU/GPU sync Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 22/25] kfifo: DECLARE_KIFO_PTR(fifo, u64) does not work on arm 32 bit Greg Kroah-Hartman
2021-07-09 13:18 ` [PATCH 4.14 23/25] kthread_worker: split code for canceling the delayed work timer Greg Kroah-Hartman
2021-07-09 13:18 ` Greg Kroah-Hartman [this message]
2021-07-09 13:18 ` [PATCH 4.14 25/25] xen/events: reset active flag for lateeoi events later Greg Kroah-Hartman
2021-07-10 14:05 ` [PATCH 4.14 00/25] 4.14.239-rc1 review Naresh Kamboju
2021-07-10 19:51 ` Guenter Roeck
2021-07-12  0:57 ` Samuel Zou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210709131642.430593730@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=jenhaochen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liumartin@google.com \
    --cc=minchan@google.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=oleg@redhat.com \
    --cc=pmladek@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).