* + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree
@ 2021-05-20 21:47 akpm
  2021-05-21 16:35 ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: akpm @ 2021-05-20 21:47 UTC (permalink / raw)
  To: bp, davidchao, jenhaochen, jkosina, josh, liumartin, mhocko,
	mingo, mm-commits, nathan, ndesaulniers, oleg, paulmck, peterz,
	pmladek, rostedt, stable, tglx, tj, vbabka


The patch titled
     Subject: kthread: fix kthread_mod_delayed_work vs kthread_cancel_delayed_work_sync race
has been added to the -mm tree.  Its filename is
     kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Martin Liu <liumartin@google.com>
Subject: kthread: fix kthread_mod_delayed_work vs kthread_cancel_delayed_work_sync race

We encountered a system hang issue during testing.  The call stack is
as follows:

	schedule+0x80/0x100
	schedule_timeout+0x48/0x138
	wait_for_common+0xa4/0x134
	wait_for_completion+0x1c/0x2c
	kthread_flush_work+0x114/0x1cc
	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
	kthread_cancel_delayed_work_sync+0x18/0x2c
	xxxx_pm_notify+0xb0/0xd8
	blocking_notifier_call_chain_robust+0x80/0x194
	pm_notifier_call_chain_robust+0x28/0x4c
	suspend_prepare+0x40/0x260
	enter_state+0x80/0x3f4
	pm_suspend+0x60/0xdc
	state_store+0x108/0x144
	kobj_attr_store+0x38/0x88
	sysfs_kf_write+0x64/0xc0
	kernfs_fop_write_iter+0x108/0x1d0
	vfs_write+0x2f4/0x368
	ksys_write+0x7c/0xec

Investigation revealed a race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync().  The result of the race can be
reproduced simply with a kthread_mod_delayed_work() followed by a
kthread_flush_work() call.

The problem is that __kthread_cancel_work() releases the worker's
spinlock, which opens a race window in which
kthread_cancel_delayed_work_sync() can change the canceling count that
is used to prevent the dwork from being requeued before
kthread_flush_work() is called.  However, kthread_mod_delayed_work()
does not check the canceling count after __kthread_cancel_work()
returns, and inserts the dwork into the worker anyway.  As a result,
the subsequent kthread_flush_work() inserts the flush work at the tail
of the dwork, which is still on the worker's delayed_work_list.  The
flush work therefore never gets moved to the worker's work_list to be
executed, kthread_cancel_delayed_work_sync() never completes, and the
caller waits forever.  The code sequence diagram is as follows:

Thread A                Thread B
kthread_mod_delayed_work
  spin_lock
  __kthread_cancel_work
    canceling = 1
    spin_unlock
                        kthread_cancel_delayed_work_sync
                          spin_lock
                          kthread_cancel_work
                            canceling = 2
                          spin_unlock
    del_timer_sync
    spin_lock
    canceling = 1       // canceling count was updated by Thread B meanwhile
  queue_delayed_work    // dwork is put on the worker's delayed_work_list
                        // without checking the canceling count
  spin_unlock
                          kthread_flush_work
                            spin_lock
                            insert flush work   // at the tail of the dwork,
                                                // which is on the worker's
                                                // delayed_work_list
                            spin_unlock
                            wait_for_completion // Thread B is stuck here:
                                                // the flush work will never
                                                // be executed

The canceling count can change in __kthread_cancel_work() because the
spinlock is released and reacquired there.  Check the count again
before queuing the delayed work to avoid the race.
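
The shape of both the bug and the fix can be sketched in ordinary
user-space C.  Everything below is an invented, simplified model
(fake_work, fake_cancel_work, etc. are not the real kernel API): the
in_window callback stands in for kthread_cancel_delayed_work_sync()
running while the spinlock is dropped around del_timer_sync().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the race and the fix; all names are invented stand-ins,
 * not the real kernel API. */
struct fake_work {
	int canceling;		/* how many cancel operations are in flight */
	bool queued;		/* is the work queued on the worker? */
};

/* Models __kthread_cancel_work(): the lock is dropped around
 * del_timer_sync(), and in_window() simulates whatever another thread
 * manages to do in that window (e.g. bump ->canceling). */
static bool fake_cancel_work(struct fake_work *work,
			     void (*in_window)(struct fake_work *))
{
	bool was_queued = work->queued;

	work->canceling++;		/* under the lock */
	if (in_window)			/* lock released here ... */
		in_window(work);
	work->canceling--;		/* ... and reacquired here */
	work->queued = false;
	return was_queued;
}

/* Models the fixed kthread_mod_delayed_work(): re-check ->canceling
 * after the cancel before queuing again. */
static bool fake_mod_delayed_work(struct fake_work *work,
				  void (*in_window)(struct fake_work *))
{
	bool ret = fake_cancel_work(work, in_window);

	if (work->canceling)		/* the check added by this patch */
		return false;
	work->queued = true;		/* __kthread_queue_delayed_work() */
	return ret;
}

/* Simulates a concurrent kthread_cancel_delayed_work_sync() that is
 * still in flight when the lock is regained. */
static void concurrent_cancel(struct fake_work *work)
{
	work->canceling++;
}
```

Without the re-check, fake_mod_delayed_work() would mark the work
queued again and the model would reproduce the stuck-flush scenario
from the diagram above.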

Link: https://lkml.kernel.org/r/20210513065458.941403-1-liumartin@google.com
Fixes: 37be45d49dec2 ("kthread: allow to cancel kthread work")
Signed-off-by: Martin Liu <liumartin@google.com>
Tested-by: David Chao <davidchao@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <jenhaochen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/kthread.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/kernel/kthread.c~kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race
+++ a/kernel/kthread.c
@@ -1181,6 +1181,19 @@ bool kthread_mod_delayed_work(struct kth
 		goto out;
 
 	ret = __kthread_cancel_work(work, true, &flags);
+
+	/*
+	 * Canceling can run in parallel from kthread_cancel_delayed_work_sync()
+	 * and change work->canceling while the spinlock is released and
+	 * reacquired in __kthread_cancel_work(), so check the count again.
+	 * Otherwise we might incorrectly queue the dwork and leave the
+	 * cancel_delayed_work_sync thread waiting endlessly for the flush work.
+	 */
+	if (work->canceling) {
+		ret = false;
+		goto out;
+	}
+
 fast_queue:
 	__kthread_queue_delayed_work(worker, dwork, delay);
 out:
_

Patches currently in -mm which might be from liumartin@google.com are

kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree
  2021-05-20 21:47 + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree akpm
@ 2021-05-21 16:35 ` Oleg Nesterov
  2021-05-24 15:08   ` Petr Mladek
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2021-05-21 16:35 UTC (permalink / raw)
  To: akpm
  Cc: bp, davidchao, jenhaochen, jkosina, josh, liumartin, mhocko,
	mingo, mm-commits, nathan, ndesaulniers, paulmck, peterz,
	pmladek, rostedt, stable, tglx, tj, vbabka, linux-kernel

On 05/20, Andrew Morton wrote:
>
> --- a/kernel/kthread.c~kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race
> +++ a/kernel/kthread.c
> @@ -1181,6 +1181,19 @@ bool kthread_mod_delayed_work(struct kth
>  		goto out;
>
>  	ret = __kthread_cancel_work(work, true, &flags);
> +
> +	/*
> +	 * Canceling can run in parallel from kthread_cancel_delayed_work_sync()
> +	 * and change work->canceling while the spinlock is released and
> +	 * reacquired in __kthread_cancel_work(), so check the count again.
> +	 * Otherwise we might incorrectly queue the dwork and leave the
> +	 * cancel_delayed_work_sync thread waiting endlessly for the flush work.
> +	 */
> +	if (work->canceling) {
> +		ret = false;
> +		goto out;
> +	}
> +
>  fast_queue:
>  	__kthread_queue_delayed_work(worker, dwork, delay);

Never looked at this code before, can't review...

but note that another caller of __kthread_queue_delayed_work() needs to
check work->canceling too. So perhaps we should simply add queuing_blocked()
to __kthread_queue_delayed_work()?

Something like below, uncompiled/untested, most probably incorrect.

Either way, this comment

	 * Return: %true if @dwork was pending and its timer was modified,
	 * %false otherwise.

above kthread_mod_delayed_work looks obviously wrong. Currently it returns
true if this work was pending. With your patch it returns true if it was
pending and not canceling.

With the patch below it returns true if the work was (re)queued successfully,
and this makes more sense to me. But again, I can easily misread this code.

In any case, even if my patch is correct, I won't insist, your fix is
much simpler.

Oleg.

--- x/kernel/kthread.c
+++ x/kernel/kthread.c
@@ -977,7 +977,7 @@ void kthread_delayed_work_timer_fn(struc
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
-static void __kthread_queue_delayed_work(struct kthread_worker *worker,
+static bool __kthread_queue_delayed_work(struct kthread_worker *worker,
 					 struct kthread_delayed_work *dwork,
 					 unsigned long delay)
 {
@@ -987,6 +987,9 @@ static void __kthread_queue_delayed_work
 	WARN_ON_FUNCTION_MISMATCH(timer->function,
 				  kthread_delayed_work_timer_fn);
 
+	if (queuing_blocked(worker, work))
+		return false;
+
 	/*
 	 * If @delay is 0, queue @dwork->work immediately.  This is for
 	 * both optimization and correctness.  The earliest @timer can
@@ -995,7 +998,7 @@ static void __kthread_queue_delayed_work
 	 */
 	if (!delay) {
 		kthread_insert_work(worker, work, &worker->work_list);
-		return;
+		return true;
 	}
 
 	/* Be paranoid and try to detect possible races already now. */
@@ -1005,6 +1008,7 @@ static void __kthread_queue_delayed_work
 	work->worker = worker;
 	timer->expires = jiffies + delay;
 	add_timer(timer);
+	return true;
 }
 
 /**
@@ -1028,16 +1032,12 @@ bool kthread_queue_delayed_work(struct k
 {
 	struct kthread_work *work = &dwork->work;
 	unsigned long flags;
-	bool ret = false;
+	bool ret;
 
 	raw_spin_lock_irqsave(&worker->lock, flags);
-
-	if (!queuing_blocked(worker, work)) {
-		__kthread_queue_delayed_work(worker, dwork, delay);
-		ret = true;
-	}
-
+	ret = __kthread_queue_delayed_work(worker, dwork, delay);
 	raw_spin_unlock_irqrestore(&worker->lock, flags);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_delayed_work);
@@ -1180,9 +1180,9 @@ bool kthread_mod_delayed_work(struct kth
 	if (work->canceling)
 		goto out;
 
-	ret = __kthread_cancel_work(work, true, &flags);
+	__kthread_cancel_work(work, true, &flags);
 fast_queue:
-	__kthread_queue_delayed_work(worker, dwork, delay);
+	ret = __kthread_queue_delayed_work(worker, dwork, delay);
 out:
 	raw_spin_unlock_irqrestore(&worker->lock, flags);
 	return ret;



* Re: + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree
  2021-05-21 16:35 ` Oleg Nesterov
@ 2021-05-24 15:08   ` Petr Mladek
  2021-05-26 17:06     ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Petr Mladek @ 2021-05-24 15:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, bp, davidchao, jenhaochen, jkosina, josh, liumartin,
	mhocko, mingo, mm-commits, nathan, ndesaulniers, paulmck, peterz,
	rostedt, stable, tglx, tj, vbabka, linux-kernel

On Fri 2021-05-21 18:35:27, Oleg Nesterov wrote:
> On 05/20, Andrew Morton wrote:
> >
> > --- a/kernel/kthread.c~kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race
> > +++ a/kernel/kthread.c
> > @@ -1181,6 +1181,19 @@ bool kthread_mod_delayed_work(struct kth
> >  		goto out;
> >
> >  	ret = __kthread_cancel_work(work, true, &flags);
> > +
> > +	/*
> > +	 * Canceling can run in parallel from kthread_cancel_delayed_work_sync()
> > +	 * and change work->canceling while the spinlock is released and
> > +	 * reacquired in __kthread_cancel_work(), so check the count again.
> > +	 * Otherwise we might incorrectly queue the dwork and leave the
> > +	 * cancel_delayed_work_sync thread waiting endlessly for the flush work.
> > +	 */
> > +	if (work->canceling) {
> > +		ret = false;
> > +		goto out;
> > +	}
> > +
> >  fast_queue:
> >  	__kthread_queue_delayed_work(worker, dwork, delay);
> 
> Never looked at this code before, can't review...
> 
> but note that another caller of __kthread_queue_delayed_work() needs to
> check work->canceling too. So perhaps we should simply add queuing_blocked()
> into __kthread_queue_delayed_work() ?

Good point. I do not have a strong opinion. But if we move the check
to __kthread_queue_delayed_work() then it would make sense to
move it into kthread_insert_work() as well, to keep it symmetric.
But then we would do the check twice in some code paths.
Well, it would make the API safer.


> Something like below, uncompiled/untested, most probably incorrect.
> 
> Either way, this comment
> 
> 	 * Return: %true if @dwork was pending and its timer was modified,
> 	 * %false otherwise.
> 
> above kthread_mod_delayed_work looks obviously wrong. Currently it returns
> true if this work was pending. With your patch it returns true if it was
> pending and not canceling.
>
> With the patch below it returns true if the work was (re)queued successfully,
> and this makes more sense to me. But again, I can easily misread this code.

Your patch changes the semantics. The current semantics are the same as
for the workqueue counterpart, mod_delayed_work_on().

It looks weird but it makes sense.

kthread_mod_delayed_work() should always succeed and queue the work
with the new delay. Normally, the only interesting information is
whether the work was canceled (queued but not processed). It means
that some job was not done.

The only situation when kthread_mod_delayed_work() is not able to
queue the work is when another process is canceling the work at
the same time. But that means that kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync() are called in parallel.
The result is racy by definition, which means that the calling code
is racy. And it typically means that the API is being used in a wrong way.
Note the comment:

 * A special case is when the work is being canceled in parallel.
 * It might be caused either by the real kthread_cancel_delayed_work_sync()
 * or yet another kthread_mod_delayed_work() call. We let the other command
 * win and return %false here. The caller is supposed to synchronize these
 * operations a reasonable way.


But you have a point. The new code returns "false" even when the work
was canceled, i.e. the previously queued work was not processed.

We should actually keep the "ret" value as is to stay compatible with
the workqueue API:

	/*
	 * Canceling can run in parallel from kthread_cancel_delayed_work_sync()
	 * and change work->canceling while the spinlock is released and
	 * reacquired in __kthread_cancel_work(), so check the count again.
	 * Otherwise we might incorrectly queue the dwork and leave the
	 * cancel_delayed_work_sync thread waiting endlessly for the flush work.
	 *
	 * Keep the return code. The API primarily informs the caller
	 * whether some pending work has been canceled (i.e. not processed).
	 */
	if (work->canceling)
		goto out;
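
The semantics being agreed on here can be pinned down with a toy
user-space model. All names below are invented (this is not kernel
code): the return value reports only whether a pending work was
canceled, while the canceling count independently decides whether the
work is requeued.

```c
#include <stdbool.h>

/* Toy model (invented names) of kthread_mod_delayed_work() with the
 * "keep the ret code" variant: ret says whether a pending work was
 * canceled; *requeued says whether the work was queued again. */
static bool toy_mod_delayed_work(bool was_pending, int canceling,
				 bool *requeued)
{
	bool ret = was_pending;		/* what __kthread_cancel_work() saw */

	/* a parallel cancel wins: do not requeue, but keep ret */
	*requeued = (canceling == 0);
	return ret;
}
```

The two concerns are decoupled: a caller racing with a cancel still
learns that its previously queued job will not be processed, even
though nothing was requeued.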

Best Regards,
Petr


* Re: + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree
  2021-05-24 15:08   ` Petr Mladek
@ 2021-05-26 17:06     ` Oleg Nesterov
  2021-05-27 10:15       ` Petr Mladek
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2021-05-26 17:06 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bp, davidchao, jenhaochen, jkosina, josh, liumartin,
	mhocko, mingo, mm-commits, nathan, ndesaulniers, paulmck, peterz,
	rostedt, stable, tglx, tj, vbabka, linux-kernel

On 05/24, Petr Mladek wrote:
>
> Your patch changes the semantic. The current semantic is the same for
> the workqueue's counter-part mod_delayed_work_on().

OK, I see, thanks. I was confused by the comment.

> We should actually keep the "ret" value as is to stay compatible with
> workqueue API:
>
> 	/*
> 	 * Canceling can run in parallel from kthread_cancel_delayed_work_sync()
> 	 * and change work->canceling while the spinlock is released and
> 	 * reacquired in __kthread_cancel_work(), so check the count again.
> 	 * Otherwise we might incorrectly queue the dwork and leave the
> 	 * cancel_delayed_work_sync thread waiting endlessly for the flush work.
> 	 *
> 	 * Keep the return code. The API primarily informs the caller
> 	 * whether some pending work has been canceled (i.e. not processed).
> 	 */
> 	if (work->canceling)
> 		goto out;

Agreed, we should keep the "ret" value.

but unless I am confused again this doesn't match mod_delayed_work_on()
which always returns true if it races with cancel(). Nevermind, I think
this doesn't matter.

Thanks,

Oleg.



* Re: + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree
  2021-05-26 17:06     ` Oleg Nesterov
@ 2021-05-27 10:15       ` Petr Mladek
  0 siblings, 0 replies; 5+ messages in thread
From: Petr Mladek @ 2021-05-27 10:15 UTC (permalink / raw)
  To: Oleg Nesterov, liumartin, akpm, Tejun Heo
  Cc: bp, davidchao, jenhaochen, jkosina, josh, mhocko, mingo,
	mm-commits, nathan, ndesaulniers, paulmck, peterz, rostedt,
	stable, tglx, tj, vbabka, linux-kernel

Added Tejun into CC because of the workqueue API related question
at the end of the mail.

On Wed 2021-05-26 19:06:06, Oleg Nesterov wrote:
> On 05/24, Petr Mladek wrote:
> >
> > Your patch changes the semantic. The current semantic is the same for
> > the workqueue's counter-part mod_delayed_work_on().
> 
> OK, I see, thanks. I was confused by the comment.
> 
> > We should actually keep the "ret" value as is to stay compatible with
> > workqueue API:
> >
> > 	/*
> > 	 * Canceling can run in parallel from kthread_cancel_delayed_work_sync()
> > 	 * and change work->canceling while the spinlock is released and
> > 	 * reacquired in __kthread_cancel_work(), so check the count again.
> > 	 * Otherwise we might incorrectly queue the dwork and leave the
> > 	 * cancel_delayed_work_sync thread waiting endlessly for the flush work.
> > 	 *
> > 	 * Keep the return code. The API primarily informs the caller
> > 	 * whether some pending work has been canceled (i.e. not processed).
> > 	 */
> > 	if (work->canceling)
> > 		goto out;
> 
> Agreed, we should keep the "ret" value.

Martin Liu, could you please resend the patch without the "ret =
false" line? See above.

Andrew, could you please remove this patch from the -mm tree for now?

> but unless I am confused again this doesn't match mod_delayed_work_on()
> which always returns true if it races with cancel(). Nevermind, I think
> this doesn't matter.

Good point. I think that it is actually a bug. Most callers ignore
the return code but there is the following user:

static void addrconf_del_dad_work(struct inet6_ifaddr *ifp)
{
	if (cancel_delayed_work(&ifp->dad_work))
		__in6_ifa_put(ifp);
}
static void addrconf_mod_dad_work(struct inet6_ifaddr *ifp,
				   unsigned long delay)
{
	in6_ifa_hold(ifp);
	if (mod_delayed_work(addrconf_wq, &ifp->dad_work, delay))
		in6_ifa_put(ifp);
}

If mod_delayed_work() races with cancel_delayed_work() then both might
return true and call in6_ifa_put(ifp).

I thought that they were serialized by ifp->lock. But, for example,
addrconf_dad_start() calls addrconf_mod_dad_work() after releasing
this lock.

It is possible that they are serialized in another way. But I think
that, in principle, only the caller that really canceled a pending
work should return "true".
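
The suspected double put can be modeled in a few lines of plain C.
This is purely illustrative, not the real net/ipv6 code: toy_ifa and
friends are invented, and the model hard-codes the interleaving
described above (the cancel dequeues the work, yet mod_delayed_work()
still reports true).

```c
#include <stdbool.h>

/* Toy model: a queued dad_work pins one reference, so refcnt starts
 * at 2 (one base reference plus one held by the queued work). */
struct toy_ifa {
	int refcnt;
	bool work_queued;
};

/* Models addrconf_del_dad_work(): drop the work's reference only if
 * we actually dequeued a pending work. */
static void toy_del_dad_work(struct toy_ifa *ifp)
{
	bool canceled = ifp->work_queued;	/* cancel_delayed_work() */

	ifp->work_queued = false;
	if (canceled)
		ifp->refcnt--;			/* __in6_ifa_put() */
}

/* Models addrconf_mod_dad_work() in the racy interleaving: the cancel
 * above already dequeued the work, but mod_delayed_work() still
 * returns true, so this path drops a reference too. */
static void toy_mod_dad_work_raced(struct toy_ifa *ifp)
{
	bool mod_ret = true;	/* per the mail: true even though the
				 * racing cancel dequeued the work */

	ifp->refcnt++;				/* in6_ifa_hold() */
	ifp->work_queued = true;		/* work requeued ... */
	if (mod_ret)
		ifp->refcnt--;			/* in6_ifa_put() */
}
```

With both paths reporting success, the single work-held reference is
dropped twice: the requeued work no longer pins a reference, so a
later put can free the object while the work is still pending.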

Tejun, any opinion?  Feel free to ask for more context.

Best Regards,
Petr


end of thread [~2021-05-27 10:15 UTC]

Thread overview: 5+ messages
2021-05-20 21:47 + kthread-fix-kthread_mod_delayed_work-vs-kthread_cancel_delayed_work_sync-race.patch added to -mm tree akpm
2021-05-21 16:35 ` Oleg Nesterov
2021-05-24 15:08   ` Petr Mladek
2021-05-26 17:06     ` Oleg Nesterov
2021-05-27 10:15       ` Petr Mladek
