From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
Long Gao <gaolong@kylinos.com.cn>,
Al Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] sched: fix the theoretical signal_wake_up() vs schedule() race
Date: Tue, 13 Aug 2013 16:33:25 +0200 [thread overview]
Message-ID: <20130813143325.GA5541@redhat.com> (raw)
In-Reply-To: <20130813075550.GS27162@twins.programming.kicks-ass.net>
On 08/13, Peter Zijlstra wrote:
>
> On Mon, Aug 12, 2013 at 07:02:57PM +0200, Oleg Nesterov wrote:
> > +/*
> > + * Despite its name it doesn't necessarily has to be a full barrier.
> > + * It should only guarantee that a STORE before the critical section
> > + * can not be reordered with a LOAD inside this section.
> > + * So the default implementation simply ensures that a STORE can not
> > + * move into the critical section, smp_wmb() should serialize it with
> > + * another STORE done by spin_lock().
> > + */
> > +#ifndef smp_mb__before_spinlock
> > +#define smp_mb__before_spinlock() smp_wmb()
> > #endif
>
> I would have expected mention of the ACQUIRE of the lock keeping the
> LOAD inside the locked section.
OK, please see v2 below.
---
>From 8de96d3feae3b4b51669902b7c24ac1748ecdbfe Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 12 Aug 2013 18:14:00 +0200
Subject: sched: fix the theoretical signal_wake_up() vs schedule() race
This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like
__set_current_state(TASK_INTERRUPTIBLE);
schedule();
can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).
However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.
The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does by the same reason.
We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.
While at it, kill smp_mb__after_lock(), it has no callers.
Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
arch/x86/include/asm/spinlock.h | 4 ----
include/linux/spinlock.h | 14 +++++++++++---
kernel/sched/core.c | 14 +++++++++++++-
3 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 33692ea..e3ddd7d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -233,8 +233,4 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
#define arch_read_relax(lock) cpu_relax()
#define arch_write_relax(lock) cpu_relax()
-/* The {read|write|spin}_lock() on x86 are full memory barriers. */
-static inline void smp_mb__after_lock(void) { }
-#define ARCH_HAS_SMP_MB_AFTER_LOCK
-
#endif /* _ASM_X86_SPINLOCK_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 7d537ce..75f3494 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -117,9 +117,17 @@ do { \
#endif /*arch_spin_is_contended*/
#endif
-/* The lock does not imply full memory barrier. */
-#ifndef ARCH_HAS_SMP_MB_AFTER_LOCK
-static inline void smp_mb__after_lock(void) { smp_mb(); }
+/*
+ * Despite its name it doesn't necessarily has to be a full barrier.
+ * It should only guarantee that a STORE before the critical section
+ * can not be reordered with a LOAD inside this section.
+ * spin_lock() is the one-way barrier, this LOAD can not escape out
+ * of the region. So the default implementation simply ensures that
+ * a STORE can not move into the critical section, smp_wmb() should
+ * serialize it with another STORE done by spin_lock().
+ */
+#ifndef smp_mb__before_spinlock
+#define smp_mb__before_spinlock() smp_wmb()
#endif
/**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6df0fbe..97dac0e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1491,7 +1491,13 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
unsigned long flags;
int cpu, success = 0;
- smp_wmb();
+ /*
+ * If we are going to wake up a thread waiting for CONDITION we
+ * need to ensure that CONDITION=1 done by the caller can not be
+ * reordered with p->state check below. This pairs with mb() in
+ * set_current_state() the waiting thread does.
+ */
+ smp_mb__before_spinlock();
raw_spin_lock_irqsave(&p->pi_lock, flags);
if (!(p->state & state))
goto out;
@@ -2394,6 +2400,12 @@ need_resched:
if (sched_feat(HRTICK))
hrtick_clear(rq);
+ /*
+ * Make sure that signal_pending_state()->signal_pending() below
+ * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
+ * done by the caller to avoid the race with signal_wake_up().
+ */
+ smp_mb__before_spinlock();
raw_spin_lock_irq(&rq->lock);
switch_count = &prev->nivcsw;
--
1.5.5.1
next prev parent reply other threads:[~2013-08-13 14:39 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <tencent_26310211398C21034BD3B2F9@qq.com>
2013-08-08 18:19 ` Patch for lost wakeups Linus Torvalds
2013-08-08 19:17 ` Oleg Nesterov
2013-08-08 19:51 ` Linus Torvalds
2013-08-09 13:04 ` Oleg Nesterov
2013-08-09 18:21 ` Linus Torvalds
2013-08-11 17:25 ` Oleg Nesterov
2013-08-11 17:27 ` Oleg Nesterov
[not found] ` <tencent_293B72F26D71A4191C7C999A@qq.com>
2013-08-11 17:39 ` Oleg Nesterov
2013-08-11 23:52 ` James Bottomley
2013-08-12 17:02 ` [PATCH] sched: fix the theoretical signal_wake_up() vs schedule() race Oleg Nesterov
2013-08-13 7:55 ` Peter Zijlstra
2013-08-13 14:33 ` Oleg Nesterov [this message]
2013-08-16 18:46 ` [tip:sched/core] sched: Fix the theoretical signal_wake_up() vs. " tip-bot for Oleg Nesterov
2013-08-17 15:05 ` Oleg Nesterov
2013-08-19 7:13 ` Ingo Molnar
2013-08-09 15:18 ` [PATCH 0/1] dlm: kill the unnecessary and wrong device_close()->recalc_sigpending() Oleg Nesterov
2013-08-09 15:19 ` [PATCH 1/1] " Oleg Nesterov
2013-08-12 20:26 ` David Teigland
2013-08-09 13:28 ` Patch for lost wakeups Oleg Nesterov
2013-08-09 15:31 ` block_all_signals() must die (Was: Patch for lost wakeups) Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130813143325.GA5541@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=gaolong@kylinos.com.cn \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).