linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Long Gao <gaolong@kylinos.com.cn>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH] sched: fix the theoretical signal_wake_up() vs schedule() race
Date: Tue, 13 Aug 2013 16:33:25 +0200	[thread overview]
Message-ID: <20130813143325.GA5541@redhat.com> (raw)
In-Reply-To: <20130813075550.GS27162@twins.programming.kicks-ass.net>

On 08/13, Peter Zijlstra wrote:
>
> On Mon, Aug 12, 2013 at 07:02:57PM +0200, Oleg Nesterov wrote:
> > +/*
> > + * Despite its name it doesn't necessarily has to be a full barrier.
> > + * It should only guarantee that a STORE before the critical section
> > + * can not be reordered with a LOAD inside this section.
> > + * So the default implementation simply ensures that a STORE can not
> > + * move into the critical section, smp_wmb() should serialize it with
> > + * another STORE done by spin_lock().
> > + */
> > +#ifndef smp_mb__before_spinlock
> > +#define smp_mb__before_spinlock()	smp_wmb()
> >  #endif
>
> I would have expected mention of the ACQUIRE of the lock keeping the
> LOAD inside the locked section.

OK, please see v2 below.

---
>From 8de96d3feae3b4b51669902b7c24ac1748ecdbfe Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 12 Aug 2013 18:14:00 +0200
Subject: sched: fix the theoretical signal_wake_up() vs schedule() race

This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).

However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.

The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does by the same reason.

We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.

While at it, kill smp_mb__after_lock(), it has no callers.

Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/spinlock.h |    4 ----
 include/linux/spinlock.h        |   14 +++++++++++---
 kernel/sched/core.c             |   14 +++++++++++++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 33692ea..e3ddd7d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -233,8 +233,4 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
 #define arch_read_relax(lock)	cpu_relax()
 #define arch_write_relax(lock)	cpu_relax()
 
-/* The {read|write|spin}_lock() on x86 are full memory barriers. */
-static inline void smp_mb__after_lock(void) { }
-#define ARCH_HAS_SMP_MB_AFTER_LOCK
-
 #endif /* _ASM_X86_SPINLOCK_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 7d537ce..75f3494 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -117,9 +117,17 @@ do {								\
 #endif /*arch_spin_is_contended*/
 #endif
 
-/* The lock does not imply full memory barrier. */
-#ifndef ARCH_HAS_SMP_MB_AFTER_LOCK
-static inline void smp_mb__after_lock(void) { smp_mb(); }
+/*
+ * Despite its name it doesn't necessarily has to be a full barrier.
+ * It should only guarantee that a STORE before the critical section
+ * can not be reordered with a LOAD inside this section.
+ * spin_lock() is the one-way barrier, this LOAD can not escape out
+ * of the region. So the default implementation simply ensures that
+ * a STORE can not move into the critical section, smp_wmb() should
+ * serialize it with another STORE done by spin_lock().
+ */
+#ifndef smp_mb__before_spinlock
+#define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
 /**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6df0fbe..97dac0e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1491,7 +1491,13 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	unsigned long flags;
 	int cpu, success = 0;
 
-	smp_wmb();
+	/*
+	 * If we are going to wake up a thread waiting for CONDITION we
+	 * need to ensure that CONDITION=1 done by the caller can not be
+	 * reordered with p->state check below. This pairs with mb() in
+	 * set_current_state() the waiting thread does.
+	 */
+	smp_mb__before_spinlock();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
@@ -2394,6 +2400,12 @@ need_resched:
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
 
+	/*
+	 * Make sure that signal_pending_state()->signal_pending() below
+	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
+	 * done by the caller to avoid the race with signal_wake_up().
+	 */
+	smp_mb__before_spinlock();
 	raw_spin_lock_irq(&rq->lock);
 
 	switch_count = &prev->nivcsw;
-- 
1.5.5.1



  reply	other threads:[~2013-08-13 14:39 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <tencent_26310211398C21034BD3B2F9@qq.com>
2013-08-08 18:19 ` Patch for lost wakeups Linus Torvalds
2013-08-08 19:17   ` Oleg Nesterov
2013-08-08 19:51     ` Linus Torvalds
2013-08-09 13:04       ` Oleg Nesterov
2013-08-09 18:21         ` Linus Torvalds
2013-08-11 17:25           ` Oleg Nesterov
2013-08-11 17:27             ` Oleg Nesterov
     [not found]           ` <tencent_293B72F26D71A4191C7C999A@qq.com>
2013-08-11 17:39             ` Oleg Nesterov
2013-08-11 23:52               ` James Bottomley
2013-08-12 17:02           ` [PATCH] sched: fix the theoretical signal_wake_up() vs schedule() race Oleg Nesterov
2013-08-13  7:55             ` Peter Zijlstra
2013-08-13 14:33               ` Oleg Nesterov [this message]
2013-08-16 18:46                 ` [tip:sched/core] sched: Fix the theoretical signal_wake_up() vs. " tip-bot for Oleg Nesterov
2013-08-17 15:05                   ` Oleg Nesterov
2013-08-19  7:13                     ` Ingo Molnar
2013-08-09 15:18     ` [PATCH 0/1] dlm: kill the unnecessary and wrong device_close()->recalc_sigpending() Oleg Nesterov
2013-08-09 15:19       ` [PATCH 1/1] " Oleg Nesterov
2013-08-12 20:26         ` David Teigland
2013-08-09 13:28   ` Patch for lost wakeups Oleg Nesterov
2013-08-09 15:31   ` block_all_signals() must die (Was: Patch for lost wakeups) Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130813143325.GA5541@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=gaolong@kylinos.com.cn \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).