From: tip-bot for Oleg Nesterov <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
gaolong@kylinos.com.cn, torvalds@linux-foundation.org,
peterz@infradead.org, paulmck@linux.vnet.ibm.com,
akpm@linux-foundation.org, tglx@linutronix.de, oleg@redhat.com
Subject: [tip:sched/core] sched: Fix the theoretical signal_wake_up() vs. schedule() race
Date: Fri, 16 Aug 2013 11:46:46 -0700
Message-ID: <tip-e4c711cfd32f3c74a699af3121a3b0ff631328a7@git.kernel.org>
In-Reply-To: <20130813143325.GA5541@redhat.com>
Commit-ID: e4c711cfd32f3c74a699af3121a3b0ff631328a7
Gitweb: http://git.kernel.org/tip/e4c711cfd32f3c74a699af3121a3b0ff631328a7
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 13 Aug 2013 16:33:25 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 16 Aug 2013 17:44:54 +0200
sched: Fix the theoretical signal_wake_up() vs. schedule() race
This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock, code like
	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();
can miss a signal. This is the special wait-for-condition case: it
relies on the try_to_wake_up()/schedule() interaction, and thus does
not need an mb() between __set_current_state() and the
if (signal_pending()) check.
However, this __set_current_state() can move into the critical
section protected by rq->lock. Now that try_to_wake_up() takes
another lock, we need to ensure that it can't be reordered with
the "if (signal_pending(current))" check inside that section.
The patch is actually a one-liner: it simply adds an smp_wmb()
before spin_lock_irq(rq->lock). This is what try_to_wake_up()
already does, for the same reason.
We turn this wmb() into a new helper, smp_mb__before_spinlock(),
for better documentation and to allow architectures to change
the default implementation.
While at it, kill smp_mb__after_lock(); it has no callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Long Gao <gaolong@kylinos.com.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130813143325.GA5541@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/spinlock.h | 4 ----
include/linux/spinlock.h | 14 +++++++++++---
kernel/sched/core.c | 14 +++++++++++++-
3 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 33692ea..e3ddd7d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -233,8 +233,4 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
#define arch_read_relax(lock) cpu_relax()
#define arch_write_relax(lock) cpu_relax()
-/* The {read|write|spin}_lock() on x86 are full memory barriers. */
-static inline void smp_mb__after_lock(void) { }
-#define ARCH_HAS_SMP_MB_AFTER_LOCK
-
#endif /* _ASM_X86_SPINLOCK_H */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 7d537ce..75f3494 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -117,9 +117,17 @@ do { \
#endif /*arch_spin_is_contended*/
#endif
-/* The lock does not imply full memory barrier. */
-#ifndef ARCH_HAS_SMP_MB_AFTER_LOCK
-static inline void smp_mb__after_lock(void) { smp_mb(); }
+/*
+ * Despite its name it doesn't necessarily have to be a full barrier.
+ * It should only guarantee that a STORE before the critical section
+ * can not be reordered with a LOAD inside this section.
+ * spin_lock() is a one-way barrier: the LOAD can not escape out
+ * of the region, so the default implementation simply ensures that
+ * a STORE can not move into the critical section; smp_wmb() should
+ * serialize it with another STORE done by spin_lock().
+ */
+#ifndef smp_mb__before_spinlock
+#define smp_mb__before_spinlock() smp_wmb()
#endif
/**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cf8f100..d60b325 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1491,7 +1491,13 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
unsigned long flags;
int cpu, success = 0;
- smp_wmb();
+ /*
+ * If we are going to wake up a thread waiting for CONDITION we
+ * need to ensure that CONDITION=1 done by the caller can not be
+ * reordered with the p->state check below. This pairs with the
+ * mb() in set_current_state() done by the waiting thread.
+ */
+ smp_mb__before_spinlock();
raw_spin_lock_irqsave(&p->pi_lock, flags);
if (!(p->state & state))
goto out;
@@ -2394,6 +2400,12 @@ need_resched:
if (sched_feat(HRTICK))
hrtick_clear(rq);
+ /*
+ * Make sure that signal_pending_state()->signal_pending() below
+ * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
+ * done by the caller to avoid the race with signal_wake_up().
+ */
+ smp_mb__before_spinlock();
raw_spin_lock_irq(&rq->lock);
switch_count = &prev->nivcsw;
Thread overview: 20+ messages
[not found] <tencent_26310211398C21034BD3B2F9@qq.com>
2013-08-08 18:19 ` Patch for lost wakeups Linus Torvalds
2013-08-08 19:17 ` Oleg Nesterov
2013-08-08 19:51 ` Linus Torvalds
2013-08-09 13:04 ` Oleg Nesterov
2013-08-09 18:21 ` Linus Torvalds
2013-08-11 17:25 ` Oleg Nesterov
2013-08-11 17:27 ` Oleg Nesterov
[not found] ` <tencent_293B72F26D71A4191C7C999A@qq.com>
2013-08-11 17:39 ` Oleg Nesterov
2013-08-11 23:52 ` James Bottomley
2013-08-12 17:02 ` [PATCH] sched: fix the theoretical signal_wake_up() vs schedule() race Oleg Nesterov
2013-08-13 7:55 ` Peter Zijlstra
2013-08-13 14:33 ` Oleg Nesterov
2013-08-16 18:46 ` tip-bot for Oleg Nesterov [this message]
2013-08-17 15:05 ` [tip:sched/core] sched: Fix the theoretical signal_wake_up() vs. " Oleg Nesterov
2013-08-19 7:13 ` Ingo Molnar
2013-08-09 15:18 ` [PATCH 0/1] dlm: kill the unnecessary and wrong device_close()->recalc_sigpending() Oleg Nesterov
2013-08-09 15:19 ` [PATCH 1/1] " Oleg Nesterov
2013-08-12 20:26 ` David Teigland
2013-08-09 13:28 ` Patch for lost wakeups Oleg Nesterov
2013-08-09 15:31 ` block_all_signals() must die (Was: Patch for lost wakeups) Oleg Nesterov