* [PATCH v2 0/4] sched/wait: fix and then kill abort_exclusive_wait()
@ 2016-09-06 14:00 Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 1/4] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Oleg Nesterov @ 2016-09-06 14:00 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
Neil Brown, linux-kernel
On 09/02, Peter Zijlstra wrote:
>
> On Fri, Sep 02, 2016 at 02:06:43PM +0200, Oleg Nesterov wrote:
>
> > And, if you agree with this change I will try to change __wait_event()
> > as well and kill abort_exclusive_wait().
>
> Yeah, I think this'll work. Please send a new series with 'enhanced'
> changelog so that when we have to look at this again in a few
> weeks/months time we won't be cursing at ourselves for how the heck it
> was supposed to work.
OK, thanks, please see V2.
I think it makes sense to change ___wait_event() first; this simplifies the
documentation in __wait_on_bit_lock().
See also 4/4, it is new and simple. From the changelog:
In particular we are ready to remove the signal_pending_state()
checks from wait_bit_action_f helpers and change __wait_on_bit_lock()
to use prepare_to_wait_event().
Yes. The necessary change is trivial. But this probably needs a separate
discussion/testing because we need to re-investigate the problem which was
somehow fixed by 68985633bccb60 "sched/wait: Fix signal handling in bit wait
helpers".
Oleg.
include/linux/wait.h | 17 ++-------
kernel/sched/wait.c | 101 ++++++++++++++++++++++++---------------------------
2 files changed, 52 insertions(+), 66 deletions(-)
* [PATCH V2 1/4] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up()
2016-09-06 14:00 [PATCH v2 0/4] sched/wait: fix and then kill abort_exclusive_wait() Oleg Nesterov
@ 2016-09-06 14:00 ` Oleg Nesterov
2016-09-30 11:56 ` [tip:sched/core] sched/wait: Fix abort_exclusive_wait(), it " tip-bot for Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event() Oleg Nesterov
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-09-06 14:00 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
Neil Brown, linux-kernel
Otherwise this logic only works if mode is "compatible" with another
exclusive waiter.
If some wq has both TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE waiters,
abort_exclusive_wait() won't wake an uninterruptible waiter.
The main user is __wait_on_bit_lock() and currently it is fine but only
because TASK_KILLABLE includes TASK_UNINTERRUPTIBLE and we do not have
lock_page_interruptible() yet.
Just use TASK_NORMAL and remove the "mode" arg from abort_exclusive_wait().
Yes, this means that (say) wake_up_interruptible() can wake up the non-
interruptible waiter(s), but I think this is fine. And in fact I think
that abort_exclusive_wait() must die, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
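A note on the mode mask, as a minimal userspace sketch rather than kernel code. It assumes a wakeup only affects sleepers whose state intersects the wakeup mode; the bit values are illustrative, modelled on include/linux/sched.h of that era, and wakeup_matches() is a hypothetical helper that exists only for this note.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative task-state bits (assumed values, for this sketch only). */
#define TASK_INTERRUPTIBLE	0x01
#define TASK_UNINTERRUPTIBLE	0x02
#define TASK_WAKEKILL		0x80
#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_NORMAL		(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/*
 * Hypothetical helper: a wakeup issued with "mode" reaches a sleeper
 * only if the sleeper's state shares at least one bit with the mode.
 */
bool wakeup_matches(unsigned int sleeper_state, unsigned int mode)
{
	return (sleeper_state & mode) != 0;
}
```

Under these assumptions a wakeup with mode TASK_INTERRUPTIBLE skips a TASK_UNINTERRUPTIBLE sleeper, a wakeup with mode TASK_KILLABLE reaches it only because the two share the TASK_UNINTERRUPTIBLE bit, and TASK_NORMAL reaches both kinds of sleeper.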
include/linux/wait.h | 6 +++---
kernel/sched/wait.c | 8 +++-----
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 27d7a0a..329f796 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -281,8 +281,8 @@ wait_queue_head_t *bit_waitqueue(void *, int);
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
if (exclusive) { \
- abort_exclusive_wait(&wq, &__wait, \
- state, NULL); \
+ abort_exclusive_wait(&wq, &__wait, \
+ NULL); \
goto __out; \
} \
break; \
@@ -976,7 +976,7 @@ void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, unsigned int mode, void *key);
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key);
long wait_woken(wait_queue_t *wait, unsigned mode, long timeout);
int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f15d6b6..2bbba01 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -259,7 +259,6 @@ EXPORT_SYMBOL(finish_wait);
* abort_exclusive_wait - abort exclusive waiting in a queue
* @q: waitqueue waited on
* @wait: wait descriptor
- * @mode: runstate of the waiter to be woken
* @key: key to identify a wait bit queue or %NULL
*
* Sets current thread back to running state and removes
@@ -273,8 +272,7 @@ EXPORT_SYMBOL(finish_wait);
* aborts and is woken up concurrently and no one wakes up
* the next waiter.
*/
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
- unsigned int mode, void *key)
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key)
{
unsigned long flags;
@@ -283,7 +281,7 @@ void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
if (!list_empty(&wait->task_list))
list_del_init(&wait->task_list);
else if (waitqueue_active(q))
- __wake_up_locked_key(q, mode, key);
+ __wake_up_locked_key(q, TASK_NORMAL, key);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(abort_exclusive_wait);
@@ -434,7 +432,7 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
ret = action(&q->key, mode);
if (!ret)
continue;
- abort_exclusive_wait(wq, &q->wait, mode, &q->key);
+ abort_exclusive_wait(wq, &q->wait, &q->key);
return ret;
} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
finish_wait(wq, &q->wait);
--
2.5.0
* [PATCH V2 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event()
2016-09-06 14:00 [PATCH v2 0/4] sched/wait: fix and then kill abort_exclusive_wait() Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 1/4] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
@ 2016-09-06 14:00 ` Oleg Nesterov
2016-09-08 16:48 ` [PATCH V3 " Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 3/4] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 4/4] sched/wait: introduce init_wait_entry() Oleg Nesterov
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-09-06 14:00 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
Neil Brown, linux-kernel
___wait_event() doesn't really need abort_exclusive_wait(), we can simply
change prepare_to_wait_event() to remove the waiter from q->task_list if
it was interrupted.
This simplifies the code/logic, and this way prepare_to_wait_event() can
have more users, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
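A note on the dequeue-on-signal idea, as a single-threaded userspace model and not the kernel implementation. It assumes everything here would run under q->lock in the real code; struct waitq, struct waiter and the model_* functions are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512	/* kernel-internal errno, absent from userspace headers */
#endif

#define MAXW 4

struct waiter { bool queued; bool signal_pending; bool woken; };
struct waitq  { struct waiter *slot[MAXW]; int n; };

/* Like list_del_init(): harmless if the waiter is not queued. */
static void wq_remove(struct waitq *q, struct waiter *w)
{
	for (int i = 0; i < q->n; i++) {
		if (q->slot[i] != w)
			continue;
		for (int j = i; j < q->n - 1; j++)
			q->slot[j] = q->slot[j + 1];
		q->n--;
		break;
	}
	w->queued = false;
}

/* Model of prepare_to_wait_event(): an interrupted waiter leaves the queue. */
long model_prepare_to_wait_event(struct waitq *q, struct waiter *w)
{
	if (w->signal_pending) {
		wq_remove(q, w);
		return -ERESTARTSYS;
	}
	if (!w->queued) {
		assert(q->n < MAXW);
		q->slot[q->n++] = w;	/* exclusive waiters go to the tail */
		w->queued = true;
	}
	return 0;
}

/* Model of an exclusive wakeup with autoremove: wake and dequeue the head. */
struct waiter *model_wake_up_one(struct waitq *q)
{
	struct waiter *w;

	if (q->n == 0)
		return NULL;
	w = q->slot[0];
	wq_remove(q, w);
	w->woken = true;
	return w;
}
```

In this model, once the first waiter sees a pending signal and fails, a later wakeup cannot select it and goes to the next exclusive waiter instead, which is the property abort_exclusive_wait() used to provide.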
include/linux/wait.h | 7 +------
kernel/sched/wait.c | 23 ++++++++++++++++++-----
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 329f796..5179915 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -280,12 +280,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
- if (exclusive) { \
- abort_exclusive_wait(&wq, &__wait, \
- NULL); \
- goto __out; \
- } \
- break; \
+ goto __out; \
} \
\
cmd; \
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 2bbba01..4af0dc8 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -199,15 +199,28 @@ EXPORT_SYMBOL(prepare_to_wait_exclusive);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
unsigned long flags;
-
- if (signal_pending_state(state, current))
- return -ERESTARTSYS;
+ long ret = 0;
wait->private = current;
wait->func = autoremove_wake_function;
spin_lock_irqsave(&q->lock, flags);
- if (list_empty(&wait->task_list)) {
+ if (unlikely(signal_pending_state(state, current))) {
+ /*
+ * Exclusive waiter must not fail if it was selected by wakeup,
+ * it should "consume" the condition we were waiting for.
+ *
+ * The caller will recheck the condition and return success if
+ * we were already woken up, we can not miss the event because
+ * wakeup locks/unlocks the same q->lock.
+ *
+ * But we need to ensure that set-condition + wakeup after that
+ * can't see us, it should wake up another exclusive waiter if
+ * we fail.
+ */
+ list_del_init(&wait->task_list);
+ ret = -ERESTARTSYS;
+ } else if (list_empty(&wait->task_list)) {
if (wait->flags & WQ_FLAG_EXCLUSIVE)
__add_wait_queue_tail(q, wait);
else
@@ -216,7 +229,7 @@ long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
set_current_state(state);
spin_unlock_irqrestore(&q->lock, flags);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(prepare_to_wait_event);
--
2.5.0
* [PATCH V2 3/4] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
2016-09-06 14:00 [PATCH v2 0/4] sched/wait: fix and then kill abort_exclusive_wait() Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 1/4] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event() Oleg Nesterov
@ 2016-09-06 14:00 ` Oleg Nesterov
2016-09-30 11:57 ` [tip:sched/core] sched/wait: Avoid " tip-bot for Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 4/4] sched/wait: introduce init_wait_entry() Oleg Nesterov
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-09-06 14:00 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
Neil Brown, linux-kernel
__wait_on_bit_lock() doesn't need abort_exclusive_wait() either. Right
now it can't use prepare_to_wait_event() (see the next change), but
it can do the additional finish_wait() if action() fails.
abort_exclusive_wait() no longer has callers, remove it.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
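A note on the new loop shape, as a userspace model using C11 atomics instead of the kernel bit operations. struct bit_waiter, the model_* functions and the three action stubs are hypothetical names for this sketch; the "queued" flag stands in for prepare_to_wait_exclusive() and finish_wait(), and the stubs simulate what could happen while action() sleeps.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct bit_waiter {
	atomic_uint *word;
	unsigned int bit;
	bool queued;		/* stands in for being on the wait queue */
};

typedef int action_fn(struct bit_waiter *w);

static unsigned int mask(const struct bit_waiter *w)
{
	return 1u << w->bit;
}

static bool model_test_bit(struct bit_waiter *w)
{
	return (atomic_load(w->word) & mask(w)) != 0;
}

/* Returns the old value of the bit, like test_and_set_bit(). */
static bool model_test_and_set_bit(struct bit_waiter *w)
{
	return (atomic_fetch_or(w->word, mask(w)) & mask(w)) != 0;
}

int model_wait_on_bit_lock(struct bit_waiter *w, action_fn *action)
{
	int ret = 0;

	for (;;) {
		w->queued = true;		/* prepare_to_wait_exclusive() */
		if (model_test_bit(w)) {
			ret = action(w);
			if (ret)
				w->queued = false;	/* early finish_wait() */
		}
		if (!model_test_and_set_bit(w)) {
			if (!ret)
				w->queued = false;	/* finish_wait() */
			return 0;		/* we own the bit */
		} else if (ret) {
			return ret;		/* failed and did not get the bit */
		}
	}
}

/* Stub: interrupted while the bit is still held by someone else. */
int action_interrupted(struct bit_waiter *w)
{
	(void)w;
	return -EINTR;
}

/* Stub: interrupted, but the holder released the bit in the meantime. */
int action_interrupted_after_unlock(struct bit_waiter *w)
{
	atomic_fetch_and(w->word, ~mask(w));
	return -EINTR;
}

/* Stub: slept normally until the holder released the bit. */
int action_slept_until_unlock(struct bit_waiter *w)
{
	atomic_fetch_and(w->word, ~mask(w));
	return 0;
}
```

The point the model tries to show: a waiter whose action() fails first takes itself off the queue and then still makes one attempt at the bit, so it returns 0 if it wins the lock and the error only if it does not.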
include/linux/wait.h | 1 -
kernel/sched/wait.c | 64 +++++++++++++++++-----------------------------------
2 files changed, 21 insertions(+), 44 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5179915..52953a5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -971,7 +971,6 @@ void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key);
long wait_woken(wait_queue_t *wait, unsigned mode, long timeout);
int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 4af0dc8..47c4d62 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -268,37 +268,6 @@ void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
}
EXPORT_SYMBOL(finish_wait);
-/**
- * abort_exclusive_wait - abort exclusive waiting in a queue
- * @q: waitqueue waited on
- * @wait: wait descriptor
- * @key: key to identify a wait bit queue or %NULL
- *
- * Sets current thread back to running state and removes
- * the wait descriptor from the given waitqueue if still
- * queued.
- *
- * Wakes up the next waiter if the caller is concurrently
- * woken up through the queue.
- *
- * This prevents waiter starvation where an exclusive waiter
- * aborts and is woken up concurrently and no one wakes up
- * the next waiter.
- */
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key)
-{
- unsigned long flags;
-
- __set_current_state(TASK_RUNNING);
- spin_lock_irqsave(&q->lock, flags);
- if (!list_empty(&wait->task_list))
- list_del_init(&wait->task_list);
- else if (waitqueue_active(q))
- __wake_up_locked_key(q, TASK_NORMAL, key);
- spin_unlock_irqrestore(&q->lock, flags);
-}
-EXPORT_SYMBOL(abort_exclusive_wait);
-
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
int ret = default_wake_function(wait, mode, sync, key);
@@ -436,20 +405,29 @@ int __sched
__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
wait_bit_action_f *action, unsigned mode)
{
- do {
- int ret;
+ int ret = 0;
+ for (;;) {
prepare_to_wait_exclusive(wq, &q->wait, mode);
- if (!test_bit(q->key.bit_nr, q->key.flags))
- continue;
- ret = action(&q->key, mode);
- if (!ret)
- continue;
- abort_exclusive_wait(wq, &q->wait, &q->key);
- return ret;
- } while (test_and_set_bit(q->key.bit_nr, q->key.flags));
- finish_wait(wq, &q->wait);
- return 0;
+ if (test_bit(q->key.bit_nr, q->key.flags)) {
+ ret = action(&q->key, mode);
+ /*
+ * See the comment in prepare_to_wait_event().
+ * finish_wait() does not necessarily takes wq->lock,
+ * but test_and_set_bit() implies mb() which pairs with
+ * smp_mb__after_atomic() before wake_up_page().
+ */
+ if (ret)
+ finish_wait(wq, &q->wait);
+ }
+ if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
+ if (!ret)
+ finish_wait(wq, &q->wait);
+ return 0;
+ } else if (ret) {
+ return ret;
+ }
+ }
}
EXPORT_SYMBOL(__wait_on_bit_lock);
--
2.5.0
* [PATCH V2 4/4] sched/wait: introduce init_wait_entry()
2016-09-06 14:00 [PATCH v2 0/4] sched/wait: fix and then kill abort_exclusive_wait() Oleg Nesterov
` (2 preceding siblings ...)
2016-09-06 14:00 ` [PATCH V2 3/4] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
@ 2016-09-06 14:00 ` Oleg Nesterov
2016-09-30 11:57 ` [tip:sched/core] sched/wait: Introduce init_wait_entry() tip-bot for Oleg Nesterov
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-09-06 14:00 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
Neil Brown, linux-kernel
The partial initialization of wait_queue_t in prepare_to_wait_event() looks
ugly. This was done to shrink .text, but we can simply add the new helper
which does the full initialization and shrink the compiled code a bit more.
And this way prepare_to_wait_event() can have more users. In particular we
are ready to remove the signal_pending_state() checks from wait_bit_action_f
helpers and change __wait_on_bit_lock() to use prepare_to_wait_event().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/linux/wait.h | 9 +++------
kernel/sched/wait.c | 12 +++++++++---
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 52953a5..1b8a930 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -248,6 +248,8 @@ wait_queue_head_t *bit_waitqueue(void *, int);
(!__builtin_constant_p(state) || \
state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \
+extern void init_wait_entry(wait_queue_t *__wait, int flags);
+
/*
* The below macro ___wait_event() has an explicit shadow of the __ret
* variable when used from the wait_event_*() macros.
@@ -266,12 +268,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
wait_queue_t __wait; \
long __ret = ret; /* explicit shadow */ \
\
- INIT_LIST_HEAD(&__wait.task_list); \
- if (exclusive) \
- __wait.flags = WQ_FLAG_EXCLUSIVE; \
- else \
- __wait.flags = 0; \
- \
+ init_wait_entry(&__wait, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
long __int = prepare_to_wait_event(&wq, &__wait, state);\
\
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 47c4d62..61dcee1 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -196,14 +196,20 @@ prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
}
EXPORT_SYMBOL(prepare_to_wait_exclusive);
+void init_wait_entry(wait_queue_t *wait, int flags)
+{
+ wait->flags = flags;
+ wait->private = current;
+ wait->func = autoremove_wake_function;
+ INIT_LIST_HEAD(&wait->task_list);
+}
+EXPORT_SYMBOL(init_wait_entry);
+
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
unsigned long flags;
long ret = 0;
- wait->private = current;
- wait->func = autoremove_wake_function;
-
spin_lock_irqsave(&q->lock, flags);
if (unlikely(signal_pending_state(state, current))) {
/*
--
2.5.0
* [PATCH V3 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event()
2016-09-06 14:00 ` [PATCH V2 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event() Oleg Nesterov
@ 2016-09-08 16:48 ` Oleg Nesterov
2016-09-30 11:56 ` [tip:sched/core] sched/wait: Avoid " tip-bot for Oleg Nesterov
0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2016-09-08 16:48 UTC
To: Ingo Molnar, Peter Zijlstra
Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
Neil Brown, linux-kernel
On 09/06, Oleg Nesterov wrote:
>
> + if (unlikely(signal_pending_state(state, current))) {
> + /*
> + * Exclusive waiter must not fail if it was selected by wakeup,
> + * it should "consume" the condition we were waiting for.
> + *
> + * The caller will recheck the condition and return success if
> + * we were already woken up, we can not miss the event because
> + * wakeup locks/unlocks the same q->lock.
> + *
> + * But we need to ensure that set-condition + wakeup after that
> + * can't see us, it should wake up another exclusive waiter if
> + * we fail.
> + */
> + list_del_init(&wait->task_list);
> + ret = -ERESTARTSYS;
Yes, but we should not do set_current_state() in this case, please see V3.
-------------------------------------------------------------------------------
Subject: [PATCH V3 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event()
___wait_event() doesn't really need abort_exclusive_wait(), we can simply
change prepare_to_wait_event() to remove the waiter from q->task_list if
it was interrupted.
This simplifies the code/logic, and this way prepare_to_wait_event() can
have more users, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
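A note on why the failure path should leave the task state alone, as a userspace model. It rests on one reading of the ___wait_event() hunk, namely that "goto __out" skips finish_wait(), which is what would otherwise reset the state to TASK_RUNNING. struct task, prepare_v2(), prepare_v3() and model_wait_event() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512	/* kernel-internal errno, absent from userspace headers */
#endif

enum { M_TASK_RUNNING = 0, M_TASK_INTERRUPTIBLE = 1 };

struct task { int state; bool signal_pending; };

/* V2 shape: the state is set even when the call fails. */
long prepare_v2(struct task *t, int state)
{
	long ret = t->signal_pending ? -ERESTARTSYS : 0;

	t->state = state;
	return ret;
}

/* V3 shape: the state is set only on the success path. */
long prepare_v3(struct task *t, int state)
{
	if (t->signal_pending)
		return -ERESTARTSYS;
	t->state = state;
	return 0;
}

/* Outline of the wait loop: the error exit jumps past finish_wait(). */
long model_wait_event(struct task *t, long (*prepare)(struct task *, int),
		      bool condition)
{
	long ret = 0;

	for (;;) {
		long intr = prepare(t, M_TASK_INTERRUPTIBLE);

		if (condition)
			break;
		if (intr) {
			ret = intr;
			goto out;
		}
		/* schedule() would run here; pretend the event then arrived */
		condition = true;
	}
	t->state = M_TASK_RUNNING;	/* finish_wait() */
out:
	return ret;
}
```

With the V2 shape the interrupted caller would return while still marked as sleeping; with the V3 shape it returns in the running state. When the condition is already true, both shapes go through the finish_wait() step and return success.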
include/linux/wait.h | 7 +------
kernel/sched/wait.c | 35 +++++++++++++++++++++++++----------
2 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 329f796..5179915 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -280,12 +280,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
- if (exclusive) { \
- abort_exclusive_wait(&wq, &__wait, \
- NULL); \
- goto __out; \
- } \
- break; \
+ goto __out; \
} \
\
cmd; \
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 2bbba01..2612393 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -199,24 +199,39 @@ EXPORT_SYMBOL(prepare_to_wait_exclusive);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
unsigned long flags;
-
- if (signal_pending_state(state, current))
- return -ERESTARTSYS;
+ long ret = 0;
wait->private = current;
wait->func = autoremove_wake_function;
spin_lock_irqsave(&q->lock, flags);
- if (list_empty(&wait->task_list)) {
- if (wait->flags & WQ_FLAG_EXCLUSIVE)
- __add_wait_queue_tail(q, wait);
- else
- __add_wait_queue(q, wait);
+ if (unlikely(signal_pending_state(state, current))) {
+ /*
+ * Exclusive waiter must not fail if it was selected by wakeup,
+ * it should "consume" the condition we were waiting for.
+ *
+ * The caller will recheck the condition and return success if
+ * we were already woken up, we can not miss the event because
+ * wakeup locks/unlocks the same q->lock.
+ *
+ * But we need to ensure that set-condition + wakeup after that
+ * can't see us, it should wake up another exclusive waiter if
+ * we fail.
+ */
+ list_del_init(&wait->task_list);
+ ret = -ERESTARTSYS;
+ } else {
+ if (list_empty(&wait->task_list)) {
+ if (wait->flags & WQ_FLAG_EXCLUSIVE)
+ __add_wait_queue_tail(q, wait);
+ else
+ __add_wait_queue(q, wait);
+ }
+ set_current_state(state);
}
- set_current_state(state);
spin_unlock_irqrestore(&q->lock, flags);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(prepare_to_wait_event);
--
2.5.0
* [tip:sched/core] sched/wait: Fix abort_exclusive_wait(), it should pass TASK_NORMAL to wake_up()
2016-09-06 14:00 ` [PATCH V2 1/4] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
@ 2016-09-30 11:56 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-09-30 11:56 UTC
To: linux-tip-commits
Cc: bvanassche, mingo, hpa, linux-kernel, neilb, efault, torvalds,
viro, oleg, hannes, peterz, tglx
Commit-ID: 38a3e1fc1dac480f3672ab22fc97e1f995c80ed7
Gitweb: http://git.kernel.org/tip/38a3e1fc1dac480f3672ab22fc97e1f995c80ed7
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 6 Sep 2016 16:00:47 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 30 Sep 2016 10:53:19 +0200
sched/wait: Fix abort_exclusive_wait(), it should pass TASK_NORMAL to wake_up()
Otherwise this logic only works if mode is "compatible" with another
exclusive waiter.
If some wq has both TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE waiters,
abort_exclusive_wait() won't wake an uninterruptible waiter.
The main user is __wait_on_bit_lock() and currently it is fine but only
because TASK_KILLABLE includes TASK_UNINTERRUPTIBLE and we do not have
lock_page_interruptible() yet.
Just use TASK_NORMAL and remove the "mode" arg from abort_exclusive_wait().
Yes, this means that (say) wake_up_interruptible() can wake up the non-
interruptible waiter(s), but I think this is fine. And in fact I think
that abort_exclusive_wait() must die, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906140047.GA6157@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/wait.h | 6 +++---
kernel/sched/wait.c | 8 +++-----
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index c3ff74d..e4cfd1e 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -281,8 +281,8 @@ wait_queue_head_t *bit_waitqueue(void *, int);
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
if (exclusive) { \
- abort_exclusive_wait(&wq, &__wait, \
- state, NULL); \
+ abort_exclusive_wait(&wq, &__wait, \
+ NULL); \
goto __out; \
} \
break; \
@@ -989,7 +989,7 @@ void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, unsigned int mode, void *key);
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key);
long wait_woken(wait_queue_t *wait, unsigned mode, long timeout);
int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f15d6b6..2bbba01 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -259,7 +259,6 @@ EXPORT_SYMBOL(finish_wait);
* abort_exclusive_wait - abort exclusive waiting in a queue
* @q: waitqueue waited on
* @wait: wait descriptor
- * @mode: runstate of the waiter to be woken
* @key: key to identify a wait bit queue or %NULL
*
* Sets current thread back to running state and removes
@@ -273,8 +272,7 @@ EXPORT_SYMBOL(finish_wait);
* aborts and is woken up concurrently and no one wakes up
* the next waiter.
*/
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
- unsigned int mode, void *key)
+void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key)
{
unsigned long flags;
@@ -283,7 +281,7 @@ void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
if (!list_empty(&wait->task_list))
list_del_init(&wait->task_list);
else if (waitqueue_active(q))
- __wake_up_locked_key(q, mode, key);
+ __wake_up_locked_key(q, TASK_NORMAL, key);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(abort_exclusive_wait);
@@ -434,7 +432,7 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
ret = action(&q->key, mode);
if (!ret)
continue;
- abort_exclusive_wait(wq, &q->wait, mode, &q->key);
+ abort_exclusive_wait(wq, &q->wait, &q->key);
return ret;
} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
finish_wait(wq, &q->wait);
* [tip:sched/core] sched/wait: Avoid abort_exclusive_wait() in ___wait_event()
2016-09-08 16:48 ` [PATCH V3 " Oleg Nesterov
@ 2016-09-30 11:56 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-09-30 11:56 UTC
To: linux-tip-commits
Cc: neilb, viro, linux-kernel, hannes, torvalds, efault, bvanassche,
hpa, peterz, tglx, mingo, oleg
Commit-ID: b1ea06a90f528e516929a4da1d9b8838752bceb9
Gitweb: http://git.kernel.org/tip/b1ea06a90f528e516929a4da1d9b8838752bceb9
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Thu, 8 Sep 2016 18:48:15 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 30 Sep 2016 10:53:44 +0200
sched/wait: Avoid abort_exclusive_wait() in ___wait_event()
___wait_event() doesn't really need abort_exclusive_wait(), we can simply
change prepare_to_wait_event() to remove the waiter from q->task_list if
it was interrupted.
This simplifies the code/logic, and this way prepare_to_wait_event() can
have more users, see the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160908164815.GA18801@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/wait.h | 7 +------
kernel/sched/wait.c | 35 +++++++++++++++++++++++++----------
2 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index e4cfd1e..7261dcb 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -280,12 +280,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
- if (exclusive) { \
- abort_exclusive_wait(&wq, &__wait, \
- NULL); \
- goto __out; \
- } \
- break; \
+ goto __out; \
} \
\
cmd; \
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 2bbba01..2612393 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -199,24 +199,39 @@ EXPORT_SYMBOL(prepare_to_wait_exclusive);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
unsigned long flags;
-
- if (signal_pending_state(state, current))
- return -ERESTARTSYS;
+ long ret = 0;
wait->private = current;
wait->func = autoremove_wake_function;
spin_lock_irqsave(&q->lock, flags);
- if (list_empty(&wait->task_list)) {
- if (wait->flags & WQ_FLAG_EXCLUSIVE)
- __add_wait_queue_tail(q, wait);
- else
- __add_wait_queue(q, wait);
+ if (unlikely(signal_pending_state(state, current))) {
+ /*
+ * Exclusive waiter must not fail if it was selected by wakeup,
+ * it should "consume" the condition we were waiting for.
+ *
+ * The caller will recheck the condition and return success if
+ * we were already woken up, we can not miss the event because
+ * wakeup locks/unlocks the same q->lock.
+ *
+ * But we need to ensure that set-condition + wakeup after that
+ * can't see us, it should wake up another exclusive waiter if
+ * we fail.
+ */
+ list_del_init(&wait->task_list);
+ ret = -ERESTARTSYS;
+ } else {
+ if (list_empty(&wait->task_list)) {
+ if (wait->flags & WQ_FLAG_EXCLUSIVE)
+ __add_wait_queue_tail(q, wait);
+ else
+ __add_wait_queue(q, wait);
+ }
+ set_current_state(state);
}
- set_current_state(state);
spin_unlock_irqrestore(&q->lock, flags);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(prepare_to_wait_event);
* [tip:sched/core] sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()
2016-09-06 14:00 ` [PATCH V2 3/4] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
@ 2016-09-30 11:57 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-09-30 11:57 UTC
To: linux-tip-commits
Cc: neilb, mingo, torvalds, hannes, oleg, bvanassche, viro,
linux-kernel, efault, tglx, peterz, hpa
Commit-ID: eaf9ef52241b545fe63621266bfc6fd8b06559ff
Gitweb: http://git.kernel.org/tip/eaf9ef52241b545fe63621266bfc6fd8b06559ff
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 6 Sep 2016 16:00:53 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 30 Sep 2016 10:54:03 +0200
sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()
__wait_on_bit_lock() doesn't need abort_exclusive_wait() either. Right
now it can't use prepare_to_wait_event() (see the next change), but
it can do the additional finish_wait() if action() fails.
abort_exclusive_wait() no longer has callers, remove it.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906140053.GA6164@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/wait.h | 1 -
kernel/sched/wait.c | 64 +++++++++++++++++-----------------------------------
2 files changed, 21 insertions(+), 44 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7261dcb..19c75f9 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -984,7 +984,6 @@ void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key);
long wait_woken(wait_queue_t *wait, unsigned mode, long timeout);
int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 2612393..0cb615d 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -270,37 +270,6 @@ void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
}
EXPORT_SYMBOL(finish_wait);
-/**
- * abort_exclusive_wait - abort exclusive waiting in a queue
- * @q: waitqueue waited on
- * @wait: wait descriptor
- * @key: key to identify a wait bit queue or %NULL
- *
- * Sets current thread back to running state and removes
- * the wait descriptor from the given waitqueue if still
- * queued.
- *
- * Wakes up the next waiter if the caller is concurrently
- * woken up through the queue.
- *
- * This prevents waiter starvation where an exclusive waiter
- * aborts and is woken up concurrently and no one wakes up
- * the next waiter.
- */
-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key)
-{
- unsigned long flags;
-
- __set_current_state(TASK_RUNNING);
- spin_lock_irqsave(&q->lock, flags);
- if (!list_empty(&wait->task_list))
- list_del_init(&wait->task_list);
- else if (waitqueue_active(q))
- __wake_up_locked_key(q, TASK_NORMAL, key);
- spin_unlock_irqrestore(&q->lock, flags);
-}
-EXPORT_SYMBOL(abort_exclusive_wait);
-
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
int ret = default_wake_function(wait, mode, sync, key);
@@ -438,20 +407,29 @@ int __sched
__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
wait_bit_action_f *action, unsigned mode)
{
- do {
- int ret;
+ int ret = 0;
+ for (;;) {
prepare_to_wait_exclusive(wq, &q->wait, mode);
- if (!test_bit(q->key.bit_nr, q->key.flags))
- continue;
- ret = action(&q->key, mode);
- if (!ret)
- continue;
- abort_exclusive_wait(wq, &q->wait, &q->key);
- return ret;
- } while (test_and_set_bit(q->key.bit_nr, q->key.flags));
- finish_wait(wq, &q->wait);
- return 0;
+ if (test_bit(q->key.bit_nr, q->key.flags)) {
+ ret = action(&q->key, mode);
+ /*
+ * See the comment in prepare_to_wait_event().
+ * finish_wait() does not necessarily take wq->lock,
+ * but test_and_set_bit() implies mb() which pairs with
+ * smp_mb__after_atomic() before wake_up_page().
+ */
+ if (ret)
+ finish_wait(wq, &q->wait);
+ }
+ if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
+ if (!ret)
+ finish_wait(wq, &q->wait);
+ return 0;
+ } else if (ret) {
+ return ret;
+ }
+ }
}
EXPORT_SYMBOL(__wait_on_bit_lock);
^ permalink raw reply related [flat|nested] 10+ messages in thread
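The control flow of the reworked __wait_on_bit_lock() loop above can be
modelled in userspace. What follows is only a rough sketch under stated
assumptions: struct bit_model, the model_*() helpers and the action_*()
callbacks are hypothetical stand-ins, not kernel APIs, and nothing here
sleeps or takes wq->lock. It only illustrates which return value and how
many finish_wait() calls each path produces.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of a bit lock plus one exclusive waiter. */
struct bit_model {
	atomic_int bit;		/* 1 while the "lock bit" is held */
	int queued;		/* 1 while the waiter is on the queue */
	int finish_calls;	/* how often finish_wait() was modelled */
};

typedef int (*action_fn)(struct bit_model *m);

static void model_prepare_exclusive(struct bit_model *m)
{
	m->queued = 1;
}

static void model_finish_wait(struct bit_model *m)
{
	m->queued = 0;
	m->finish_calls++;
}

/* Same shape as the new loop: dequeue on failure, then still try the bit. */
int model_wait_on_bit_lock(struct bit_model *m, action_fn action)
{
	int ret = 0;

	for (;;) {
		model_prepare_exclusive(m);
		if (atomic_load(&m->bit)) {
			ret = action(m);
			if (ret)
				model_finish_wait(m);
		}
		/* atomic_exchange() stands in for test_and_set_bit() */
		if (!atomic_exchange(&m->bit, 1)) {
			if (!ret)
				model_finish_wait(m);
			return 0;
		} else if (ret) {
			return ret;
		}
	}
}

/* The holder drops the bit while we "sleep"; no signal. */
static int action_release(struct bit_model *m)
{
	atomic_store(&m->bit, 0);
	return 0;
}

/* A signal arrives and the bit is still held. */
static int action_signal(struct bit_model *m)
{
	(void)m;
	return -EINTR;
}

/* A signal arrives, but the holder released the bit concurrently. */
static int action_signal_and_release(struct bit_model *m)
{
	atomic_store(&m->bit, 0);
	return -EINTR;
}

int model_demo(void)
{
	struct bit_model a = { .bit = 1 };
	struct bit_model b = { .bit = 1 };
	struct bit_model c = { .bit = 1 };

	assert(model_wait_on_bit_lock(&a, action_release) == 0);
	assert(model_wait_on_bit_lock(&b, action_signal) == -EINTR);
	/* The lock is taken even though action() failed. */
	assert(model_wait_on_bit_lock(&c, action_signal_and_release) == 0);
	return 0;
}
```

The third case in model_demo() is the interesting one: when action()
fails but the bit turns out to be free, the waiter takes the lock and
returns 0 instead of passing the wakeup on, which is why the patch no
longer needs abort_exclusive_wait() here.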
* [tip:sched/core] sched/wait: Introduce init_wait_entry()
2016-09-06 14:00 ` [PATCH V2 4/4] sched/wait: introduce init_wait_entry() Oleg Nesterov
@ 2016-09-30 11:57 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-09-30 11:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hannes, mingo, viro, efault, tglx, torvalds, neilb,
bvanassche, peterz, oleg, hpa
Commit-ID: 0176beaffbe9ed627b6a4dfa61d640f1a848086f
Gitweb: http://git.kernel.org/tip/0176beaffbe9ed627b6a4dfa61d640f1a848086f
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 6 Sep 2016 16:00:55 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 30 Sep 2016 10:54:03 +0200
sched/wait: Introduce init_wait_entry()
The partial initialization of wait_queue_t in prepare_to_wait_event() looks
ugly. This was done to shrink .text, but we can simply add a new helper
which does the full initialization and shrink the compiled code a bit more.
Also, this way prepare_to_wait_event() can have more users. In particular we
are ready to remove the signal_pending_state() checks from wait_bit_action_f
helpers and change __wait_on_bit_lock() to use prepare_to_wait_event().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906140055.GA6167@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/wait.h | 9 +++------
kernel/sched/wait.c | 12 +++++++++---
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 19c75f9..2408e8d5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -248,6 +248,8 @@ wait_queue_head_t *bit_waitqueue(void *, int);
(!__builtin_constant_p(state) || \
state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \
+extern void init_wait_entry(wait_queue_t *__wait, int flags);
+
/*
* The below macro ___wait_event() has an explicit shadow of the __ret
* variable when used from the wait_event_*() macros.
@@ -266,12 +268,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
wait_queue_t __wait; \
long __ret = ret; /* explicit shadow */ \
\
- INIT_LIST_HEAD(&__wait.task_list); \
- if (exclusive) \
- __wait.flags = WQ_FLAG_EXCLUSIVE; \
- else \
- __wait.flags = 0; \
- \
+ init_wait_entry(&__wait, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
long __int = prepare_to_wait_event(&wq, &__wait, state);\
\
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 0cb615d..4f70535 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -196,14 +196,20 @@ prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
}
EXPORT_SYMBOL(prepare_to_wait_exclusive);
+void init_wait_entry(wait_queue_t *wait, int flags)
+{
+ wait->flags = flags;
+ wait->private = current;
+ wait->func = autoremove_wake_function;
+ INIT_LIST_HEAD(&wait->task_list);
+}
+EXPORT_SYMBOL(init_wait_entry);
+
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
unsigned long flags;
long ret = 0;
- wait->private = current;
- wait->func = autoremove_wake_function;
-
spin_lock_irqsave(&q->lock, flags);
if (unlikely(signal_pending_state(state, current))) {
/*
^ permalink raw reply related [flat|nested] 10+ messages in thread
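To make the effect of init_wait_entry() in the patch above concrete,
here is a small userspace sketch. It assumes simplified stand-ins:
struct wait_entry, model_init_wait_entry() and model_autoremove_wake()
are hypothetical names, the WQ_FLAG_EXCLUSIVE value is chosen for the
model, and the "current" task is passed in explicitly because there is
no such thing in userspace. The point is only that all four fields are
set in one place, so the caller's on-stack entry is never half
initialized.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01	/* value assumed for this model */

struct list_head {
	struct list_head *next, *prev;
};

struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *w, unsigned mode, int sync,
		       void *key);

/* Hypothetical stand-in for wait_queue_t. */
struct wait_entry {
	unsigned int flags;
	void *private;		/* the waiting task in the kernel */
	wake_fn func;
	struct list_head task_list;
};

/* Placeholder wake callback; the real one also dequeues the entry. */
static int model_autoremove_wake(struct wait_entry *w, unsigned mode,
				 int sync, void *key)
{
	(void)w; (void)mode; (void)sync; (void)key;
	return 1;
}

/* Full initialization in one helper, mirroring the patch. */
void model_init_wait_entry(struct wait_entry *w, int flags, void *task)
{
	w->flags = flags;
	w->private = task;
	w->func = model_autoremove_wake;
	w->task_list.next = &w->task_list;	/* INIT_LIST_HEAD() */
	w->task_list.prev = &w->task_list;
}

int model_list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Mirrors the call in ___wait_event(): exclusive selects the flag. */
int model_demo_init(int exclusive)
{
	int task = 0;		/* dummy object standing in for current */
	struct wait_entry w;

	model_init_wait_entry(&w, exclusive ? WQ_FLAG_EXCLUSIVE : 0, &task);
	assert(w.private == &task);
	assert(w.func != NULL);
	assert(model_list_empty(&w.task_list));
	return (int)w.flags;
}
```

With the entry fully set up before the loop, prepare_to_wait_event()
no longer has to assign private and func on every iteration, which is
what the removed lines in the last hunk reflect.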
end of thread, other threads:[~2016-09-30 11:58 UTC | newest]
Thread overview: 10+ messages
2016-09-06 14:00 [PATCH v2 0/4] sched/wait: fix and then kill abort_exclusive_wait() Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 1/4] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
2016-09-30 11:56 ` [tip:sched/core] sched/wait: Fix abort_exclusive_wait(), it " tip-bot for Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 2/4] sched/wait: avoid abort_exclusive_wait() in ___wait_event() Oleg Nesterov
2016-09-08 16:48 ` [PATCH V3 " Oleg Nesterov
2016-09-30 11:56 ` [tip:sched/core] sched/wait: Avoid " tip-bot for Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 3/4] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
2016-09-30 11:57 ` [tip:sched/core] sched/wait: Avoid " tip-bot for Oleg Nesterov
2016-09-06 14:00 ` [PATCH V2 4/4] sched/wait: introduce init_wait_entry() Oleg Nesterov
2016-09-30 11:57 ` [tip:sched/core] sched/wait: Introduce init_wait_entry() tip-bot for Oleg Nesterov