linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
       [not found] <00e501d201cf$7bfecd40$73fc67c0$@alibaba-inc.com>
@ 2016-08-29  8:41 ` Hillf Danton
  2016-08-29 13:48   ` Oleg Nesterov
  0 siblings, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2016-08-29  8:41 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel

> > On 08/26, Oleg Nesterov wrote:
> >
> > We do not need anything tricky to avoid the race, we can just call
> > finish_wait() if action() fails. test_and_set_bit() implies mb() so
> > the lockless list_empty_careful() case is fine, we can not miss the
> > condition if we race with unlock_page().
> 
> To simplify the review see the code with the patch applied:
> 
> 
> int __sched
> __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> 			wait_bit_action_f *action, unsigned mode)
> {
> 	int ret = 0;
> 
> 	for (;;) {
> 		prepare_to_wait_exclusive(wq, &q->wait, mode);
> 		if (test_bit(q->key.bit_nr, q->key.flags)) {
> 			ret = action(&q->key, mode);
> 			/*
> 			 * Ensure that clear_bit() + wake_up() right after
> 			 * test_and_set_bit() below can't see us; it should
> 			 * wake up another exclusive waiter if we fail.
> 			 */
> 			if (ret)
> 				finish_wait(wq, &q->wait);
> 		}
> 		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
> 			if (!ret)
> 				finish_wait(wq, &q->wait);
> 			return 0;
> 		} else if (ret) {
> 			return ret;
> 		}
> 	}
> }
> 
Can we fold two bit operations into one?

thanks
Hillf
--- linux-4.7/kernel/sched/wait.c	Mon Jul 25 03:23:50 2016
+++ b/kernel/sched/wait.c	Mon Aug 29 16:17:01 2016
@@ -425,20 +425,17 @@ int __sched
 __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 			wait_bit_action_f *action, unsigned mode)
 {
-	do {
-		int ret;
+	int ret = 0;
 
-		prepare_to_wait_exclusive(wq, &q->wait, mode);
-		if (!test_bit(q->key.bit_nr, q->key.flags))
-			continue;
+	prepare_to_wait_exclusive(wq, &q->wait, mode);
+	do {
+		if (!test_and_set_bit(q->key.bit_nr, q->key.flags))
+			break;
 		ret = action(&q->key, mode);
-		if (!ret)
-			continue;
-		abort_exclusive_wait(wq, &q->wait, mode, &q->key);
-		return ret;
-	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
+	} while (!ret);
+
 	finish_wait(wq, &q->wait);
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(__wait_on_bit_lock);
 
--

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-08-29  8:41 ` [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Hillf Danton
@ 2016-08-29 13:48   ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-08-29 13:48 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel

On 08/29, Hillf Danton wrote:
>
> > > On 08/26, Oleg Nesterov wrote:
> > __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> > 			wait_bit_action_f *action, unsigned mode)
> > {
> > 	int ret = 0;
> >
> > 	for (;;) {
> > 		prepare_to_wait_exclusive(wq, &q->wait, mode);
> > 		if (test_bit(q->key.bit_nr, q->key.flags)) {
> > 			ret = action(&q->key, mode);
> > 			/*
> > 			 * Ensure that clear_bit() + wake_up() right after
> > 			 * test_and_set_bit() below can't see us; it should
> > 			 * wake up another exclusive waiter if we fail.
> > 			 */
> > 			if (ret)
> > 				finish_wait(wq, &q->wait);
> > 		}
> > 		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
> > 			if (!ret)
> > 				finish_wait(wq, &q->wait);
> > 			return 0;
> > 		} else if (ret) {
> > 			return ret;
> > 		}
> > 	}
> > }
> >
> Can we fold two bit operations into one?

Yes, we can, but I am not sure we want. I guess the first test_bit()
is optimization to avoid the locked test_and_set_bit() which should
likely fail.

>  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
>  			wait_bit_action_f *action, unsigned mode)
>  {
> -	do {
> -		int ret;
> +	int ret = 0;
>  
> -		prepare_to_wait_exclusive(wq, &q->wait, mode);
> -		if (!test_bit(q->key.bit_nr, q->key.flags))
> -			continue;
> +	prepare_to_wait_exclusive(wq, &q->wait, mode);
> +	do {
> +		if (!test_and_set_bit(q->key.bit_nr, q->key.flags))
> +			break;
>  		ret = action(&q->key, mode);
> -		if (!ret)
> -			continue;
> -		abort_exclusive_wait(wq, &q->wait, mode, &q->key);
> -		return ret;
> -	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
> +	} while (!ret);
> +

No, no, this is wrong.

We can't simply remove abort_exclusive_wait(), please read the comment
above this function and the comment added by this patch.

And we can't move prepare_to_wait_exclusive() outside of the main loop.

Oleg.

>  	finish_wait(wq, &q->wait);
> -	return 0;
> +	return ret;
>  }
>  EXPORT_SYMBOL(__wait_on_bit_lock);
>  
> --
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-09-02 12:06       ` Oleg Nesterov
@ 2016-09-02 13:20         ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-09-02 13:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On Fri, Sep 02, 2016 at 02:06:43PM +0200, Oleg Nesterov wrote:

> Yes, I considered this option, but to me the addtional finish_wait()
> looks simpler.

its all relative, this stuff always makes my head hurt one way or the
other ;-)

> And, if you agree with this change I will try to change __wait_event()
> as well and kill abort_exclusive_wait().

Yeah, I think this'll work. Please send a new series with 'enhanced'
changelog so that when we have to look at this again in a few
weeks/months time we won't be cursing at ourselves for how the heck it
was supposed to work.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-09-01 22:17     ` Peter Zijlstra
@ 2016-09-02 12:06       ` Oleg Nesterov
  2016-09-02 13:20         ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2016-09-02 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On 09/02, Peter Zijlstra wrote:
>
> FWIW, the way the mutex code avoids this issue is by doing the
> signal_pending test while holding the q->lock, that way its exclusive
> with wakeup.

And __wait_event_interruptible_locked() too.

BTW it is buggy anyway, it needs the

	-	__add_wait_queue_tail(&(wq), &__wait);
	+	if (exclusive)
	+		__add_wait_queue_tail(&(wq), &__wait);
	+	else
	+		__add_wait_queue((&(wq), &__wait);

and in fact it should use __add_wait_queue_exclusive() so that we
can remove another "if (exclusive)" but this is off-topic.

Yes, I considered this option, but to me the addtional finish_wait()
looks simpler.

And, if you agree with this change I will try to change __wait_event()
as well and kill abort_exclusive_wait().

And in this case we certainly do not want to check the "condition" with
q->lock held, because this would mean that "condition" won't be able to
take this lock.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-09-01 19:08     ` Peter Zijlstra
@ 2016-09-02 12:06       ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-09-02 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On 09/01, Peter Zijlstra wrote:
>
> > 	ret = 0;
> >
> > 	for (;;) {
> > 		prepare_to_wait_exclusive(wq, &q->wait, mode);
> >
> > 		if (test_bit(&q->key.bit_nr, &q->key.flag))
> > 			ret = action(&q->key, mode);
> >
> > 		if (!test_and_set_bit(&q->key.bit_nr, &q->key.flag)) {
> > 			/* we got the lock anyway, ignore the signal */
> > 			ret = 0;
> > 			break;
> > 		}
> >
> > 		if (ret)
> > 			break;
> > 	}
> > 	finish_wait(wq, &q->wait);
> >
> > 	return ret;
> >
> >
> > Would not that work too?
>
> Nope, because we need to do that finish_wait() before
> test_and_set_bit()..

Yes, I meant

	int __sched
	__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
				wait_bit_action_f *action, unsigned mode)
	{
		int ret = 0;

		for (;;) {
			prepare_to_wait_exclusive(wq, &q->wait, mode);
			if (test_bit(q->key.bit_nr, q->key.flags))
				ret = action(&q->key, mode);

			finish_wait(wq, &q->wait);

			if (!test_and_set_bit(q->key.bit_nr, q->key.flags))
				return 0;
			else if (ret)
				return ret;

		}
	}

> Also the problem with doing finish_wait() unconditionally would be
> destroying the FIFO order. With a bit of bad luck you'd get starvation
> cases :/

OK, I didn't think about that, thanks.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-09-01 19:01   ` Peter Zijlstra
  2016-09-01 19:08     ` Peter Zijlstra
  2016-09-01 22:17     ` Peter Zijlstra
@ 2016-09-02 12:06     ` Oleg Nesterov
  2 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-09-02 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On 09/01, Peter Zijlstra wrote:
>
> On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:
>
> > We do not need anything tricky to avoid the race,
>
> The race being:
>
> CPU0			CPU1			CPU2
>
> 			__wait_on_bit_lock()
> 			  bit_wait_io()
> 			    io_schedule()
>
> clear_bit_unlock()
> __wake_up_common(.nr_exclusive=1)
>   list_for_each_entry()
>     if (curr->func() && --nr_exclusive)
>       break
>
> 						signal()
>
> 			    if (signal_pending_state()) == TRUE
> 			      return -EINTR
>
> And no progress because CPU1 exits without acquiring the lock and CPU0
> thinks its done because it woke someone.

Yes,

> > we can just call finish_wait() if action() fails.
>
> That would be bit_wait*() returning -EINTR because sigpending.

Hmm. Not sure I understand... Let me reply just in case, even if
I am sure you get it right.

Yes, in the likely case we are going to fail with -EINTR, but only
if test-and-set after thar fails.

> Sure, you can always call that, first thing through the loop does
> prepare again, so no harm. That however does not connect to your
> condition,.. /me puzzled

If ->action() fails we will abort the loop in any case, prepare
won't be called. So in this case finish_wait() does the right thing.

> > test_and_set_bit() implies mb() so
> > the lockless list_empty_careful() case is fine, we can not miss the
> > condition if we race with unlock_page().
>
> You're talking about this ordering?:
>
> 	finish_wait()			clear_bit_unlock();
> 	  list_empty_careful()
>
> 	/* MB implied */		smp_mb__after_atomic();
> 	test_and_set_bit()		wake_up_page()
> 					  ...
> 					    autoremove_wake_function()
> 					      list_del_init();
>
>
> That could do with spelling out I feel.. :-)

Yes, yes.

> >  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> >  			wait_bit_action_f *action, unsigned mode)
> >  {
> > +	int ret = 0;
> >
> > +	for (;;) {
> >  		prepare_to_wait_exclusive(wq, &q->wait, mode);
> > +		if (test_bit(q->key.bit_nr, q->key.flags)) {
> > +			ret = action(&q->key, mode);
> > +			/*
> > +			 * Ensure that clear_bit() + wake_up() right after
> > +			 * test_and_set_bit() below can't see us; it should
> > +			 * wake up another exclusive waiter if we fail.
> > +			 */
> > +			if (ret)
> > +				finish_wait(wq, &q->wait);
> > +		}
> > +		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
>
> So this is the actual difference, instead of failing the lock and
> aborting on signal, we acquire the lock if possible. If its not
> possible, someone else has it, which guarantees that someone else will
> do an unlock which implies another wakeup and life goes on.

Yes. This way we eliminate the need for the additional wake_up.

Oleg.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-09-01 19:01   ` Peter Zijlstra
  2016-09-01 19:08     ` Peter Zijlstra
@ 2016-09-01 22:17     ` Peter Zijlstra
  2016-09-02 12:06       ` Oleg Nesterov
  2016-09-02 12:06     ` Oleg Nesterov
  2 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-09-01 22:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On Thu, Sep 01, 2016 at 09:01:41PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:
> 
> > We do not need anything tricky to avoid the race,
> 
> The race being:
> 
> CPU0			CPU1			CPU2
> 			
> 			__wait_on_bit_lock()
> 			  bit_wait_io()
> 			    io_schedule()
> 
> clear_bit_unlock()
> __wake_up_common(.nr_exclusive=1)
>   list_for_each_entry()
>     if (curr->func() && --nr_exclusive)
>       break
> 
> 						signal()
> 
> 			    if (signal_pending_state()) == TRUE
> 			      return -EINTR
> 
> And no progress because CPU1 exits without acquiring the lock and CPU0
> thinks its done because it woke someone.

FWIW, the way the mutex code avoids this issue is by doing the
signal_pending test while holding the q->lock, that way its exclusive
with wakeup.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-09-01 19:01   ` Peter Zijlstra
@ 2016-09-01 19:08     ` Peter Zijlstra
  2016-09-02 12:06       ` Oleg Nesterov
  2016-09-01 22:17     ` Peter Zijlstra
  2016-09-02 12:06     ` Oleg Nesterov
  2 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-09-01 19:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On Thu, Sep 01, 2016 at 09:01:41PM +0200, Peter Zijlstra wrote:
> > test_and_set_bit() implies mb() so
> > the lockless list_empty_careful() case is fine, we can not miss the
> > condition if we race with unlock_page().
> 
> You're talking about this ordering?:
> 
> 	finish_wait()			clear_bit_unlock();
> 	  list_empty_careful()
> 
> 	/* MB implied */		smp_mb__after_atomic();
> 	test_and_set_bit()		wake_up_page()
> 					  ...
> 					    autoremove_wake_function()
> 					      list_del_init();
> 
> 
> That could do with spelling out I feel.. :-)

This ^^^

> > I am not sure we even want to conditionalize both finish_wait()'s,
> > we could simply call it unconditionally and once before test_and_set(),
> > the spurious wakeup is unlikely case.
> 
> 
> 	ret = 0;
> 
> 	for (;;) {
> 		prepare_to_wait_exclusive(wq, &q->wait, mode);
> 
> 		if (test_bit(&q->key.bit_nr, &q->key.flag))
> 			ret = action(&q->key, mode);
> 
> 		if (!test_and_set_bit(&q->key.bit_nr, &q->key.flag)) {
> 			/* we got the lock anyway, ignore the signal */
> 			ret = 0;
> 			break;
> 		}
> 
> 		if (ret)
> 			break;
> 	}
> 	finish_wait(wq, &q->wait);
> 
> 	return ret;
> 
> 
> Would not that work too?

Nope, because we need to do that finish_wait() before
test_and_set_bit()..

Also the problem with doing finish_wait() unconditionally would be
destroying the FIFO order. With a bit of bad luck you'd get starvation
cases :/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-08-26 12:45 ` [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
  2016-08-26 12:47   ` Oleg Nesterov
@ 2016-09-01 19:01   ` Peter Zijlstra
  2016-09-01 19:08     ` Peter Zijlstra
                       ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-09-01 19:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Al Viro, Bart Van Assche, Johannes Weiner,
	Linus Torvalds, Neil Brown, linux-kernel

On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:

> We do not need anything tricky to avoid the race,

The race being:

CPU0			CPU1			CPU2
			
			__wait_on_bit_lock()
			  bit_wait_io()
			    io_schedule()

clear_bit_unlock()
__wake_up_common(.nr_exclusive=1)
  list_for_each_entry()
    if (curr->func() && --nr_exclusive)
      break

						signal()

			    if (signal_pending_state()) == TRUE
			      return -EINTR

And no progress because CPU1 exits without acquiring the lock and CPU0
thinks its done because it woke someone.

> we can just call finish_wait() if action() fails.

That would be bit_wait*() returning -EINTR because sigpending.

Sure, you can always call that, first thing through the loop does
prepare again, so no harm. That however does not connect to your
condition,.. /me puzzled

> test_and_set_bit() implies mb() so
> the lockless list_empty_careful() case is fine, we can not miss the
> condition if we race with unlock_page().

You're talking about this ordering?:

	finish_wait()			clear_bit_unlock();
	  list_empty_careful()

	/* MB implied */		smp_mb__after_atomic();
	test_and_set_bit()		wake_up_page()
					  ...
					    autoremove_wake_function()
					      list_del_init();


That could do with spelling out I feel.. :-)

>  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
>  			wait_bit_action_f *action, unsigned mode)
>  {
> +	int ret = 0;
>  
> +	for (;;) {
>  		prepare_to_wait_exclusive(wq, &q->wait, mode);
> +		if (test_bit(q->key.bit_nr, q->key.flags)) {
> +			ret = action(&q->key, mode);
> +			/*
> +			 * Ensure that clear_bit() + wake_up() right after
> +			 * test_and_set_bit() below can't see us; it should
> +			 * wake up another exclusive waiter if we fail.
> +			 */
> +			if (ret)
> +				finish_wait(wq, &q->wait);
> +		}
> +		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {

So this is the actual difference, instead of failing the lock and
aborting on signal, we acquire the lock if possible. If its not
possible, someone else has it, which guarantees that someone else will
do an unlock which implies another wakeup and life goes on.

> +			if (!ret)
> +				finish_wait(wq, &q->wait);
> +			return 0;
> +		} else if (ret) {
> +			return ret;
> +		}
> +	}
>  }

> I am not sure we even want to conditionalize both finish_wait()'s,
> we could simply call it unconditionally and once before test_and_set(),
> the spurious wakeup is unlikely case.


	ret = 0;

	for (;;) {
		prepare_to_wait_exclusive(wq, &q->wait, mode);

		if (test_bit(&q->key.bit_nr, &q->key.flag))
			ret = action(&q->key, mode);

		if (!test_and_set_bit(&q->key.bit_nr, &q->key.flag)) {
			/* we got the lock anyway, ignore the signal */
			ret = 0;
			break;
		}

		if (ret)
			break;
	}
	finish_wait(wq, &q->wait);

	return ret;


Would not that work too?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-08-26 12:45 ` [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
@ 2016-08-26 12:47   ` Oleg Nesterov
  2016-09-01 19:01   ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-08-26 12:47 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
	Neil Brown, linux-kernel

On 08/26, Oleg Nesterov wrote:
>
> We do not need anything tricky to avoid the race, we can just call
> finish_wait() if action() fails. test_and_set_bit() implies mb() so
> the lockless list_empty_careful() case is fine, we can not miss the
> condition if we race with unlock_page().

To simplify the review see the code with the patch applied:


int __sched
__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
			wait_bit_action_f *action, unsigned mode)
{
	int ret = 0;

	for (;;) {
		prepare_to_wait_exclusive(wq, &q->wait, mode);
		if (test_bit(q->key.bit_nr, q->key.flags)) {
			ret = action(&q->key, mode);
			/*
			 * Ensure that clear_bit() + wake_up() right after
			 * test_and_set_bit() below can't see us; it should
			 * wake up another exclusive waiter if we fail.
			 */
			if (ret)
				finish_wait(wq, &q->wait);
		}
		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
			if (!ret)
				finish_wait(wq, &q->wait);
			return 0;
		} else if (ret) {
			return ret;
		}
	}
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
  2016-08-26 12:44 [PATCH 0/2] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
@ 2016-08-26 12:45 ` Oleg Nesterov
  2016-08-26 12:47   ` Oleg Nesterov
  2016-09-01 19:01   ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2016-08-26 12:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Al Viro, Bart Van Assche, Johannes Weiner, Linus Torvalds,
	Neil Brown, linux-kernel

I think it would be nice to kill abort_exclusive_wait(). This patch
changes __wait_on_bit_lock(), but we can do a similar change to remove
it from ___wait_event().

We do not need anything tricky to avoid the race, we can just call
finish_wait() if action() fails. test_and_set_bit() implies mb() so
the lockless list_empty_careful() case is fine, we can not miss the
condition if we race with unlock_page().

I am not sure we even want to conditionalize both finish_wait()'s,
we could simply call it unconditionally and once before test_and_set(),
the spurious wakeup is unlikely case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sched/wait.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 2bbba01..c10e904 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -423,20 +423,28 @@ int __sched
 __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 			wait_bit_action_f *action, unsigned mode)
 {
-	do {
-		int ret;
+	int ret = 0;
 
+	for (;;) {
 		prepare_to_wait_exclusive(wq, &q->wait, mode);
-		if (!test_bit(q->key.bit_nr, q->key.flags))
-			continue;
-		ret = action(&q->key, mode);
-		if (!ret)
-			continue;
-		abort_exclusive_wait(wq, &q->wait, &q->key);
-		return ret;
-	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));
-	finish_wait(wq, &q->wait);
-	return 0;
+		if (test_bit(q->key.bit_nr, q->key.flags)) {
+			ret = action(&q->key, mode);
+			/*
+			 * Ensure that clear_bit() + wake_up() right after
+			 * test_and_set_bit() below can't see us; it should
+			 * wake up another exclusive waiter if we fail.
+			 */
+			if (ret)
+				finish_wait(wq, &q->wait);
+		}
+		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
+			if (!ret)
+				finish_wait(wq, &q->wait);
+			return 0;
+		} else if (ret) {
+			return ret;
+		}
+	}
 }
 EXPORT_SYMBOL(__wait_on_bit_lock);
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-09-02 13:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <00e501d201cf$7bfecd40$73fc67c0$@alibaba-inc.com>
2016-08-29  8:41 ` [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Hillf Danton
2016-08-29 13:48   ` Oleg Nesterov
2016-08-26 12:44 [PATCH 0/2] sched/wait: abort_exclusive_wait() should pass TASK_NORMAL to wake_up() Oleg Nesterov
2016-08-26 12:45 ` [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock() Oleg Nesterov
2016-08-26 12:47   ` Oleg Nesterov
2016-09-01 19:01   ` Peter Zijlstra
2016-09-01 19:08     ` Peter Zijlstra
2016-09-02 12:06       ` Oleg Nesterov
2016-09-01 22:17     ` Peter Zijlstra
2016-09-02 12:06       ` Oleg Nesterov
2016-09-02 13:20         ` Peter Zijlstra
2016-09-02 12:06     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).