linux-kernel.vger.kernel.org archive mirror
* [PATCH] RFC: sched: Prevent wakeup to enter critical section needlessly
@ 2012-09-24 13:06 Ivo Sieben
  2012-10-09 11:30 ` [REPOST] " Ivo Sieben
  0 siblings, 1 reply; 16+ messages in thread
From: Ivo Sieben @ 2012-09-24 13:06 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra, linux-serial, RT,
	Alan Cox, Greg KH
  Cc: Ivo Sieben

Check that the waitqueue task list is non-empty before entering the critical
section. This avoids taking the spinlock needlessly when the queue is empty,
and therefore also prevents scheduling overhead on a PREEMPT_RT system.

Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
---

 Request for comments:
 - Does this make any sense?
 - I assume that I can safely use the list_empty_careful() function here, but is
   that correct?

 Background to this patch:
 Testing was done on a PREEMPT_RT system with TTY serial communication. Each
 time the TTY line discipline is dereferenced, the idle-handling wait queue is
 woken up (see function put_ldisc() in drivers/tty/tty_ldisc.c). However, line
 discipline idle handling is not used very often, so the wait queue is empty
 most of the time. Still, the wake_up() function enters the critical section
 guarded by spinlocks. This causes additional scheduling overhead when a
 lower-priority thread holds that same lock.
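
 A minimal, hypothetical sketch of the call-site pattern described above (the
 names are illustrative, not the actual put_ldisc() code): the idle waitqueue
 is woken on every release even though it is almost always empty, so wake_up()
 takes the queue's spinlock for nothing.

	#include <linux/wait.h>
	#include <linux/atomic.h>

	static DECLARE_WAIT_QUEUE_HEAD(ldisc_idle);	/* illustrative name */

	struct my_ldisc {
		atomic_t users;				/* illustrative refcount */
	};

	static void my_put_ldisc(struct my_ldisc *ld)
	{
		if (atomic_dec_and_test(&ld->users))
			/* takes ldisc_idle.lock even when nobody is waiting */
			wake_up(&ldisc_idle);
	}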

 kernel/sched/core.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 649c9f8..6436eb8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3631,9 +3631,19 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_common(q, mode, nr_exclusive, 0, key);
-	spin_unlock_irqrestore(&q->lock, flags);
+	/*
+	 * We can check for list emptiness outside the lock by using the
+	 * "careful" check that verifies both the next and prev pointers, so
+	 * that there cannot be any half-pending updates in progress.
+	 *
+	 * This prevents the wakeup from entering the critical section
+	 * needlessly when the task list is empty.
+	 */
+	if (!list_empty_careful(&q->task_list)) {
+		spin_lock_irqsave(&q->lock, flags);
+		__wake_up_common(q, mode, nr_exclusive, 0, key);
+		spin_unlock_irqrestore(&q->lock, flags);
+	}
 }
 EXPORT_SYMBOL(__wake_up);
 
-- 
1.7.9.5




* [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly
  2012-09-24 13:06 [PATCH] RFC: sched: Prevent wakeup to enter critical section needlessly Ivo Sieben
@ 2012-10-09 11:30 ` Ivo Sieben
  2012-10-09 13:37   ` Andi Kleen
  0 siblings, 1 reply; 16+ messages in thread
From: Ivo Sieben @ 2012-10-09 11:30 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra
  Cc: linux-serial, Alan Cox, Greg KH, Ivo Sieben

Check that the waitqueue task list is non-empty before entering the critical
section. This avoids taking the spinlock needlessly when the queue is empty,
and therefore also prevents scheduling overhead on a PREEMPT_RT system.

Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
---

 REPOST:
 Request for comments:
 - Does this make any sense?
 - I assume that I can safely use the list_empty_careful() function here, but is
   that correct?

 Background to this patch:
 Testing was done on a PREEMPT_RT system with TTY serial communication. Each
 time the TTY line discipline is dereferenced, the idle-handling wait queue is
 woken up (see function put_ldisc() in drivers/tty/tty_ldisc.c). However, line
 discipline idle handling is not used very often, so the wait queue is empty
 most of the time. Still, the wake_up() function enters the critical section
 guarded by spinlocks. This causes additional scheduling overhead when a
 lower-priority thread holds that same lock.

 kernel/sched/core.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c177472..c1667c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3090,9 +3090,19 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_common(q, mode, nr_exclusive, 0, key);
-	spin_unlock_irqrestore(&q->lock, flags);
+	/*
+	 * We can check for list emptiness outside the lock by using the
+	 * "careful" check that verifies both the next and prev pointers, so
+	 * that there cannot be any half-pending updates in progress.
+	 *
+	 * This prevents the wakeup from entering the critical section
+	 * needlessly when the task list is empty.
+	 */
+	if (!list_empty_careful(&q->task_list)) {
+		spin_lock_irqsave(&q->lock, flags);
+		__wake_up_common(q, mode, nr_exclusive, 0, key);
+		spin_unlock_irqrestore(&q->lock, flags);
+	}
 }
 EXPORT_SYMBOL(__wake_up);
 
-- 
1.7.9.5




* Re: [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly
  2012-10-09 11:30 ` [REPOST] " Ivo Sieben
@ 2012-10-09 13:37   ` Andi Kleen
  2012-10-09 14:15     ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2012-10-09 13:37 UTC (permalink / raw)
  To: Ivo Sieben
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, linux-serial,
	Alan Cox, Greg KH

Ivo Sieben <meltedpianoman@gmail.com> writes:

> Check that the waitqueue task list is non-empty before entering the critical
> section. This avoids taking the spinlock needlessly when the queue is empty,
> and therefore also prevents scheduling overhead on a PREEMPT_RT system.
>
> Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
> ---
>
>  REPOST:
>  Request for comments:
>  - Does this make any sense?

Looks good to me. Avoiding any spinlock is good. I don't even think you
need the "careful" variant; if you miss an update, it is just as if it
happened a little later.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only
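
For reference, the two checks being discussed, roughly as they appear in
include/linux/list.h of this era: list_empty() reads only head->next, while
list_empty_careful() also checks head->prev so that a half-finished concurrent
update is not mistaken for a stable empty or non-empty list.

	static inline int list_empty(const struct list_head *head)
	{
		return head->next == head;
	}

	static inline int list_empty_careful(const struct list_head *head)
	{
		struct list_head *next = head->next;
		return (next == head) && (next == head->prev);
	}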


* Re: [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly
  2012-10-09 13:37   ` Andi Kleen
@ 2012-10-09 14:15     ` Peter Zijlstra
  2012-10-09 15:17       ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2012-10-09 14:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ivo Sieben, linux-kernel, Ingo Molnar, linux-serial, Alan Cox,
	Greg KH, Oleg Nesterov

On Tue, 2012-10-09 at 06:37 -0700, Andi Kleen wrote:
> Ivo Sieben <meltedpianoman@gmail.com> writes:
> 
> > Check that the waitqueue task list is non-empty before entering the critical
> > section. This avoids taking the spinlock needlessly when the queue is empty,
> > and therefore also prevents scheduling overhead on a PREEMPT_RT system.
> >
> > Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
> > ---
> >
> >  REPOST:
> >  Request for comments:
> >  - Does this make any sense?
> 
> Looks good to me. Avoiding any spinlock is good. I don't even think you
> need the "careful" variant; if you miss an update, it is just as if it
> happened a little later.

Yeah, so I've started looking at this several times, but never had the
time to actually think it through. I think I'll agree with you that
using the careful list empty check isn't needed.

In general there's already the race of doing a wakeup before the wait,
and if the code using the waitqueue doesn't deal with that it's already
broken, so in that respect you should be good, since you're simply
shifting the timing a little.

One thing you might need to consider is the memory ordering: will the
list_empty -- either careful or not -- observe the right list pointer,
or could it -- when racing with wait_event()/prepare_to_wait() --
observe a stale value? Or is that all already covered at the use
site?

I started auditing a few users to see what they all assume, whether
they're already broken and/or whether they would now be broken... but
like I said, I didn't get anywhere.


I'd like to see this patch/Changelog amended to at least cover these
points so that I feel all warm and fuzzy when I read it and not have to
think too hard ;-)


* Re: [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly
  2012-10-09 14:15     ` Peter Zijlstra
@ 2012-10-09 15:17       ` Oleg Nesterov
  2012-10-10 14:02         ` Andi Kleen
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2012-10-09 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Ivo Sieben, linux-kernel, Ingo Molnar, linux-serial,
	Alan Cox, Greg KH

On 10/09, Peter Zijlstra wrote:
>
> One thing you might need to consider is the memory ordering: will the
> list_empty -- either careful or not -- observe the right list pointer,
> or could it -- when racing with wait_event()/prepare_to_wait() --
> observe a stale value? Or is that all already covered at the use
> site?

I agree.

Without spin_lock(q->lock) (or some other barriers) wait_event-like
code can miss an event.

wait_event:

	prepare_to_wait(wq)	// takes wq->lock

	if (!CONDITION)
		schedule();

Now,

	CONDITION = 1;
	wake_up(wq);

the wake_up() path at least needs a full mb() before list_empty().

Oleg.
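
A minimal sketch of the ordering this implies for the patch, using a
hypothetical "condition" flag (illustrative only, not the proposed code):

	static DECLARE_WAIT_QUEUE_HEAD(wq);
	static int condition;		/* hypothetical wakeup condition */

	/* waiter (wait_event-like) */
	DEFINE_WAIT(wait);
	prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);  /* takes wq.lock, list_add() */
	if (!condition)
		schedule();
	finish_wait(&wq, &wait);

	/* waker, on another CPU, with the proposed unlocked check */
	condition = 1;
	smp_mb();	/* order the store above against the list_empty() load below */
	if (!list_empty(&wq.task_list))
		__wake_up(&wq, TASK_NORMAL, 1, NULL);

Without the smp_mb(), the waker's list_empty() load can be reordered before
its store to "condition"; the waker may then see the list as it was before the
waiter's list_add(), while the waiter still sees !condition, and the wakeup is
lost.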



* Re: [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly
  2012-10-09 15:17       ` Oleg Nesterov
@ 2012-10-10 14:02         ` Andi Kleen
  2012-10-18  8:30           ` [PATCH-v2] " Ivo Sieben
  0 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2012-10-10 14:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andi Kleen, Ivo Sieben, linux-kernel,
	Ingo Molnar, linux-serial, Alan Cox, Greg KH

> wait_event:
> 
> 	prepare_to_wait(wq)	// takes wq->lock
> 
> 	if (!CONDITION)
> 		schedule();
> 
> Now,
> 
> 	CONDITION = 1;
> 	wake_up(wq);
> 
> the wake_up() path at least needs a full mb() before list_empty().

You're right, but it would probably only matter for inlining with LTO
(if the LTO compiler ever decides to do that).

Without that, a function call should always be enough of a barrier in practice.

So yes, I would add the mb(), but most likely it will not make much
difference. Just make sure to comment it.

-Andi


* [PATCH-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-10-10 14:02         ` Andi Kleen
@ 2012-10-18  8:30           ` Ivo Sieben
  2012-10-25 10:12             ` [REPOST-v2] " Ivo Sieben
  0 siblings, 1 reply; 16+ messages in thread
From: Ivo Sieben @ 2012-10-18  8:30 UTC (permalink / raw)
  To: linux-kernel, Andi Kleen, Oleg Nesterov, Peter Zijlstra, Ingo Molnar
  Cc: linux-serial, Alan Cox, Greg KH, Ivo Sieben

Check that the waitqueue task list is non-empty before entering the critical
section. This avoids taking the spinlock needlessly when the queue is empty,
and therefore also prevents scheduling overhead on a PREEMPT_RT system.

Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
---

 v2:
 - We don't need the "careful" list_empty(); a normal list_empty() is
   sufficient: if you miss an update, it is just as if it happened a little
   later.
 - Because of memory ordering problems we could observe a stale list
   administration. This could cause wait_event-like code to miss an event.
   Adding a memory barrier before checking whether the list is empty
   guarantees we evaluate an up-to-date list administration.

 kernel/sched/core.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..168a9b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3090,9 +3090,22 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_common(q, mode, nr_exclusive, 0, key);
-	spin_unlock_irqrestore(&q->lock, flags);
+	/*
+	 * We check for list emptiness outside the lock. This prevents the
+	 * wakeup from entering the critical section needlessly when the task
+	 * list is empty.
+	 *
+	 * A full memory barrier is placed before checking list emptiness to
+	 * make sure this function sees an up-to-date list administration.
+	 * Note that other code that manipulates the list holds the spinlock
+	 * and therefore doesn't need additional memory barriers.
+	 */
+	smp_mb();
+	if (!list_empty(&q->task_list)) {
+		spin_lock_irqsave(&q->lock, flags);
+		__wake_up_common(q, mode, nr_exclusive, 0, key);
+		spin_unlock_irqrestore(&q->lock, flags);
+	}
 }
 EXPORT_SYMBOL(__wake_up);
 
-- 
1.7.9.5




* [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-10-18  8:30           ` [PATCH-v2] " Ivo Sieben
@ 2012-10-25 10:12             ` Ivo Sieben
  2012-11-19  7:30               ` Ivo Sieben
  0 siblings, 1 reply; 16+ messages in thread
From: Ivo Sieben @ 2012-10-25 10:12 UTC (permalink / raw)
  To: linux-kernel, Andi Kleen, Oleg Nesterov, Peter Zijlstra, Ingo Molnar
  Cc: linux-serial, Alan Cox, Greg KH, Ivo Sieben

Check that the waitqueue task list is non-empty before entering the critical
section. This avoids taking the spinlock needlessly when the queue is empty,
and therefore also prevents scheduling overhead on a PREEMPT_RT system.

Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
---

 repost:
 Did I apply the memory barrier correctly?

 v2:
 - We don't need the "careful" list_empty(); a normal list_empty() is
   sufficient: if you miss an update, it is just as if it happened a little
   later.
 - Because of memory ordering problems we could observe a stale list
   administration. This could cause wait_event-like code to miss an event.
   Adding a memory barrier before checking whether the list is empty
   guarantees we evaluate an up-to-date list administration.

 kernel/sched/core.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..168a9b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3090,9 +3090,22 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_common(q, mode, nr_exclusive, 0, key);
-	spin_unlock_irqrestore(&q->lock, flags);
+	/*
+	 * We check for list emptiness outside the lock. This prevents the
+	 * wakeup from entering the critical section needlessly when the task
+	 * list is empty.
+	 *
+	 * A full memory barrier is placed before checking list emptiness to
+	 * make sure this function sees an up-to-date list administration.
+	 * Note that other code that manipulates the list holds the spinlock
+	 * and therefore doesn't need additional memory barriers.
+	 */
+	smp_mb();
+	if (!list_empty(&q->task_list)) {
+		spin_lock_irqsave(&q->lock, flags);
+		__wake_up_common(q, mode, nr_exclusive, 0, key);
+		spin_unlock_irqrestore(&q->lock, flags);
+	}
 }
 EXPORT_SYMBOL(__wake_up);
 
-- 
1.7.9.5




* [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-10-25 10:12             ` [REPOST-v2] " Ivo Sieben
@ 2012-11-19  7:30               ` Ivo Sieben
  2012-11-19 10:20                 ` Preeti U Murthy
  2012-11-19 15:10                 ` Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Ivo Sieben @ 2012-11-19  7:30 UTC (permalink / raw)
  To: linux-kernel, Andi Kleen, Oleg Nesterov, Peter Zijlstra, Ingo Molnar
  Cc: linux-serial, Alan Cox, Greg KH, Ivo Sieben

Check that the waitqueue task list is non-empty before entering the critical
section. This avoids taking the spinlock needlessly when the queue is empty,
and therefore also prevents scheduling overhead on a PREEMPT_RT system.

Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
---

 A second repost of this patch v2: can anyone respond?
 Did I apply the memory barrier correctly?

 v2:
 - We don't need the "careful" list_empty(); a normal list_empty() is
   sufficient: if you miss an update, it is just as if it happened a little
   later.
 - Because of memory ordering problems we could observe a stale list
   administration. This could cause wait_event-like code to miss an event.
   Adding a memory barrier before checking whether the list is empty
   guarantees we evaluate an up-to-date list administration.

 kernel/sched/core.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..168a9b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3090,9 +3090,22 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_common(q, mode, nr_exclusive, 0, key);
-	spin_unlock_irqrestore(&q->lock, flags);
+	/*
+	 * We check for list emptiness outside the lock. This prevents the
+	 * wakeup from entering the critical section needlessly when the task
+	 * list is empty.
+	 *
+	 * A full memory barrier is placed before checking list emptiness to
+	 * make sure this function sees an up-to-date list administration.
+	 * Note that other code that manipulates the list holds the spinlock
+	 * and therefore doesn't need additional memory barriers.
+	 */
+	smp_mb();
+	if (!list_empty(&q->task_list)) {
+		spin_lock_irqsave(&q->lock, flags);
+		__wake_up_common(q, mode, nr_exclusive, 0, key);
+		spin_unlock_irqrestore(&q->lock, flags);
+	}
 }
 EXPORT_SYMBOL(__wake_up);
 
-- 
1.7.9.5




* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-19  7:30               ` Ivo Sieben
@ 2012-11-19 10:20                 ` Preeti U Murthy
  2012-11-19 15:10                 ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Preeti U Murthy @ 2012-11-19 10:20 UTC (permalink / raw)
  To: Ivo Sieben
  Cc: linux-kernel, Andi Kleen, Oleg Nesterov, Peter Zijlstra,
	Ingo Molnar, linux-serial, Alan Cox, Greg KH

Hi Ivo,

On 11/19/2012 01:00 PM, Ivo Sieben wrote:
> Check that the waitqueue task list is non-empty before entering the critical
> section. This avoids taking the spinlock needlessly when the queue is empty,
> and therefore also prevents scheduling overhead on a PREEMPT_RT system.
> 
> Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
> ---
> 
>  A second repost of this patch v2: can anyone respond?
>  Did I apply the memory barrier correctly?
> 
>  v2:
>  - We don't need the "careful" list_empty(); a normal list_empty() is
>    sufficient: if you miss an update, it is just as if it happened a little
>    later.
>  - Because of memory ordering problems we could observe a stale list
>    administration. This could cause wait_event-like code to miss an event.
>    Adding a memory barrier before checking whether the list is empty
>    guarantees we evaluate an up-to-date list administration.
> 
>  kernel/sched/core.c |   19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d8927f..168a9b2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3090,9 +3090,22 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&q->lock, flags);
> -	__wake_up_common(q, mode, nr_exclusive, 0, key);
> -	spin_unlock_irqrestore(&q->lock, flags);
> +	/*
> +	 * We check for list emptiness outside the lock. This prevents the
> +	 * wakeup from entering the critical section needlessly when the task
> +	 * list is empty.
> +	 *
> +	 * A full memory barrier is placed before checking list emptiness to
> +	 * make sure this function sees an up-to-date list administration.
> +	 * Note that other code that manipulates the list holds the spinlock
> +	 * and therefore doesn't need additional memory barriers.
> +	 */
> +	smp_mb();
> +	if (!list_empty(&q->task_list)) {
> +		spin_lock_irqsave(&q->lock, flags);
> +		__wake_up_common(q, mode, nr_exclusive, 0, key);
> +		spin_unlock_irqrestore(&q->lock, flags);
> +	}
>  }
>  EXPORT_SYMBOL(__wake_up);
>  
> 
Looks good to me.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Regards
Preeti U Murthy



* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-19  7:30               ` Ivo Sieben
  2012-11-19 10:20                 ` Preeti U Murthy
@ 2012-11-19 15:10                 ` Oleg Nesterov
  2012-11-19 15:34                   ` Ivo Sieben
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2012-11-19 15:10 UTC (permalink / raw)
  To: Ivo Sieben
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Ingo Molnar,
	linux-serial, Alan Cox, Greg KH

On 11/19, Ivo Sieben wrote:
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3090,9 +3090,22 @@ void __wake_up(wait_queue_head_t *q, unsigned int mode,
>  {
>  	unsigned long flags;
>
> -	spin_lock_irqsave(&q->lock, flags);
> -	__wake_up_common(q, mode, nr_exclusive, 0, key);
> -	spin_unlock_irqrestore(&q->lock, flags);
> +	/*
> +	 * We check for list emptiness outside the lock. This prevents the
> +	 * wakeup from entering the critical section needlessly when the task
> +	 * list is empty.
> +	 *
> +	 * A full memory barrier is placed before checking list emptiness to
> +	 * make sure this function sees an up-to-date list administration.
> +	 * Note that other code that manipulates the list holds the spinlock
> +	 * and therefore doesn't need additional memory barriers.
> +	 */
> +	smp_mb();
> +	if (!list_empty(&q->task_list)) {

waitqueue_active()?

> +		spin_lock_irqsave(&q->lock, flags);
> +		__wake_up_common(q, mode, nr_exclusive, 0, key);
> +		spin_unlock_irqrestore(&q->lock, flags);
> +	}

I am wondering if it makes sense unconditionally. A lot of callers do

	if (waitqueue_active(q))
		wake_up(...);

this patch makes the optimization above pointless and adds mb().


But I won't argue.

Oleg.
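
A minimal sketch of the caller-side pattern Oleg refers to, using a
hypothetical waitqueue "wq" and "condition" flag (illustrative only):

	condition = 1;
	smp_mb();			/* order the store against the unlocked check */
	if (waitqueue_active(&wq))	/* unlocked peek at wq.task_list */
		wake_up(&wq);

With the check (and its barrier) at the call site, an unconditional check plus
smp_mb() inside __wake_up() itself would only add overhead for such callers.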



* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-19 15:10                 ` Oleg Nesterov
@ 2012-11-19 15:34                   ` Ivo Sieben
  2012-11-19 15:49                     ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Ivo Sieben @ 2012-11-19 15:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Ingo Molnar,
	linux-serial, Alan Cox, Greg KH

Hi

2012/11/19 Oleg Nesterov <oleg@redhat.com>:
>
> I am wondering if it makes sense unconditionally. A lot of callers do
>
>         if (waitqueue_active(q))
>                 wake_up(...);
>
> this patch makes the optimization above pointless and adds mb().
>
>
> But I won't argue.
>
> Oleg.
>

This patch solved an issue I had with the TTY line discipline idle
handling:
Testing was done on a PREEMPT_RT system with TTY serial communication.
Each time the TTY line discipline is dereferenced, the idle-handling
wait queue is woken up (see function put_ldisc() in
drivers/tty/tty_ldisc.c). However, line discipline idle handling is not
used very often, so the wait queue is empty most of the time. Still,
the wake_up() function enters the critical section guarded by
spinlocks. This causes additional scheduling overhead when a
lower-priority thread holds that same lock.

The code in drivers/tty/tty_ldisc.c did not use the waitqueue_active()
call to check whether the waitqueue was non-empty... maybe I should
solve this problem the other way around and make tty_ldisc.c do the
waitqueue_active() call first?

Regards,
Ivo


* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-19 15:34                   ` Ivo Sieben
@ 2012-11-19 15:49                     ` Oleg Nesterov
  2012-11-21 13:03                       ` Ivo Sieben
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2012-11-19 15:49 UTC (permalink / raw)
  To: Ivo Sieben
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Ingo Molnar,
	linux-serial, Alan Cox, Greg KH

On 11/19, Ivo Sieben wrote:
>
> Hi
>
> 2012/11/19 Oleg Nesterov <oleg@redhat.com>:
> >
> > I am wondering if it makes sense unconditionally. A lot of callers do
> >
> >         if (waitqueue_active(q))
> >                 wake_up(...);
> >
> > this patch makes the optimization above pointless and adds mb().
> >
> >
> > But I won't argue.
> >
> > Oleg.
> >
>
> This patch solved an issue I had with the TTY line discipline idle
> handling:
> Testing was done on a PREEMPT_RT system with TTY serial communication.
> Each time the TTY line discipline is dereferenced, the idle-handling
> wait queue is woken up (see function put_ldisc() in
> drivers/tty/tty_ldisc.c). However, line discipline idle handling is not
> used very often, so the wait queue is empty most of the time. Still,
> the wake_up() function enters the critical section guarded by
> spinlocks. This causes additional scheduling overhead when a
> lower-priority thread holds that same lock.
>
> The code in drivers/tty/tty_ldisc.c did not use the waitqueue_active()
> call to check whether the waitqueue was non-empty... maybe I should
> solve this problem the other way around and make tty_ldisc.c do the
> waitqueue_active() call first?

IMHO yes...

Because on second thought, I suspect this change is wrong.

Just for example, please look at kauditd_thread(). It does

	set_current_state(TASK_INTERRUPTIBLE);

	add_wait_queue(&kauditd_wait, &wait);

	if (!CONDITION)		// <-- LOAD
		schedule();

And the last LOAD can leak into the critical section protected by
wait_queue_head_t->lock, and it can be reordered with list_add()
inside this critical section. In this case we can race with wake_up()
unless it takes the same lock.

Oleg.



* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-19 15:49                     ` Oleg Nesterov
@ 2012-11-21 13:03                       ` Ivo Sieben
  2012-11-21 13:47                         ` Alan Cox
  2012-11-21 13:58                         ` Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Ivo Sieben @ 2012-11-21 13:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Ingo Molnar,
	linux-serial, Alan Cox, Greg KH

Hi

2012/11/19 Oleg Nesterov <oleg@redhat.com>:
>
> Because on second thought, I suspect this change is wrong.
>
> Just for example, please look at kauditd_thread(). It does
>
>         set_current_state(TASK_INTERRUPTIBLE);
>
>         add_wait_queue(&kauditd_wait, &wait);
>
>         if (!CONDITION)         // <-- LOAD
>                 schedule();
>
> And the last LOAD can leak into the critical section protected by
> wait_queue_head_t->lock, and it can be reordered with list_add()
> inside this critical section. In this case we can race with wake_up()
> unless it takes the same lock.
>
> Oleg.
>

I agree that I should solve my problem using the waitqueue_active()
function locally. I'll abandon this patch and fix it in
tty_ldisc.c.
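
A minimal sketch of what such a local fix could look like (hypothetical names
and structure, not the actual tty_ldisc.c change):

	/* in the ldisc release path; "ldisc_idle" and "users" are assumed names */
	if (atomic_dec_and_test(&ld->users)) {
		/*
		 * Unlocked peek at the waitqueue. The value-returning atomic
		 * above is assumed to provide the full-barrier ordering
		 * discussed earlier in the thread, and waiters add themselves
		 * to the queue under the queue lock in prepare_to_wait().
		 */
		if (waitqueue_active(&ldisc_idle))
			wake_up(&ldisc_idle);
	}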

But we are trying to understand your fault scenario: how can the LOAD
leak into the critical section? As far as we understand, the
spin_unlock() function also contains a memory barrier to prevent such
a reordering from happening.

Regards,
Ivo


* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-21 13:03                       ` Ivo Sieben
@ 2012-11-21 13:47                         ` Alan Cox
  2012-11-21 13:58                         ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Alan Cox @ 2012-11-21 13:47 UTC (permalink / raw)
  To: Ivo Sieben
  Cc: Oleg Nesterov, linux-kernel, Andi Kleen, Peter Zijlstra,
	Ingo Molnar, linux-serial, Alan Cox, Greg KH

> But we are trying to understand your fault scenario: how can the LOAD
> leak into the critical section? As far as we understand, the
> spin_unlock() function also contains a memory barrier to prevent such
> a reordering from happening.

It does - it would be very interesting for someone to look at the assembly
if this is making a difference.

Alan


* Re: [REPOST-v2] sched: Prevent wakeup to enter critical section needlessly
  2012-11-21 13:03                       ` Ivo Sieben
  2012-11-21 13:47                         ` Alan Cox
@ 2012-11-21 13:58                         ` Oleg Nesterov
  1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2012-11-21 13:58 UTC (permalink / raw)
  To: Ivo Sieben
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Ingo Molnar,
	linux-serial, Alan Cox, Greg KH

On 11/21, Ivo Sieben wrote:
> Hi
>
> 2012/11/19 Oleg Nesterov <oleg@redhat.com>:
> >
> > Because on a second thought I suspect this change is wrong.
> >
> > Just for example, please look at kauditd_thread(). It does
> >
> >         set_current_state(TASK_INTERRUPTIBLE);
> >
> >         add_wait_queue(&kauditd_wait, &wait);
> >
> >         if (!CONDITION)         // <-- LOAD
> >                 schedule();
> >
> > And the last LOAD can leak into the critical section protected by
> > wait_queue_head_t->lock, and it can be reordered with list_add()
> > inside this critical section. In this case we can race with wake_up()
> > unless it takes the same lock.
> >
> > Oleg.
> >
>
> I agree that I should solve my problem using the waitqueue_active()
> function locally. I'll abandon this patch and fix it in the
> tty_ldisc.c.
>
> But we try to understand your fault scenario: How can the LOAD leak
> into the critical section? As far as we understand the spin_unlock()
> function also contains a memory barrier
                           ^^^^^^^^^^^^^^

Not really; in general, unlock is a one-way barrier.

> to prevent such a reordering
> from happening.

Please look at the comment above prepare_to_wait(), for example. Or
look at wmb() in try_to_wake_up().

I guess this is not possible on x86, but in general

	X;
	LOCK();
	UNLOCK();
	Y;

can be reordered as

	LOCK();
	Y;
	X;
	UNLOCK();

UNLOCK + LOCK is the full memory barrier.

Oleg.


