linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] sched: fix next_interval determination in idle_balance()
@ 2007-06-19 19:39 Ingo Molnar
  2007-06-19 21:59 ` Paul E. McKenney
  0 siblings, 1 reply; 2+ messages in thread
From: Ingo Molnar @ 2007-06-19 19:39 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Greg KH, chrisw, stable, Srivatsa Vaddagiri,
	Christoph Lameter, Paul E. McKenney


2.6.22 must-have item - perhaps suitable for -stable too, because it was 
reproduced on 2.6.21.5 too.

---------------------->
From: Christoph Lameter <clameter@sgi.com>
Subject: [patch] sched: fix next_interval determination in idle_balance()

Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS. 
(and later on reproduced without CFS as well).

The intervals of domains that do not have SD_BALANCE_NEWIDLE must be 
considered for the calculation of the time of the next balance. 
Otherwise we may defer rebalancing forever and nodes might stay idle for 
very long times.

Siddha also spotted that the conversion of the balance interval to 
jiffies is missing. Fix that to.

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

also continue the loop if !(sd->flags & SD_LOAD_BALANCE).

Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

It did in fact trigger under all three of mainline, CFS, and -rt 
including CFS -- see below for a couple of emails from last Friday 
giving results for these three on the AMD box (where it happened) and on 
a single-quad NUMA-Q system (where it did not, at least not with such 
severity).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Index: v/kernel/sched.c
===================================================================
--- v.orig/kernel/sched.c
+++ v/kernel/sched.c
@@ -2938,17 +2938,21 @@ static void idle_balance(int this_cpu, s
 	unsigned long next_balance = jiffies + 60 *  HZ;
 
 	for_each_domain(this_cpu, sd) {
-		if (sd->flags & SD_BALANCE_NEWIDLE) {
+		unsigned long interval;
+
+		if (!(sd->flags & SD_LOAD_BALANCE))
+			continue;
+
+		if (sd->flags & SD_BALANCE_NEWIDLE)
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance_newidle(this_cpu,
-							this_rq, sd);
-			if (time_after(next_balance,
-				  sd->last_balance + sd->balance_interval))
-				next_balance = sd->last_balance
-						+ sd->balance_interval;
-			if (pulled_task)
-				break;
-		}
+								this_rq, sd);
+
+		interval = msecs_to_jiffies(sd->balance_interval);
+		if (time_after(next_balance, sd->last_balance + interval))
+			next_balance = sd->last_balance + interval;
+		if (pulled_task)
+			break;
 	}
 	if (!pulled_task)
 		/*

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [patch] sched: fix next_interval determination in idle_balance()
  2007-06-19 19:39 [patch] sched: fix next_interval determination in idle_balance() Ingo Molnar
@ 2007-06-19 21:59 ` Paul E. McKenney
  0 siblings, 0 replies; 2+ messages in thread
From: Paul E. McKenney @ 2007-06-19 21:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Greg KH, chrisw,
	stable, Srivatsa Vaddagiri, Christoph Lameter

On Tue, Jun 19, 2007 at 09:39:03PM +0200, Ingo Molnar wrote:
> 
> 2.6.22 must-have item - perhaps suitable for -stable too, because it was 
> reproduced on 2.6.21.5 too.
> 
> ---------------------->
> From: Christoph Lameter <clameter@sgi.com>
> Subject: [patch] sched: fix next_interval determination in idle_balance()
> 
> Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS. 
> (and later on reproduced without CFS as well).
> 
> The intervals of domains that do not have SD_BALANCE_NEWIDLE must be 
> considered for the calculation of the time of the next balance. 
> Otherwise we may defer rebalancing forever and nodes might stay idle for 
> very long times.
> 
> Siddha also spotted that the conversion of the balance interval to 
> jiffies is missing. Fix that to.
> 
> From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> 
> also continue the loop if !(sd->flags & SD_LOAD_BALANCE).
> 
> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Retested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Spread evenly in well under a minute on the machine that previously
kept all of the rcutorture tasks on a single CPU forever, which is most
definitely not the way to torture RCU.

Good stuff!!!

						Thanx, Paul

> It did in fact trigger under all three of mainline, CFS, and -rt 
> including CFS -- see below for a couple of emails from last Friday 
> giving results for these three on the AMD box (where it happened) and on 
> a single-quad NUMA-Q system (where it did not, at least not with such 
> severity).
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/sched.c |   22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> Index: v/kernel/sched.c
> ===================================================================
> --- v.orig/kernel/sched.c
> +++ v/kernel/sched.c
> @@ -2938,17 +2938,21 @@ static void idle_balance(int this_cpu, s
>  	unsigned long next_balance = jiffies + 60 *  HZ;
> 
>  	for_each_domain(this_cpu, sd) {
> -		if (sd->flags & SD_BALANCE_NEWIDLE) {
> +		unsigned long interval;
> +
> +		if (!(sd->flags & SD_LOAD_BALANCE))
> +			continue;
> +
> +		if (sd->flags & SD_BALANCE_NEWIDLE)
>  			/* If we've pulled tasks over stop searching: */
>  			pulled_task = load_balance_newidle(this_cpu,
> -							this_rq, sd);
> -			if (time_after(next_balance,
> -				  sd->last_balance + sd->balance_interval))
> -				next_balance = sd->last_balance
> -						+ sd->balance_interval;
> -			if (pulled_task)
> -				break;
> -		}
> +								this_rq, sd);
> +
> +		interval = msecs_to_jiffies(sd->balance_interval);
> +		if (time_after(next_balance, sd->last_balance + interval))
> +			next_balance = sd->last_balance + interval;
> +		if (pulled_task)
> +			break;
>  	}
>  	if (!pulled_task)
>  		/*

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2007-06-19 21:59 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-19 19:39 [patch] sched: fix next_interval determination in idle_balance() Ingo Molnar
2007-06-19 21:59 ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).