linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* sched: Improve load balancing in the presence of idle CPUs
@ 2015-03-30 18:55 Jason Low
  2015-03-31  8:37 ` Preeti U Murthy
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-03-30 18:55 UTC (permalink / raw)
  To: Preeti U Murthy, peterz, mingo, Daniel Lezcano
  Cc: riel, daniel.lezcano, vincent.guittot, srikar, pjt, benh, efault,
	linux-kernel, iamjoonsoo.kim, svaidy, tim.c.chen,
	morten.rasmussen, jason.low2

Hi Preeti,

I noticed that another commit 4a725627f21d converted the check in
nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
potentially outdated value to be used if this cpu is able to pull tasks
using rebalance_domains(), and nohz_kick_needed() directly returning
false.

Would this patch also help address some of the issue you are seeing?

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 kernel/sched/fair.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdae26e..ba8ec1a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7644,7 +7644,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		 * balancing owner will pick it up.
 		 */
 		if (need_resched())
-			break;
+			goto end;
 
 		rq = cpu_rq(balance_cpu);
 
@@ -7687,7 +7687,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	int nr_busy, cpu = rq->cpu;
 	bool kick = false;
 
-	if (unlikely(rq->idle_balance))
+	if (unlikely(idle_cpu(cpu)))
 		return false;
 
        /*
-- 
1.7.2.5




^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-03-30 18:55 sched: Improve load balancing in the presence of idle CPUs Jason Low
@ 2015-03-31  8:37 ` Preeti U Murthy
  2015-03-31 18:54   ` Jason Low
  2015-04-02  2:11   ` Jason Low
  0 siblings, 2 replies; 30+ messages in thread
From: Preeti U Murthy @ 2015-03-31  8:37 UTC (permalink / raw)
  To: Jason Low, peterz, mingo, Daniel Lezcano
  Cc: riel, vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, morten.rasmussen

Hi Jason,

On 03/31/2015 12:25 AM, Jason Low wrote:
> Hi Preeti,
> 
> I noticed that another commit 4a725627f21d converted the check in
> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> potentially outdated value to be used if this cpu is able to pull tasks
> using rebalance_domains(), and nohz_kick_needed() directly returning
> false.

I see that rebalance_domains() will be run at the end of the scheduler
tick interrupt handling. trigger_load_balance() only sets the softirq,
it does not call rebalance_domains() immediately. So the call graph
would be:

rq->idle_balance = idle_cpu()
|____trigger_load_balance()
     |_____raise SCHED_SOFTIRQ - we are handling interrupt,hence defer
           |____nohz_kick_needed()
               |____rebalance_domains() run through the softirqd.

Correct me if I am wrong but since we do not pull any load between the
rq->idle_balance update and nohz_kick_needed(), we are safe in reading
rq->idle_balance in nohz_kick_needed().

> 
> Would this patch also help address some of the issue you are seeing?
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> ---
>  kernel/sched/fair.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..ba8ec1a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7644,7 +7644,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 * balancing owner will pick it up.
>  		 */
>  		if (need_resched())
> -			break;
> +			goto end;

Why is this hunk needed?

Regards
Preeti U Murthy
> 
>  		rq = cpu_rq(balance_cpu);
> 
> @@ -7687,7 +7687,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	int nr_busy, cpu = rq->cpu;
>  	bool kick = false;
> 
> -	if (unlikely(rq->idle_balance))
> +	if (unlikely(idle_cpu(cpu)))
>  		return false;
> 
>         /*
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-03-31  8:37 ` Preeti U Murthy
@ 2015-03-31 18:54   ` Jason Low
  2015-04-01  6:49     ` Preeti U Murthy
  2015-04-02  2:11   ` Jason Low
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-03-31 18:54 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: peterz, mingo, Daniel Lezcano, riel, vincent.guittot, srikar,
	pjt, benh, efault, linux-kernel, iamjoonsoo.kim, svaidy,
	tim.c.chen, morten.rasmussen, jason.low2

On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> Hi Jason,
> 
> On 03/31/2015 12:25 AM, Jason Low wrote:
> > Hi Preeti,
> > 
> > I noticed that another commit 4a725627f21d converted the check in
> > nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> > potentially outdated value to be used if this cpu is able to pull tasks
> > using rebalance_domains(), and nohz_kick_needed() directly returning
> > false.
> 
> I see that rebalance_domains() will be run at the end of the scheduler
> tick interrupt handling. trigger_load_balance() only sets the softirq,
> it does not call rebalance_domains() immediately. So the call graph
> would be:

Oh right, since that only sets the softirq, this wouldn't be the issue,
though we would need these changes if we were to incorporate any sort of
nohz_kick_needed() logic into the nohz_idle_balance() code path correct?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-03-31 18:54   ` Jason Low
@ 2015-04-01  6:49     ` Preeti U Murthy
  2015-04-01 17:04       ` Morten Rasmussen
  0 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2015-04-01  6:49 UTC (permalink / raw)
  To: Jason Low
  Cc: peterz, mingo, Daniel Lezcano, riel, vincent.guittot, srikar,
	pjt, benh, efault, linux-kernel, iamjoonsoo.kim, svaidy,
	tim.c.chen, morten.rasmussen


On 04/01/2015 12:24 AM, Jason Low wrote:
> On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
>> Hi Jason,
>>
>> On 03/31/2015 12:25 AM, Jason Low wrote:
>>> Hi Preeti,
>>>
>>> I noticed that another commit 4a725627f21d converted the check in
>>> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
>>> potentially outdated value to be used if this cpu is able to pull tasks
>>> using rebalance_domains(), and nohz_kick_needed() directly returning
>>> false.
>>
>> I see that rebalance_domains() will be run at the end of the scheduler
>> tick interrupt handling. trigger_load_balance() only sets the softirq,
>> it does not call rebalance_domains() immediately. So the call graph
>> would be:
> 
> Oh right, since that only sets the softirq, this wouldn't be the issue,
> though we would need these changes if we were to incorporate any sort of
> nohz_kick_needed() logic into the nohz_idle_balance() code path correct?

I am sorry I don't quite get this. Can you please elaborate?

Regards
Preeti U Murthy
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-01  6:49     ` Preeti U Murthy
@ 2015-04-01 17:04       ` Morten Rasmussen
  2015-04-02  3:30         ` Jason Low
  2015-04-02  5:59         ` Jason Low
  0 siblings, 2 replies; 30+ messages in thread
From: Morten Rasmussen @ 2015-04-01 17:04 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Jason Low, peterz, mingo, Daniel Lezcano, riel, vincent.guittot,
	srikar, pjt, benh, efault, linux-kernel, iamjoonsoo.kim, svaidy,
	tim.c.chen

On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> 
> On 04/01/2015 12:24 AM, Jason Low wrote:
> > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> >> Hi Jason,
> >>
> >> On 03/31/2015 12:25 AM, Jason Low wrote:
> >>> Hi Preeti,
> >>>
> >>> I noticed that another commit 4a725627f21d converted the check in
> >>> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> >>> potentially outdated value to be used if this cpu is able to pull tasks
> >>> using rebalance_domains(), and nohz_kick_needed() directly returning
> >>> false.
> >>
> >> I see that rebalance_domains() will be run at the end of the scheduler
> >> tick interrupt handling. trigger_load_balance() only sets the softirq,
> >> it does not call rebalance_domains() immediately. So the call graph
> >> would be:
> > 
> > Oh right, since that only sets the softirq, this wouldn't be the issue,
> > though we would need these changes if we were to incorporate any sort of
> > nohz_kick_needed() logic into the nohz_idle_balance() code path correct?
> 
> I am sorry I don't quite get this. Can you please elaborate?

I think the scenario is that we are in nohz_idle_balance() and decide to
bail out because we have pulled some tasks, but before leaving
nohz_idle_balance() we want to check if more balancing is necessary
using nohz_kick_needed() and potentially kick somebody to continue.

Note that the balance cpu is currently skipped in nohz_idle_balance(),
but if it wasn't the scenario would be possible.

In that case, we can't rely on rq->idle_balance as it would not be
up-to-date. Also, we may even want to use nohz_kick_needed(rq) where rq
!= this_rq, in which case we probably also want an updated status. It
seems that rq->idle_balance is only updated at each tick.

Or maybe I'm all wrong :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-03-31  8:37 ` Preeti U Murthy
  2015-03-31 18:54   ` Jason Low
@ 2015-04-02  2:11   ` Jason Low
  1 sibling, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-02  2:11 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: peterz, mingo, Daniel Lezcano, riel, vincent.guittot, srikar,
	pjt, benh, efault, linux-kernel, iamjoonsoo.kim, svaidy,
	tim.c.chen, morten.rasmussen, jason.low2

On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> On 03/31/2015 12:25 AM, Jason Low wrote:

> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fdae26e..ba8ec1a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7644,7 +7644,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >  		 * balancing owner will pick it up.
> >  		 */
> >  		if (need_resched())
> > -			break;
> > +			goto end;
> 
> Why is this hunk needed?

In terms of the change in the need_resched() case, if the current CPU
doesn't complete iterating all of the CPUs, then this will make it not
update nohz.next_balance. This is so we can continue the balancing with
the next balancing owner without too much delay.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-01 17:04       ` Morten Rasmussen
@ 2015-04-02  3:30         ` Jason Low
  2015-04-02  8:49           ` Morten Rasmussen
  2015-04-02  5:59         ` Jason Low
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-04-02  3:30 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Preeti U Murthy, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> > 
> > On 04/01/2015 12:24 AM, Jason Low wrote:
> > > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> > >> Hi Jason,
> > >>
> > >> On 03/31/2015 12:25 AM, Jason Low wrote:
> > >>> Hi Preeti,
> > >>>
> > >>> I noticed that another commit 4a725627f21d converted the check in
> > >>> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> > >>> potentially outdated value to be used if this cpu is able to pull tasks
> > >>> using rebalance_domains(), and nohz_kick_needed() directly returning
> > >>> false.
> > >>
> > >> I see that rebalance_domains() will be run at the end of the scheduler
> > >> tick interrupt handling. trigger_load_balance() only sets the softirq,
> > >> it does not call rebalance_domains() immediately. So the call graph
> > >> would be:
> > > 
> > > Oh right, since that only sets the softirq, this wouldn't be the issue,
> > > though we would need these changes if we were to incorporate any sort of
> > > nohz_kick_needed() logic into the nohz_idle_balance() code path correct?
> > 
> > I am sorry I don't quite get this. Can you please elaborate?
> 
> I think the scenario is that we are in nohz_idle_balance() and decide to
> bail out because we have pulled some tasks, but before leaving
> nohz_idle_balance() we want to check if more balancing is necessary
> using nohz_kick_needed() and potentially kick somebody to continue.

> Note that the balance cpu is currently skipped in nohz_idle_balance(),
> but if it wasn't the scenario would be possible.

This scenario would also be possible if we call rebalance_domains()
first again.

I'm wondering if adding the nohz_kick_needed(), ect... in
nohz_idle_balance() can address the 10 second latency issue while still
calling rebalance_domains() first, since it seems more ideal to try
balancing on the current awake CPU first, as you also have mentioned

> In that case, we can't rely on rq->idle_balance as it would not be
> up-to-date. Also, we may even want to use nohz_kick_needed(rq) where rq
> != this_rq, in which case we probably also want an updated status. It
> seems that rq->idle_balance is only updated at each tick.

Yup, that's about what I was describing.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-01 17:04       ` Morten Rasmussen
  2015-04-02  3:30         ` Jason Low
@ 2015-04-02  5:59         ` Jason Low
  2015-04-02  8:42           ` Preeti U Murthy
                             ` (3 more replies)
  1 sibling, 4 replies; 30+ messages in thread
From: Jason Low @ 2015-04-02  5:59 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Preeti U Murthy, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:

> > I am sorry I don't quite get this. Can you please elaborate?
> 
> I think the scenario is that we are in nohz_idle_balance() and decide to
> bail out because we have pulled some tasks, but before leaving
> nohz_idle_balance() we want to check if more balancing is necessary
> using nohz_kick_needed() and potentially kick somebody to continue.

Also, below is an example patch.

(Without the conversion to idle_cpu(), the check for rq->idle_balance
would not be accurate anymore)

---
 kernel/sched/fair.c |   17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdae26e..7749a14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7620,6 +7620,8 @@ out:
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static inline bool nohz_kick_needed(struct rq *rq);
+
 /*
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
@@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	int this_cpu = this_rq->cpu;
 	struct rq *rq;
 	int balance_cpu;
+	bool done_balancing = false;
 
 	if (idle != CPU_IDLE ||
 	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
@@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		 * balancing owner will pick it up.
 		 */
 		if (need_resched())
-			break;
+			goto end;
 
 		rq = cpu_rq(balance_cpu);
 
@@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		if (time_after(this_rq->next_balance, rq->next_balance))
 			this_rq->next_balance = rq->next_balance;
 	}
+	done_balancing = true;
 	nohz.next_balance = this_rq->next_balance;
 end:
 	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+	if (!done_balancing && nohz_kick_needed(this_rq))
+		nohz_balancer_kick();
 }
 
 /*
@@ -7687,7 +7693,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	int nr_busy, cpu = rq->cpu;
 	bool kick = false;
 
-	if (unlikely(rq->idle_balance))
+	if (unlikely(idle_cpu(cpu)))
 		return false;
 
        /*
@@ -7757,16 +7763,13 @@ static void run_rebalance_domains(struct softirq_action *h)
 	enum cpu_idle_type idle = this_rq->idle_balance ?
 						CPU_IDLE : CPU_NOT_IDLE;
 
+	rebalance_domains(this_rq, idle);
 	/*
 	 * If this cpu has a pending nohz_balance_kick, then do the
 	 * balancing on behalf of the other idle cpus whose ticks are
-	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
-	 * give the idle cpus a chance to load balance. Else we may
-	 * load balance only within the local sched_domain hierarchy
-	 * and abort nohz_idle_balance altogether if we pull some load.
+	 * stopped.
 	 */
 	nohz_idle_balance(this_rq, idle);
-	rebalance_domains(this_rq, idle);
 }
 
 /*
-- 
1.7.2.5




^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-02  5:59         ` Jason Low
@ 2015-04-02  8:42           ` Preeti U Murthy
  2015-04-02  9:17           ` Morten Rasmussen
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2015-04-02  8:42 UTC (permalink / raw)
  To: Jason Low, Morten Rasmussen
  Cc: peterz, mingo, Daniel Lezcano, riel, vincent.guittot, srikar,
	pjt, benh, efault, linux-kernel, iamjoonsoo.kim, svaidy,
	tim.c.chen

On 04/02/2015 11:29 AM, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
>> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> 
>>> I am sorry I don't quite get this. Can you please elaborate?
>>
>> I think the scenario is that we are in nohz_idle_balance() and decide to
>> bail out because we have pulled some tasks, but before leaving
>> nohz_idle_balance() we want to check if more balancing is necessary
>> using nohz_kick_needed() and potentially kick somebody to continue.
> 
> Also, below is an example patch.
> 
> (Without the conversion to idle_cpu(), the check for rq->idle_balance
> would not be accurate anymore)
> 
> ---
>  kernel/sched/fair.c |   17 ++++++++++-------
>  1 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..7749a14 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7620,6 +7620,8 @@ out:
>  }
> 
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline bool nohz_kick_needed(struct rq *rq);
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int this_cpu = this_rq->cpu;
>  	struct rq *rq;
>  	int balance_cpu;
> +	bool done_balancing = false;
> 
>  	if (idle != CPU_IDLE ||
>  	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> @@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 * balancing owner will pick it up.
>  		 */
>  		if (need_resched())
> -			break;
> +			goto end;
> 
>  		rq = cpu_rq(balance_cpu);
> 
> @@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		if (time_after(this_rq->next_balance, rq->next_balance))
>  			this_rq->next_balance = rq->next_balance;
>  	}
> +	done_balancing = true;
>  	nohz.next_balance = this_rq->next_balance;
>  end:
>  	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +	if (!done_balancing && nohz_kick_needed(this_rq))
> +		nohz_balancer_kick();
>  }
> 
>  /*
> @@ -7687,7 +7693,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	int nr_busy, cpu = rq->cpu;
>  	bool kick = false;
> 
> -	if (unlikely(rq->idle_balance))
> +	if (unlikely(idle_cpu(cpu)))
>  		return false;
> 
>         /*
> @@ -7757,16 +7763,13 @@ static void run_rebalance_domains(struct softirq_action *h)
>  	enum cpu_idle_type idle = this_rq->idle_balance ?
>  						CPU_IDLE : CPU_NOT_IDLE;
> 
> +	rebalance_domains(this_rq, idle);
>  	/*
>  	 * If this cpu has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle cpus whose ticks are
> -	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
> -	 * give the idle cpus a chance to load balance. Else we may
> -	 * load balance only within the local sched_domain hierarchy
> -	 * and abort nohz_idle_balance altogether if we pull some load.
> +	 * stopped.
>  	 */
>  	nohz_idle_balance(this_rq, idle);
> -	rebalance_domains(this_rq, idle);
>  }

Ok this patch looks good. Let me test to find out if scheduling behavior
improves.

Regards
Preeti U Murthy


> 
>  /*
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-02  3:30         ` Jason Low
@ 2015-04-02  8:49           ` Morten Rasmussen
  0 siblings, 0 replies; 30+ messages in thread
From: Morten Rasmussen @ 2015-04-02  8:49 UTC (permalink / raw)
  To: Jason Low
  Cc: Preeti U Murthy, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen

On Thu, Apr 02, 2015 at 04:30:34AM +0100, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> > > 
> > > On 04/01/2015 12:24 AM, Jason Low wrote:
> > > > On Tue, 2015-03-31 at 14:07 +0530, Preeti U Murthy wrote:
> > > >> Hi Jason,
> > > >>
> > > >> On 03/31/2015 12:25 AM, Jason Low wrote:
> > > >>> Hi Preeti,
> > > >>>
> > > >>> I noticed that another commit 4a725627f21d converted the check in
> > > >>> nohz_kick_needed() from idle_cpu() to rq->idle_balance, causing a
> > > >>> potentially outdated value to be used if this cpu is able to pull tasks
> > > >>> using rebalance_domains(), and nohz_kick_needed() directly returning
> > > >>> false.
> > > >>
> > > >> I see that rebalance_domains() will be run at the end of the scheduler
> > > >> tick interrupt handling. trigger_load_balance() only sets the softirq,
> > > >> it does not call rebalance_domains() immediately. So the call graph
> > > >> would be:
> > > > 
> > > > Oh right, since that only sets the softirq, this wouldn't be the issue,
> > > > though we would need these changes if we were to incorporate any sort of
> > > > nohz_kick_needed() logic into the nohz_idle_balance() code path correct?
> > > 
> > > I am sorry I don't quite get this. Can you please elaborate?
> > 
> > I think the scenario is that we are in nohz_idle_balance() and decide to
> > bail out because we have pulled some tasks, but before leaving
> > nohz_idle_balance() we want to check if more balancing is necessary
> > using nohz_kick_needed() and potentially kick somebody to continue.
> 
> > Note that the balance cpu is currently skipped in nohz_idle_balance(),
> > but if it wasn't the scenario would be possible.
> 
> This scenario would also be possible if we call rebalance_domains()
> first again.

Yes.

> I'm wondering if adding the nohz_kick_needed(), ect... in
> nohz_idle_balance() can address the 10 second latency issue while still
> calling rebalance_domains() first, since it seems more ideal to try
> balancing on the current awake CPU first, as you also have mentioned

I believe it could. That is where I was going with the chain-of-kicks
idea. I think the main cause of the unacceptable you are observing is
due to nohz_kicks only being issued at the tick. So if the balancer
pulls for itself first and bails out the next kick won't be issued until
the next tick or even multiple ticks later depending on
nohz.next_balance.

I haven't figured out if there is a reason for delaying the next
nohz_idle_balance() though. 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-02  5:59         ` Jason Low
  2015-04-02  8:42           ` Preeti U Murthy
@ 2015-04-02  9:17           ` Morten Rasmussen
  2015-04-02 17:22             ` Jason Low
  2015-04-03 22:35           ` Tim Chen
  2015-04-04  9:59           ` Preeti U Murthy
  3 siblings, 1 reply; 30+ messages in thread
From: Morten Rasmussen @ 2015-04-02  9:17 UTC (permalink / raw)
  To: Jason Low
  Cc: Preeti U Murthy, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen

On Thu, Apr 02, 2015 at 06:59:07AM +0100, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> 
> > > I am sorry I don't quite get this. Can you please elaborate?
> > 
> > I think the scenario is that we are in nohz_idle_balance() and decide to
> > bail out because we have pulled some tasks, but before leaving
> > nohz_idle_balance() we want to check if more balancing is necessary
> > using nohz_kick_needed() and potentially kick somebody to continue.
> 
> Also, below is an example patch.
> 
> (Without the conversion to idle_cpu(), the check for rq->idle_balance
> would not be accurate anymore)
> 
> ---
>  kernel/sched/fair.c |   17 ++++++++++-------
>  1 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..7749a14 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7620,6 +7620,8 @@ out:
>  }
>  
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline bool nohz_kick_needed(struct rq *rq);
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int this_cpu = this_rq->cpu;
>  	struct rq *rq;
>  	int balance_cpu;
> +	bool done_balancing = false;
>  
>  	if (idle != CPU_IDLE ||
>  	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> @@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 * balancing owner will pick it up.
>  		 */
>  		if (need_resched())
> -			break;
> +			goto end;
>  
>  		rq = cpu_rq(balance_cpu);
>  
> @@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		if (time_after(this_rq->next_balance, rq->next_balance))
>  			this_rq->next_balance = rq->next_balance;
>  	}
> +	done_balancing = true;
>  	nohz.next_balance = this_rq->next_balance;
>  end:
>  	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +	if (!done_balancing && nohz_kick_needed(this_rq))
> +		nohz_balancer_kick();
>  }
>  
>  /*
> @@ -7687,7 +7693,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	int nr_busy, cpu = rq->cpu;
>  	bool kick = false;
>  
> -	if (unlikely(rq->idle_balance))
> +	if (unlikely(idle_cpu(cpu)))
>  		return false;
>  
>         /*
> @@ -7757,16 +7763,13 @@ static void run_rebalance_domains(struct softirq_action *h)
>  	enum cpu_idle_type idle = this_rq->idle_balance ?
>  						CPU_IDLE : CPU_NOT_IDLE;
>  
> +	rebalance_domains(this_rq, idle);
>  	/*
>  	 * If this cpu has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle cpus whose ticks are
> -	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
> -	 * give the idle cpus a chance to load balance. Else we may
> -	 * load balance only within the local sched_domain hierarchy
> -	 * and abort nohz_idle_balance altogether if we pull some load.
> +	 * stopped.
>  	 */
>  	nohz_idle_balance(this_rq, idle);
> -	rebalance_domains(this_rq, idle);
>  }

I think this should reduce the latency Preeti is seeing and avoid
unnecessary wake-ups, however, it may not be quite as aggressive in
spreading tasks quickly. It will stop the chain-of-kicks as soon as the
balancer cpu has pulled only one task. The source cpu may still be
having two tasks and other cpus may still have more than two tasks
running.

Depending on how bad it is, we could consider kicking another cpu if the
imbalance is still significant after the balancer cpu has pulled a task.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-02  9:17           ` Morten Rasmussen
@ 2015-04-02 17:22             ` Jason Low
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-02 17:22 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Preeti U Murthy, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Thu, 2015-04-02 at 10:17 +0100, Morten Rasmussen wrote:
> On Thu, Apr 02, 2015 at 06:59:07AM +0100, Jason Low wrote:

> > Also, below is an example patch.
> > 
> > (Without the conversion to idle_cpu(), the check for rq->idle_balance
> > would not be accurate anymore)

> I think this should reduce the latency Preeti is seeing and avoid
> unnecessary wake-ups, however, it may not be quite as aggressive in
> spreading tasks quickly. It will stop the chain-of-kicks as soon as the
> balancer cpu has pulled only one task. The source cpu may still be
> having two tasks and other cpus may still have more than two tasks
> running.

Yeah, good point. I'll wait and see if Preeti finds this to improve
scheduling behavior. If this only helps a little though, we can also try
to make it more aggressive in spreading tasks.

> Depending on how bad it is, we could consider kicking another cpu if the
> imbalance is still significant after the balancer cpu has pulled a task.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-02  5:59         ` Jason Low
  2015-04-02  8:42           ` Preeti U Murthy
  2015-04-02  9:17           ` Morten Rasmussen
@ 2015-04-03 22:35           ` Tim Chen
  2015-04-07 17:42             ` Jason Low
  2015-04-04  9:59           ` Preeti U Murthy
  3 siblings, 1 reply; 30+ messages in thread
From: Tim Chen @ 2015-04-03 22:35 UTC (permalink / raw)
  To: Jason Low
  Cc: Morten Rasmussen, Preeti U Murthy, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy

On Wed, 2015-04-01 at 22:59 -0700, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
> > On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> 
> > > I am sorry I don't quite get this. Can you please elaborate?
> > 
> > I think the scenario is that we are in nohz_idle_balance() and decide to
> > bail out because we have pulled some tasks, but before leaving
> > nohz_idle_balance() we want to check if more balancing is necessary
> > using nohz_kick_needed() and potentially kick somebody to continue.
> 
> Also, below is an example patch.
> 
> (Without the conversion to idle_cpu(), the check for rq->idle_balance
> would not be accurate anymore)
> 
> ---
>  kernel/sched/fair.c |   17 ++++++++++-------
>  1 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..7749a14 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7620,6 +7620,8 @@ out:
>  }
>  
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline bool nohz_kick_needed(struct rq *rq);
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int this_cpu = this_rq->cpu;
>  	struct rq *rq;
>  	int balance_cpu;
> +	bool done_balancing = false;
>  
>  	if (idle != CPU_IDLE ||
>  	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> @@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 * balancing owner will pick it up.
>  		 */
>  		if (need_resched())
> -			break;
> +			goto end;
>  
>  		rq = cpu_rq(balance_cpu);
>  
> @@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		if (time_after(this_rq->next_balance, rq->next_balance))
>  			this_rq->next_balance = rq->next_balance;
>  	}
> +	done_balancing = true;
>  	nohz.next_balance = this_rq->next_balance;
>  end:
>  	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +	if (!done_balancing && nohz_kick_needed(this_rq))
> +		nohz_balancer_kick();
>  }
>  
>  

I think we can get rid of the done_balancing boolean 
and make it a bit easier to read if we change the above code to

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bcfe320..08317dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7557,8 +7557,13 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
                 * work being done for other cpus. Next load
                 * balancing owner will pick it up.
                 */
-               if (need_resched())
-                       break;
+               if (need_resched()) {
+                       /* preparing to bail, kicking other cpu to continue */
+                       clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+                       if (nohz_kick_needed(this_rq))
+                               nohz_balance_kick();
+                       return;
+               }
 
                rq = cpu_rq(balance_cpu);

Thanks.

Tim


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-02  5:59         ` Jason Low
                             ` (2 preceding siblings ...)
  2015-04-03 22:35           ` Tim Chen
@ 2015-04-04  9:59           ` Preeti U Murthy
  2015-04-07 23:28             ` Jason Low
  3 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2015-04-04  9:59 UTC (permalink / raw)
  To: Jason Low, Morten Rasmussen
  Cc: peterz, mingo, Daniel Lezcano, riel, vincent.guittot, srikar,
	pjt, benh, efault, linux-kernel, iamjoonsoo.kim, svaidy,
	tim.c.chen

On 04/02/2015 11:29 AM, Jason Low wrote:
> On Wed, 2015-04-01 at 18:04 +0100, Morten Rasmussen wrote:
>> On Wed, Apr 01, 2015 at 07:49:56AM +0100, Preeti U Murthy wrote:
> 
>>> I am sorry I don't quite get this. Can you please elaborate?
>>
>> I think the scenario is that we are in nohz_idle_balance() and decide to
>> bail out because we have pulled some tasks, but before leaving
>> nohz_idle_balance() we want to check if more balancing is necessary
>> using nohz_kick_needed() and potentially kick somebody to continue.
> 
> Also, below is an example patch.
> 
> (Without the conversion to idle_cpu(), the check for rq->idle_balance
> would not be accurate anymore)
> 
> ---
>  kernel/sched/fair.c |   17 ++++++++++-------
>  1 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..7749a14 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7620,6 +7620,8 @@ out:
>  }
> 
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline bool nohz_kick_needed(struct rq *rq);
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7629,6 +7631,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int this_cpu = this_rq->cpu;
>  	struct rq *rq;
>  	int balance_cpu;
> +	bool done_balancing = false;
> 
>  	if (idle != CPU_IDLE ||
>  	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> @@ -7644,7 +7647,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 * balancing owner will pick it up.
>  		 */
>  		if (need_resched())
> -			break;
> +			goto end;
> 
>  		rq = cpu_rq(balance_cpu);
> 
> @@ -7663,9 +7666,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		if (time_after(this_rq->next_balance, rq->next_balance))
>  			this_rq->next_balance = rq->next_balance;
>  	}
> +	done_balancing = true;
>  	nohz.next_balance = this_rq->next_balance;
>  end:
>  	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +	if (!done_balancing && nohz_kick_needed(this_rq))
> +		nohz_balancer_kick();
>  }
> 
>  /*
> @@ -7687,7 +7693,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	int nr_busy, cpu = rq->cpu;
>  	bool kick = false;
> 
> -	if (unlikely(rq->idle_balance))
> +	if (unlikely(idle_cpu(cpu)))
>  		return false;
> 
>         /*
> @@ -7757,16 +7763,13 @@ static void run_rebalance_domains(struct softirq_action *h)
>  	enum cpu_idle_type idle = this_rq->idle_balance ?
>  						CPU_IDLE : CPU_NOT_IDLE;
> 
> +	rebalance_domains(this_rq, idle);
>  	/*
>  	 * If this cpu has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle cpus whose ticks are
> -	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
> -	 * give the idle cpus a chance to load balance. Else we may
> -	 * load balance only within the local sched_domain hierarchy
> -	 * and abort nohz_idle_balance altogether if we pull some load.
> +	 * stopped.
>  	 */
>  	nohz_idle_balance(this_rq, idle);
> -	rebalance_domains(this_rq, idle);
>  }
> 
>  /*
> 

Solution 1: As exists in the mainline
Solution 2: nohz_idle_balance(); rebalance_domains() on the ILB CPU
Solution 3: Above patch.

I observe that Solution 3 is not as aggressive in spreading load as
Solution 2. With Solution 2, the load gets spread within the first 3-4
seconds, while with Solution3, the load gets spread within the first 6-7
seconds. I think this is because, the above patch decides to further
nohz_idle_load_balance() based on the load on the current ILB CPU which
has most likely pulled just one task. This will abort further load
balancing. However, Solution 3 is certainly better at spreading load
than Solution 1.

Wrt IPIs, I see that Solution 2 results in increase in the number of
IPIs by around 2% over Solution 3, probably for the same reason that
Morten pointed out.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-03 22:35           ` Tim Chen
@ 2015-04-07 17:42             ` Jason Low
  2015-04-07 19:39               ` Tim Chen
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-04-07 17:42 UTC (permalink / raw)
  To: Tim Chen
  Cc: Morten Rasmussen, Preeti U Murthy, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, jason.low2

On Fri, 2015-04-03 at 15:35 -0700, Tim Chen wrote:
> I think we can get rid of the done_balancing boolean 
> and make it a bit easier to read if we change the above code to
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bcfe320..08317dc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7557,8 +7557,13 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>                  * work being done for other cpus. Next load
>                  * balancing owner will pick it up.
>                  */
> -               if (need_resched())
> -                       break;
> +               if (need_resched()) {
> +                       /* preparing to bail, kicking other cpu to continue */
> +                       clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +                       if (nohz_kick_needed(this_rq))
> +                               nohz_balance_kick();
> +                       return;
> +               }

Hi Tim,

We would also need the nohz_kick_needed/nohz_balance_kick if we
initially find that the current CPU is not idle (at the beginning of
nohz_idle_balance). In the above case, we would need to add the code to
2 locations.

Would it be better to still keep the done_balancing to avoid having
duplicate code?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-07 17:42             ` Jason Low
@ 2015-04-07 19:39               ` Tim Chen
  2015-04-07 20:24                 ` Jason Low
  0 siblings, 1 reply; 30+ messages in thread
From: Tim Chen @ 2015-04-07 19:39 UTC (permalink / raw)
  To: Jason Low
  Cc: Morten Rasmussen, Preeti U Murthy, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy

On Tue, 2015-04-07 at 10:42 -0700, Jason Low wrote:
> On Fri, 2015-04-03 at 15:35 -0700, Tim Chen wrote:
> > I think we can get rid of the done_balancing boolean 
> > and make it a bit easier to read if we change the above code to
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bcfe320..08317dc 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7557,8 +7557,13 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >                  * work being done for other cpus. Next load
> >                  * balancing owner will pick it up.
> >                  */
> > -               if (need_resched())
> > -                       break;
> > +               if (need_resched()) {
> > +                       /* preparing to bail, kicking other cpu to continue */
> > +                       clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> > +                       if (nohz_kick_needed(this_rq))
> > +                               nohz_balance_kick();
> > +                       return;
> > +               }
> 
> Hi Tim,
> 
> We would also need the nohz_kick_needed/nohz_balance_kick if we
> initially find that the current CPU is not idle (at the beginning of
> nohz_idle_balance). In the above case, we would need to add the code to
> 2 locations.
> 
> Would it be better to still keep the done_balancing to avoid having
> duplicate code?
> 

How about consolidating the code for passing the
nohz balancing and call it at both places.  
Something like below.  Make the code more readable.

Tim

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40667cb..16f6904 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7531,6 +7531,15 @@ out:
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static inline int nohz_kick_needed(struct rq *rq);
+
+static void inline pass_nohz_balance(struct rq *this_rq, int this_cpu)
+{
+       clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+       if (nohz_kick_needed(this_rq))
+               nohz_balancer_kick();
+}
+
 /*
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
@@ -7542,8 +7551,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
        int balance_cpu;
 
        if (idle != CPU_IDLE ||
-           !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
-               goto end;
+           !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) {
+               pass_nohz_balance(this_rq, this_cpu);
+               return;
+       }
 
        for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
                if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
@@ -7554,8 +7565,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
                 * work being done for other cpus. Next load
                 * balancing owner will pick it up.
                 */
-               if (need_resched())
-                       break;
+               if (need_resched()) {
+                       pass_nohz_balance(this_rq, this_cpu);
+                       return;
+               }
 
                rq = cpu_rq(balance_cpu);
 
@@ -7575,7 +7588,6 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
                        this_rq->next_balance = rq->next_balance;
        }
        nohz.next_balance = this_rq->next_balance;
-end:
        clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
 }
 



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-07 19:39               ` Tim Chen
@ 2015-04-07 20:24                 ` Jason Low
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-07 20:24 UTC (permalink / raw)
  To: Tim Chen
  Cc: Morten Rasmussen, Preeti U Murthy, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, jason.low2

On Tue, 2015-04-07 at 12:39 -0700, Tim Chen wrote:

> How about consolidating the code for passing the
> nohz balancing and call it at both places.  
> Something like below.  Make the code more readable.
> 
> Tim
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40667cb..16f6904 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7531,6 +7531,15 @@ out:
>  }
>  
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline int nohz_kick_needed(struct rq *rq);
> +
> +static void inline pass_nohz_balance(struct rq *this_rq, int this_cpu)
> +{
> +       clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +       if (nohz_kick_needed(this_rq))
> +               nohz_balancer_kick();
> +}
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7542,8 +7551,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>         int balance_cpu;
>  
>         if (idle != CPU_IDLE ||
> -           !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> -               goto end;
> +           !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) {
> +               pass_nohz_balance(this_rq, this_cpu);
> +               return;
> +       }

Sure, this can make it more readable. This also avoids the need for the
goto, in addition to removing the done_balancing boolean.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-04  9:59           ` Preeti U Murthy
@ 2015-04-07 23:28             ` Jason Low
  2015-04-08  0:07               ` Jason Low
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-04-07 23:28 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Morten Rasmussen, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Sat, 2015-04-04 at 15:29 +0530, Preeti U Murthy wrote:

> Solution 1: As exists in the mainline
> Solution 2: nohz_idle_balance(); rebalance_domains() on the ILB CPU
> Solution 3: Above patch.
> 
> I observe that Solution 3 is not as aggressive in spreading load as
> Solution 2. With Solution 2, the load gets spread within the first 3-4
> seconds,

hmm, so 3-4 seconds still sounds like a long time.

> while with Solution3, the load gets spread within the first 6-7
> seconds. I think this is because, the above patch decides to further
> nohz_idle_load_balance() based on the load on the current ILB CPU which
> has most likely pulled just one task.

Okay, so perhaps we can also try continuing nohz load balancing if we
find that there are overloaded CPUs in the system.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-07 23:28             ` Jason Low
@ 2015-04-08  0:07               ` Jason Low
  2015-04-08 11:12                 ` Srikar Dronamraju
  2015-04-13  6:16                 ` Preeti U Murthy
  0 siblings, 2 replies; 30+ messages in thread
From: Jason Low @ 2015-04-08  0:07 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Morten Rasmussen, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:

> Okay, so perhaps we can also try continuing nohz load balancing if we
> find that there are overloaded CPUs in the system.

Something like the following.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdae26e..d636bf7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7620,6 +7620,16 @@ out:
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static inline bool nohz_kick_needed(struct rq *rq);
+
+static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
+{
+	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+	nohz.next_balance = jiffies;
+	if (nohz_kick_needed(this_rq))
+		nohz_balancer_kick();
+}
+
 /*
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
@@ -7631,8 +7641,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	int balance_cpu;
 
 	if (idle != CPU_IDLE ||
-	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
-		goto end;
+	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) {
+		pass_nohz_balance(this_rq, this_cpu);
+		return;
+	}
 
 	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
 		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
@@ -7643,8 +7655,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		 * work being done for other cpus. Next load
 		 * balancing owner will pick it up.
 		 */
-		if (need_resched())
-			break;
+		if (need_resched()) {
+			pass_nohz_balance(this_rq, this_cpu);
+			return;
+		}
 
 		rq = cpu_rq(balance_cpu);
 
@@ -7664,7 +7678,6 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 			this_rq->next_balance = rq->next_balance;
 	}
 	nohz.next_balance = this_rq->next_balance;
-end:
 	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
 }
 
@@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	int nr_busy, cpu = rq->cpu;
 	bool kick = false;
 
-	if (unlikely(rq->idle_balance))
+	if (unlikely(idle_cpu(cpu)))
 		return false;
 
        /*
@@ -7707,7 +7720,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	if (time_before(now, nohz.next_balance))
 		return false;
 
-	if (rq->nr_running >= 2)
+	if (rq->nr_running >= 2 || rq->rd->overload)
 		return true;
 
 	rcu_read_lock();
@@ -7757,16 +7770,14 @@ static void run_rebalance_domains(struct softirq_action *h)
 	enum cpu_idle_type idle = this_rq->idle_balance ?
 						CPU_IDLE : CPU_NOT_IDLE;
 
+	rebalance_domains(this_rq, idle);
+
 	/*
 	 * If this cpu has a pending nohz_balance_kick, then do the
 	 * balancing on behalf of the other idle cpus whose ticks are
-	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
-	 * give the idle cpus a chance to load balance. Else we may
-	 * load balance only within the local sched_domain hierarchy
-	 * and abort nohz_idle_balance altogether if we pull some load.
+	 * stopped.
 	 */
 	nohz_idle_balance(this_rq, idle);
-	rebalance_domains(this_rq, idle);
 }
 
 /*



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-08  0:07               ` Jason Low
@ 2015-04-08 11:12                 ` Srikar Dronamraju
  2015-04-08 21:22                   ` Jason Low
  2015-04-09  2:39                   ` Jason Low
  2015-04-13  6:16                 ` Preeti U Murthy
  1 sibling, 2 replies; 30+ messages in thread
From: Srikar Dronamraju @ 2015-04-08 11:12 UTC (permalink / raw)
  To: Jason Low
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen

* Jason Low <jason.low2@hp.com> [2015-04-07 17:07:46]:

> On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
> 
> > Okay, so perhaps we can also try continuing nohz load balancing if we
> > find that there are overloaded CPUs in the system.
> 
> Something like the following.
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..d636bf7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7620,6 +7620,16 @@ out:
>  }
> 
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline bool nohz_kick_needed(struct rq *rq);
> +
> +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> +{
> +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +	nohz.next_balance = jiffies;

Why are we updating nohz.next_balance here?

> +	if (nohz_kick_needed(this_rq))
> +		nohz_balancer_kick();
> +}
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7631,8 +7641,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int balance_cpu;
> 
>  	if (idle != CPU_IDLE ||

Would it make sense to add need_resched here like
http://mid.gmane.org/1427442750-8112-1-git-send-email-wanpeng.li@linux.intel.com

> -	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> -		goto end;
> +	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) {
> +		pass_nohz_balance(this_rq, this_cpu);
> +		return;
> +	}
> 
>  	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
>  		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))

<snipped > 

> @@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	int nr_busy, cpu = rq->cpu;
>  	bool kick = false;
> 
> -	if (unlikely(rq->idle_balance))
> +	if (unlikely(idle_cpu(cpu)))
>  		return false;


The only other place that we use idle_balance is
run_rebalance_domains(). Would it make sense to just use idle_cpu() in
run_rebalance_domains() and remove rq->idle_balance?

> 
>         /*
> @@ -7707,7 +7720,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	if (time_before(now, nohz.next_balance))
>  		return false;
> 
> -	if (rq->nr_running >= 2)
> +	if (rq->nr_running >= 2 || rq->rd->overload)
>  		return true;
> 
>  	rcu_read_lock();
> @@ -7757,16 +7770,14 @@ static void run_rebalance_domains(struct softirq_action *h)
>  	enum cpu_idle_type idle = this_rq->idle_balance ?
>  						CPU_IDLE : CPU_NOT_IDLE;
> 
> +	rebalance_domains(this_rq, idle);
> +
>  	/*
>  	 * If this cpu has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle cpus whose ticks are
> -	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
> -	 * give the idle cpus a chance to load balance. Else we may
> -	 * load balance only within the local sched_domain hierarchy
> -	 * and abort nohz_idle_balance altogether if we pull some load.
> +	 * stopped.
>  	 */
>  	nohz_idle_balance(this_rq, idle);
> -	rebalance_domains(this_rq, idle);
>  }
> 
>  /*
> 
> 

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-08 11:12                 ` Srikar Dronamraju
@ 2015-04-08 21:22                   ` Jason Low
  2015-04-10  8:37                     ` Srikar Dronamraju
  2015-04-09  2:39                   ` Jason Low
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-04-08 21:22 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Wed, 2015-04-08 at 16:42 +0530, Srikar Dronamraju wrote:
> * Jason Low <jason.low2@hp.com> [2015-04-07 17:07:46]:
> 
> > On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
> > 
> > > Okay, so perhaps we can also try continuing nohz load balancing if we
> > > find that there are overloaded CPUs in the system.
> > 
> > Something like the following.
> > 
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fdae26e..d636bf7 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7620,6 +7620,16 @@ out:
> >  }
> > 
> >  #ifdef CONFIG_NO_HZ_COMMON
> > +static inline bool nohz_kick_needed(struct rq *rq);
> > +
> > +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> > +{
> > +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> > +	nohz.next_balance = jiffies;
> 
> Why are we updating nohz.next_balance here?

This was just to make sure that since we're continuing the balancing on
another CPU that the nohz next_balance is guaranteed to be "now".

> > +	if (nohz_kick_needed(this_rq))
> > +		nohz_balancer_kick();
> > +}
> > +
> >  /*
> >   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
> >   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> > @@ -7631,8 +7641,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >  	int balance_cpu;
> > 
> >  	if (idle != CPU_IDLE ||
> 
> Would it make sense to add need_resched here like
> http://mid.gmane.org/1427442750-8112-1-git-send-email-wanpeng.li@linux.intel.com

Yeah, we could have incorporated adding the need_resched there too for
testing purposes.

Though that probably wouldn't make too much of a difference in
performance with this patch, since this also modified the need_resched()
check in the loop + nohz.next_balance. So I think it would still be fine
to test this without the added need_resched().


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-08 11:12                 ` Srikar Dronamraju
  2015-04-08 21:22                   ` Jason Low
@ 2015-04-09  2:39                   ` Jason Low
  2015-04-09  7:02                     ` Srikar Dronamraju
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-04-09  2:39 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Wed, 2015-04-08 at 16:42 +0530, Srikar Dronamraju wrote:
> * Jason Low <jason.low2@hp.com> [2015-04-07 17:07:46]:
> > @@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
> >  	int nr_busy, cpu = rq->cpu;
> >  	bool kick = false;
> > 
> > -	if (unlikely(rq->idle_balance))
> > +	if (unlikely(idle_cpu(cpu)))
> >  		return false;
> 
> 
> The only other place that we use idle_balance is
> run_rebalance_domains(). Would it make sense to just use idle_cpu() in
> run_rebalance_domains() and remove rq->idle_balance?

Hi Srikar,

So the idle_balance is used for storing the idle state of the CPU before
calling the softirq, for load balancing decisions. In that case, we may
need to keep this extra variable in order to store that information.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-09  2:39                   ` Jason Low
@ 2015-04-09  7:02                     ` Srikar Dronamraju
  2015-04-09 22:49                       ` Jason Low
  0 siblings, 1 reply; 30+ messages in thread
From: Srikar Dronamraju @ 2015-04-09  7:02 UTC (permalink / raw)
  To: Jason Low
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen

* Jason Low <jason.low2@hp.com> [2015-04-08 19:39:15]:

> On Wed, 2015-04-08 at 16:42 +0530, Srikar Dronamraju wrote:
> > * Jason Low <jason.low2@hp.com> [2015-04-07 17:07:46]:
> > > @@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
> > >  	int nr_busy, cpu = rq->cpu;
> > >  	bool kick = false;
> > > 
> > > -	if (unlikely(rq->idle_balance))
> > > +	if (unlikely(idle_cpu(cpu)))
> > >  		return false;
> > 
> > 
> > The only other place that we use idle_balance is
> > run_rebalance_domains(). Would it make sense to just use idle_cpu() in
> > run_rebalance_domains() and remove rq->idle_balance?
> 
> Hi Srikar,
> 
> So the idle_balance is used for storing the idle state of the CPU before
> calling the softirq, for load balancing decisions. In that case, we may
> need to keep this extra variable in order to store that information.
> 


I am not sure if you got what I wanted to convey.

rq->idle_balance gets updated at every scheduler_tick() but the only user of
rq->idle_balance (after your change) seems to be run_rebalance_domains().
Now can we remove rq->idle_balance. This would mean we would have to
call idle_cpu() instead of using rq->idle_balance in
run_rebalance_domains(). (similar to what your above change)

That way we can reduce the rq struct size and we might end up calling
idle_cpu() lesser number of times.

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-09  7:02                     ` Srikar Dronamraju
@ 2015-04-09 22:49                       ` Jason Low
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-09 22:49 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Thu, 2015-04-09 at 12:32 +0530, Srikar Dronamraju wrote:

> rq->idle_balance gets updated at every scheduler_tick() but the only user of
> rq->idle_balance (after your change) seems to be run_rebalance_domains().
> Now can we remove rq->idle_balance. This would mean we would have to
> call idle_cpu() instead of using rq->idle_balance in
> run_rebalance_domains(). (similar to what your above change)
> 
> That way we can reduce the rq struct size and we might end up calling
> idle_cpu() lesser number of times.

Yeah, we may also include another patch for that.

Taking a look at rebalance_domains(), we're already updating the "idle"
value using idle_cpu() after attempting load balancing anyway, so there
may not be much point in the extra parameter.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-08 21:22                   ` Jason Low
@ 2015-04-10  8:37                     ` Srikar Dronamraju
  2015-04-13 18:55                       ` Jason Low
  2015-04-13 20:54                       ` Jason Low
  0 siblings, 2 replies; 30+ messages in thread
From: Srikar Dronamraju @ 2015-04-10  8:37 UTC (permalink / raw)
  To: Jason Low
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen

> > >
> > >  #ifdef CONFIG_NO_HZ_COMMON
> > > +static inline bool nohz_kick_needed(struct rq *rq);
> > > +
> > > +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> > > +{
> > > +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> > > +	nohz.next_balance = jiffies;
> >
> > Why are we updating nohz.next_balance here?
>
> This was just to make sure that since we're continuing the balancing on
> another CPU that the nohz next_balance is guaranteed to be "now".
>

Since we are in nohz_idle_balance(), nohz.next_balance is guaranteed be
less than now. We do check for time_before(now, nohz.next_balance) in
nohz_kick_needed(). So in effect we are incrementing the nohz.next_balance.

While updating nohz.next_balance may not cause any issues, it atleast
look redundant to me.

At this point, I also wanted to understand why we do
"nohz.next_balance++" nohz_balancer_kick()?


> > > +	if (nohz_kick_needed(this_rq))
> > > +		nohz_balancer_kick();
> > > +}
> > > +
> > >  /*

--
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-08  0:07               ` Jason Low
  2015-04-08 11:12                 ` Srikar Dronamraju
@ 2015-04-13  6:16                 ` Preeti U Murthy
  2015-04-13 22:49                   ` Jason Low
  1 sibling, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2015-04-13  6:16 UTC (permalink / raw)
  To: Jason Low
  Cc: Morten Rasmussen, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen

Hi Jason,

On 04/08/2015 05:37 AM, Jason Low wrote:
> On Tue, 2015-04-07 at 16:28 -0700, Jason Low wrote:
> 
>> Okay, so perhaps we can also try continuing nohz load balancing if we
>> find that there are overloaded CPUs in the system.

Sorry about the delay. Ok I will test out the below patch and share the
results.

Regards
Preeti U Murthy
> 
> Something like the following.
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdae26e..d636bf7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7620,6 +7620,16 @@ out:
>  }
> 
>  #ifdef CONFIG_NO_HZ_COMMON
> +static inline bool nohz_kick_needed(struct rq *rq);
> +
> +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> +{
> +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> +	nohz.next_balance = jiffies;
> +	if (nohz_kick_needed(this_rq))
> +		nohz_balancer_kick();
> +}
> +
>  /*
>   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> @@ -7631,8 +7641,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  	int balance_cpu;
> 
>  	if (idle != CPU_IDLE ||
> -	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> -		goto end;
> +	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) {
> +		pass_nohz_balance(this_rq, this_cpu);
> +		return;
> +	}
> 
>  	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
>  		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
> @@ -7643,8 +7655,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 * work being done for other cpus. Next load
>  		 * balancing owner will pick it up.
>  		 */
> -		if (need_resched())
> -			break;
> +		if (need_resched()) {
> +			pass_nohz_balance(this_rq, this_cpu);
> +			return;
> +		}
> 
>  		rq = cpu_rq(balance_cpu);
> 
> @@ -7664,7 +7678,6 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  			this_rq->next_balance = rq->next_balance;
>  	}
>  	nohz.next_balance = this_rq->next_balance;
> -end:
>  	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
>  }
> 
> @@ -7687,7 +7700,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	int nr_busy, cpu = rq->cpu;
>  	bool kick = false;
> 
> -	if (unlikely(rq->idle_balance))
> +	if (unlikely(idle_cpu(cpu)))
>  		return false;
> 
>         /*
> @@ -7707,7 +7720,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
>  	if (time_before(now, nohz.next_balance))
>  		return false;
> 
> -	if (rq->nr_running >= 2)
> +	if (rq->nr_running >= 2 || rq->rd->overload)
>  		return true;
> 
>  	rcu_read_lock();
> @@ -7757,16 +7770,14 @@ static void run_rebalance_domains(struct softirq_action *h)
>  	enum cpu_idle_type idle = this_rq->idle_balance ?
>  						CPU_IDLE : CPU_NOT_IDLE;
> 
> +	rebalance_domains(this_rq, idle);
> +
>  	/*
>  	 * If this cpu has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle cpus whose ticks are
> -	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
> -	 * give the idle cpus a chance to load balance. Else we may
> -	 * load balance only within the local sched_domain hierarchy
> -	 * and abort nohz_idle_balance altogether if we pull some load.
> +	 * stopped.
>  	 */
>  	nohz_idle_balance(this_rq, idle);
> -	rebalance_domains(this_rq, idle);
>  }
> 
>  /*
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-10  8:37                     ` Srikar Dronamraju
@ 2015-04-13 18:55                       ` Jason Low
  2015-04-13 20:54                       ` Jason Low
  1 sibling, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-13 18:55 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Fri, 2015-04-10 at 14:07 +0530, Srikar Dronamraju wrote:
> > > >
> > > >  #ifdef CONFIG_NO_HZ_COMMON
> > > > +static inline bool nohz_kick_needed(struct rq *rq);
> > > > +
> > > > +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> > > > +{
> > > > +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> > > > +	nohz.next_balance = jiffies;
> > >
> > > Why are we updating nohz.next_balance here?
> >
> > This was just to make sure that since we're continuing the balancing on
> > another CPU that the nohz next_balance is guaranteed to be "now".
> >
> 
> Since we are in nohz_idle_balance(), nohz.next_balance is guaranteed be
> less than now. We do check for time_before(now, nohz.next_balance) in
> nohz_kick_needed(). So in effect we are incrementing the nohz.next_balance.

Hi Srikar,

If now is equal to nohz.next_balance, we may attempt
nohz_balancer_kick().

After it does nohz.next_balance++ in nohz_balancer_kick(), now can be 1
less than the new nohz.next_balance value by the time
nohz_idle_balance() is attempted(). Without updating nohz.next_balance,
the time_before(now, nohz.next_balance) check in nohz_kick_needed() may
cause it to return false.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-10  8:37                     ` Srikar Dronamraju
  2015-04-13 18:55                       ` Jason Low
@ 2015-04-13 20:54                       ` Jason Low
  1 sibling, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-13 20:54 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Preeti U Murthy, Morten Rasmussen, peterz, mingo, Daniel Lezcano,
	riel, vincent.guittot, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Fri, 2015-04-10 at 14:07 +0530, Srikar Dronamraju wrote:

> At this point, I also wanted to understand why we do
> "nohz.next_balance++" nohz_balancer_kick()?

So this looks like something that was added to avoid
nohz_balancer_kick() getting called too frequently. Otherwise, it may
get called in each trigger_load_balance(), even when another CPU has
already been kicked to do balancing.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-13  6:16                 ` Preeti U Murthy
@ 2015-04-13 22:49                   ` Jason Low
  2015-04-14  2:59                     ` Jason Low
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Low @ 2015-04-13 22:49 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Morten Rasmussen, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2


> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fdae26e..d636bf7 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7620,6 +7620,16 @@ out:
> >  }
> > 
> >  #ifdef CONFIG_NO_HZ_COMMON
> > +static inline bool nohz_kick_needed(struct rq *rq);
> > +
> > +static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
> > +{
> > +	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
> > +	nohz.next_balance = jiffies;
> > +	if (nohz_kick_needed(this_rq))
> > +		nohz_balancer_kick();
> > +}
> > +
> >  /*
> >   * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
> >   * rebalancing for all the cpus for whom scheduler ticks are stopped.
> > @@ -7631,8 +7641,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >  	int balance_cpu;
> > 
> >  	if (idle != CPU_IDLE ||
> > -	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
> > -		goto end;
> > +	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) {
> > +		pass_nohz_balance(this_rq, this_cpu);
> > +		return;
> > +	}

hmm, so taking a look at the patch again, it looks like we pass nohz
balance even when the NOHZ_BALANCE_KICK is not set on the current CPU.
We should separate the 2 conditions:

    if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
        return;

    if (idle != CPU_IDLE) {
        /* another CPU continue balancing */
        pass_nohz_balance(this_rq, this_cpu);
        return;
    }

In general, separating the check also optimizes nohz_idle_balance() to
avoid clearing the bit when it is not set.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: sched: Improve load balancing in the presence of idle CPUs
  2015-04-13 22:49                   ` Jason Low
@ 2015-04-14  2:59                     ` Jason Low
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Low @ 2015-04-14  2:59 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Morten Rasmussen, peterz, mingo, Daniel Lezcano, riel,
	vincent.guittot, srikar, pjt, benh, efault, linux-kernel,
	iamjoonsoo.kim, svaidy, tim.c.chen, jason.low2

On Mon, 2015-04-13 at 15:49 -0700, Jason Low wrote:

> hmm, so taking a look at the patch again, it looks like we pass nohz
> balance even when the NOHZ_BALANCE_KICK is not set on the current CPU.
> We should separate the 2 conditions:
> 
>     if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
>         return;
> 
>     if (idle != CPU_IDLE) {
>         /* another CPU continue balancing */
>         pass_nohz_balance(this_rq, this_cpu);
>         return;
>     }

Here's the example patch with the above update.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ffeaa41..9aa48f7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7622,6 +7622,16 @@ out:
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static inline bool nohz_kick_needed(struct rq *rq);
+
+static inline void pass_nohz_balance(struct rq *this_rq, int this_cpu)
+{
+	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
+	nohz.next_balance = jiffies;
+	if (nohz_kick_needed(this_rq))
+		nohz_balancer_kick();
+}
+
 /*
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
@@ -7632,9 +7642,13 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	struct rq *rq;
 	int balance_cpu;
 
-	if (idle != CPU_IDLE ||
-	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
-		goto end;
+	if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
+		return;
+
+	if (idle != CPU_IDLE) {
+		pass_nohz_balance(this_rq, this_cpu);
+		return;
+	}
 
 	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
 		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
@@ -7645,8 +7659,10 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		 * work being done for other cpus. Next load
 		 * balancing owner will pick it up.
 		 */
-		if (need_resched())
-			break;
+		if (need_resched()) {
+			pass_nohz_balance(this_rq, this_cpu);
+			return;
+		}
 
 		rq = cpu_rq(balance_cpu);
 
@@ -7666,7 +7682,6 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 			this_rq->next_balance = rq->next_balance;
 	}
 	nohz.next_balance = this_rq->next_balance;
-end:
 	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
 }
 
@@ -7689,7 +7704,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	int nr_busy, cpu = rq->cpu;
 	bool kick = false;
 
-	if (unlikely(rq->idle_balance))
+	if (unlikely(idle_cpu(cpu)))
 		return false;
 
        /*
@@ -7709,7 +7724,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	if (time_before(now, nohz.next_balance))
 		return false;
 
-	if (rq->nr_running >= 2)
+	if (rq->nr_running >= 2 || rq->rd->overload)
 		return true;
 
 	rcu_read_lock();
@@ -7759,16 +7774,14 @@ static void run_rebalance_domains(struct softirq_action *h)
 	enum cpu_idle_type idle = this_rq->idle_balance ?
 						CPU_IDLE : CPU_NOT_IDLE;
 
+	rebalance_domains(this_rq, idle);
+
 	/*
 	 * If this cpu has a pending nohz_balance_kick, then do the
 	 * balancing on behalf of the other idle cpus whose ticks are
-	 * stopped. Do nohz_idle_balance *before* rebalance_domains to
-	 * give the idle cpus a chance to load balance. Else we may
-	 * load balance only within the local sched_domain hierarchy
-	 * and abort nohz_idle_balance altogether if we pull some load.
+	 * stopped.
 	 */
 	nohz_idle_balance(this_rq, idle);
-	rebalance_domains(this_rq, idle);
 }
 
 /*





^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-04-14  3:00 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-30 18:55 sched: Improve load balancing in the presence of idle CPUs Jason Low
2015-03-31  8:37 ` Preeti U Murthy
2015-03-31 18:54   ` Jason Low
2015-04-01  6:49     ` Preeti U Murthy
2015-04-01 17:04       ` Morten Rasmussen
2015-04-02  3:30         ` Jason Low
2015-04-02  8:49           ` Morten Rasmussen
2015-04-02  5:59         ` Jason Low
2015-04-02  8:42           ` Preeti U Murthy
2015-04-02  9:17           ` Morten Rasmussen
2015-04-02 17:22             ` Jason Low
2015-04-03 22:35           ` Tim Chen
2015-04-07 17:42             ` Jason Low
2015-04-07 19:39               ` Tim Chen
2015-04-07 20:24                 ` Jason Low
2015-04-04  9:59           ` Preeti U Murthy
2015-04-07 23:28             ` Jason Low
2015-04-08  0:07               ` Jason Low
2015-04-08 11:12                 ` Srikar Dronamraju
2015-04-08 21:22                   ` Jason Low
2015-04-10  8:37                     ` Srikar Dronamraju
2015-04-13 18:55                       ` Jason Low
2015-04-13 20:54                       ` Jason Low
2015-04-09  2:39                   ` Jason Low
2015-04-09  7:02                     ` Srikar Dronamraju
2015-04-09 22:49                       ` Jason Low
2015-04-13  6:16                 ` Preeti U Murthy
2015-04-13 22:49                   ` Jason Low
2015-04-14  2:59                     ` Jason Low
2015-04-02  2:11   ` Jason Low

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).