linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: Call update_group_power only for local_group
@ 2010-07-01 23:12 Venkatesh Pallipadi
  2010-07-02  8:05 ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Venkatesh Pallipadi @ 2010-07-01 23:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Gautham R Shenoy, Joel Schopp, Venkatesh Pallipadi

commit 871e35b moved the update_group_power() call into update_sg_lb_stats(),
resulting in it being called for each group, even though it only updates
the power of the local group. As a result we have frequent redundant
update_group_power() calls.

Move it back under "if (local_group)" condition.

This reduces the number of calls to update_group_power by a factor of 4
on my 4 core in 4 NUMA nodes test system.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched_fair.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..369d53d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
 
-	if (local_group)
+	if (local_group) {
 		balance_cpu = group_first_cpu(group);
+		update_group_power(sd, this_cpu);
+	}
+
 
 	/* Tally up the load of all CPUs in the group */
 	max_cpu_load = 0;
@@ -2406,8 +2409,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 		return;
 	}
 
-	update_group_power(sd, this_cpu);
-
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-01 23:12 [PATCH] sched: Call update_group_power only for local_group Venkatesh Pallipadi
@ 2010-07-02  8:05 ` Peter Zijlstra
  2010-07-02 16:20   ` Venkatesh Pallipadi
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-02  8:05 UTC (permalink / raw)
  To: Venkatesh Pallipadi; +Cc: LKML, Gautham R Shenoy, Joel Schopp, Suresh B Siddha

On Thu, 2010-07-01 at 16:12 -0700, Venkatesh Pallipadi wrote:
> commit 871e35b moved the update_group_power() call into update_sg_lb_stats(),
> resulting in it being called for each group, even though it only updates
> the power of the local group. As a result we have frequent redundant
> update_group_power() calls.
> 
> Move it back under "if (local_group)" condition.
> 
> This reduces the number of calls to update_group_power by a factor of 4
> on my 4 core in 4 NUMA nodes test system.

Hrm,.. so Gautham removed that because for things like the NO_HZ
balancer the initial balance_cpu == this_cpu constraint doesn't hold.

Note, I don't think the local_group constraint holds for that either, so
the below would again break that..

Should we perhaps have a conditional on this_rq->nohz_balance_kick or
so?
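
Something along these lines, say (just a sketch of the idea; this_rq
isn't currently visible inside update_sg_lb_stats(), so the exact
plumbing is hand-waved here):

	if (local_group) {
		balance_cpu = group_first_cpu(group);
		if (balance_cpu == this_cpu || this_rq->nohz_balance_kick)
			update_group_power(sd, this_cpu);
	}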

> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>  	unsigned int balance_cpu = -1, first_idle_cpu = 0;
>  	unsigned long avg_load_per_task = 0;
>  
> -	if (local_group)
> +	if (local_group) {
>  		balance_cpu = group_first_cpu(group);
> +		update_group_power(sd, this_cpu);
> +	}
> +
>  
>  	/* Tally up the load of all CPUs in the group */
>  	max_cpu_load = 0;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-02  8:05 ` Peter Zijlstra
@ 2010-07-02 16:20   ` Venkatesh Pallipadi
  2010-07-02 16:40     ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Venkatesh Pallipadi @ 2010-07-02 16:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Gautham R Shenoy, Joel Schopp, Suresh B Siddha

On Fri, Jul 2, 2010 at 1:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-07-01 at 16:12 -0700, Venkatesh Pallipadi wrote:
>> commit 871e35b moved the update_group_power() call into update_sg_lb_stats(),
>> resulting in it being called for each group, even though it only updates
>> the power of the local group. As a result we have frequent redundant
>> update_group_power() calls.
>>
>> Move it back under "if (local_group)" condition.
>>
>> This reduces the number of calls to update_group_power by a factor of 4
>> on my 4 core in 4 NUMA nodes test system.
>
> Hrm,.. so Gautham removed that because for things like the NO_HZ
> balancer the initial balance_cpu == this_cpu constraint doesn't hold.
>
> Note, I don't think the local_group constraint holds for that either, so
> the below would again break that..
>
> Should we perhaps have a conditional on this_rq->nohz_balance_kick or
> so?

The thing is that update_group_power is only updating the power of the
local group (sd->groups). It is getting called multiple times, however,
once for each group, as update_sd_lb_stats loops through groups->next
calling update_sg_lb_stats. If we really want to update the power of
non-local groups, update_cpu_power has to change to take a group
parameter and a cpu other than this_cpu as arguments, and may have to
access non-local rqs etc.
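
For reference, this is roughly the shape of the caller (a simplified
sketch of update_sd_lb_stats(), not the literal source; most arguments
and the statistics bookkeeping are elided):

	struct sched_group *group = sd->groups;

	do {
		int local_group = cpumask_test_cpu(this_cpu,
						   sched_group_cpus(group));

		/* one call per group in the domain ... */
		update_sg_lb_stats(sd, group, this_cpu, /* ... */
				   local_group, /* ... */ balance, &sgs);

		group = group->next;
	} while (group != sd->groups);

	/*
	 * ... so an unconditional update_group_power(sd, this_cpu) inside
	 * update_sg_lb_stats() recomputes sd->groups->cpu_power once per
	 * group instead of once per domain scan.
	 */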

Thanks,
Venki
>
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>>       unsigned int balance_cpu = -1, first_idle_cpu = 0;
>>       unsigned long avg_load_per_task = 0;
>>
>> -     if (local_group)
>> +     if (local_group) {
>>               balance_cpu = group_first_cpu(group);
>> +             update_group_power(sd, this_cpu);
>> +     }
>> +
>>
>>       /* Tally up the load of all CPUs in the group */
>>       max_cpu_load = 0;
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-02 16:20   ` Venkatesh Pallipadi
@ 2010-07-02 16:40     ` Peter Zijlstra
  2010-07-02 16:56       ` Venkatesh Pallipadi
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-02 16:40 UTC (permalink / raw)
  To: Venkatesh Pallipadi; +Cc: LKML, Gautham R Shenoy, Joel Schopp, Suresh B Siddha

On Fri, 2010-07-02 at 09:20 -0700, Venkatesh Pallipadi wrote:
> > Hrm,.. so Gautham removed that because for things like the NO_HZ
> > balancer the initial balance_cpu == this_cpu constraint doesn't hold.
> >
> > Note, I don't think the local_group constraint holds for that either, so
> > the below would again break that..
> >
> > Should we perhaps have a conditional on this_rq->nohz_balance_kick or
> > so?
> 
> The thing is that update_group_power is only updating the power of the
> local group (sd->groups).

Not quite, see nohz_idle_balance(), that iterates idle_cpus_mask, and
calls rebalance_domains(balance_cpu, CPU_IDLE), which then does
for_each_domain(balance_cpu, sd)

So sd need not be local at all, and sd->groups will be the group of which
balance_cpu is part.
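
Roughly, the path is (an outline pieced together from the functions
named above; the nohz_balance_kick/need_resched handling and locking
are left out):

	nohz_idle_balance(this_cpu, idle)
	    for_each_cpu(balance_cpu, nohz.idle_cpus_mask)
	        rebalance_domains(balance_cpu, CPU_IDLE)
	            for_each_domain(balance_cpu, sd)
	                load_balance(balance_cpu, cpu_rq(balance_cpu),
	                             sd, CPU_IDLE, &balance)

so everything from load_balance() down sees balance_cpu as "this_cpu",
even though it is running on some other cpu.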

> It is getting called multiple times, however, once for each group, as
> update_sd_lb_stats loops through groups->next calling
> update_sg_lb_stats.

Sure I see how that's happening and why you would want to avoid that, no
argument there.

> If we really want to update the power of non-local groups,
> update_cpu_power has to change to take a group parameter and a cpu
> other than this_cpu as arguments, and may have to access non-local
> rqs etc.

No, see above. All we need is to somehow allow nohz_idle_balance() to
update cpu_power as well.

So I think we want something like:

       if (local_group) {
               balance_cpu = group_first_cpu(group);                                                                
               if (balance_cpu == this_cpu || nohz_balance)
                       update_group_power(sd, this_cpu);
       }

Or am I totally missing something here?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-02 16:40     ` Peter Zijlstra
@ 2010-07-02 16:56       ` Venkatesh Pallipadi
  2010-07-02 17:31         ` Peter Zijlstra
  2010-07-08 14:12         ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Venkatesh Pallipadi @ 2010-07-02 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Gautham R Shenoy, Joel Schopp, Suresh B Siddha,
	Venkatesh Pallipadi

My fault. I completely missed that code path, assuming this_cpu in
load_balance means this_cpu :(

How about the change below instead?

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched_fair.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..97f4534 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2406,8 +2406,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 		return;
 	}
 
-	update_group_power(sd, this_cpu);
-
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
 
@@ -2456,6 +2454,8 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
 	init_sd_power_savings_stats(sd, sds, idle);
 	load_idx = get_sd_load_idx(sd, idle);
 
+	update_group_power(sd, this_cpu);
+
 	do {
 		int local_group;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-02 16:56       ` Venkatesh Pallipadi
@ 2010-07-02 17:31         ` Peter Zijlstra
  2010-07-08 14:12         ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-02 17:31 UTC (permalink / raw)
  To: Venkatesh Pallipadi; +Cc: LKML, Gautham R Shenoy, Joel Schopp, Suresh B Siddha

On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> How about the change below instead?
> 
I think I've now managed to confuse myself too.. will try and reset my
brain and have a second look in a bit.. ;-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-02 16:56       ` Venkatesh Pallipadi
  2010-07-02 17:31         ` Peter Zijlstra
@ 2010-07-08 14:12         ` Peter Zijlstra
  2010-07-08 17:45           ` Suresh Siddha
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-08 14:12 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: LKML, Gautham R Shenoy, Joel Schopp, Suresh B Siddha, Michael Neuling

Sorry for the delay, I got side-tracked :/

On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> My fault. I completely missed that code path, assuming this_cpu in
> load_balance means this_cpu :(

Right, but local_group is still valid because that's
cpumask_test_cpu(this_cpu, sched_group_cpus(group));

So in that regard your initial patch was still correct. However, since
there can be multiple CPUs in the local group, we want to only have one
do the actual balancing, which is what the !*balance logic is about;
keeping the call site where it is preserves that.
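
For reference, this is the check in question, as it currently stands in
update_sg_lb_stats():

	if (idle != CPU_NEWLY_IDLE && local_group &&
	    balance_cpu != this_cpu) {
		*balance = 0;
		return;
	}

Only the designated balance_cpu of the local group carries on; every
other cpu in it bails out early.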

The below proposal avoids calling update_group_power() too often for:
 - non-local groups,
 - multiple cpus in the same group.

Does this look correct?

Related to this, Mikey was looking at avoiding the for_each_cpu_and()
loops by converting update_sg_lb_stats() into something similar to
update_group_power(), having one cpu in the group update the statistics
and propagate upwards by summing groups instead of individual cpus.

---
 kernel/sched_fair.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..62f34a0 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 		return;
 	}
 
-	update_group_power(sd, this_cpu);
+	if (local_group)
+		update_group_power(sd, this_cpu);
 
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 14:12         ` Peter Zijlstra
@ 2010-07-08 17:45           ` Suresh Siddha
  2010-07-08 17:49             ` Peter Zijlstra
  2010-07-08 18:16             ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Suresh Siddha @ 2010-07-08 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 07:12 -0700, Peter Zijlstra wrote:
> Sorry for the delay, I got side-tracked :/
> 
> On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> > My fault. I completely missed that code path, assuming this_cpu in
> > load_balance means this_cpu :(
> 
> Right, but local_group is still valid because that's
> cpumask_test_cpu(this_cpu, sched_group_cpus(group));
> 
> So in that regard your initial patch was still correct. However, since
> there can be multiple CPUs in the local group, we want to only have one
> do the actual balancing, which is what the !*balance logic is about;
> keeping the call site where it is preserves that.
> 
> The below proposal avoids calling update_group_power() too often for:
>  - non-local groups,
>  - multiple cpus in the same group.
> 
> Does this look correct?
> 
> Related to this, Mikey was looking at avoiding the for_each_cpu_and()
> loops by converting update_sg_lb_stats() into something similar to
> update_group_power(), having one cpu in the group update the statistics
> and propagate upwards by summing groups instead of individual cpus.
> 
> ---
>  kernel/sched_fair.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..62f34a0 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>  		return;
>  	}
>  
> -	update_group_power(sd, this_cpu);
> +	if (local_group)
> +		update_group_power(sd, this_cpu);

if IDLE == CPU_NEWLY_IDLE, then all the cpu's in the local group will do
this. Also update_group_power() can be done only on the local cpu,
i.e., when this_cpu == smp_processor_id() right?

So it should be something like?

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..2d8a74e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2406,8 +2406,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 		return;
 	}
 
-	update_group_power(sd, this_cpu);
-
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
 
@@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
 	init_sd_power_savings_stats(sd, sds, idle);
 	load_idx = get_sd_load_idx(sd, idle);
 
+	if (this_cpu == smp_processor_id())
+		update_group_power(sd, this_cpu);
+
 	do {
 		int local_group;
 



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 17:45           ` Suresh Siddha
@ 2010-07-08 17:49             ` Peter Zijlstra
  2010-07-08 17:50               ` Suresh Siddha
  2010-07-08 18:16             ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-08 17:49 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
>  
> @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
>  	init_sd_power_savings_stats(sd, sds, idle);
>  	load_idx = get_sd_load_idx(sd, idle);
>  
> +	if (this_cpu == smp_processor_id())
> +		update_group_power(sd, this_cpu);
> +
>  	do {
>  		int local_group;

Which will break for nohz_idle_balance..

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 17:49             ` Peter Zijlstra
@ 2010-07-08 17:50               ` Suresh Siddha
  2010-07-08 17:55                 ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Suresh Siddha @ 2010-07-08 17:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 10:49 -0700, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> >  
> > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> >  	init_sd_power_savings_stats(sd, sds, idle);
> >  	load_idx = get_sd_load_idx(sd, idle);
> >  
> > +	if (this_cpu == smp_processor_id())
> > +		update_group_power(sd, this_cpu);
> > +
> >  	do {
> >  		int local_group;
> 
> Which will break for nohz_idle_balance..

Then the logic is broken somewhere because update_group_power() reads
APERF/MPERF MSRs, which doesn't make sense when this_cpu !=
smp_processor_id(). What am I missing?

thanks,
suresh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 17:50               ` Suresh Siddha
@ 2010-07-08 17:55                 ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-08 17:55 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 10:50 -0700, Suresh Siddha wrote:
> On Thu, 2010-07-08 at 10:49 -0700, Peter Zijlstra wrote:
> > On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> > >  
> > > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> > >  	init_sd_power_savings_stats(sd, sds, idle);
> > >  	load_idx = get_sd_load_idx(sd, idle);
> > >  
> > > +	if (this_cpu == smp_processor_id())
> > > +		update_group_power(sd, this_cpu);
> > > +
> > >  	do {
> > >  		int local_group;
> > 
> > Which will break for nohz_idle_balance..
> 
> Then the logic is broken somewhere because update_group_power() reads
> APERF/MPERF MSRs, which doesn't make sense when this_cpu !=
> smp_processor_id(). What am I missing?

The APERF/MPERF code is utterly broken.. (and currently disabled by
default) but yeah, that's one aspect of its brokenness I only realized
after your email.

The problem is that it measures current throughput, not current
capacity. So for an idle thread/core it would return 0, instead of the
potential.

I've been meaning to revisit this.. maybe I should simply rip that out
until I get it working.

I was thinking of measuring temporal maxima to approximate capacity
instead of throughput.
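
Something in this direction, purely as an illustration of the idea (the
names and the decay factor are made up, nothing like this exists in the
tree):

	/*
	 * Track a slowly decaying maximum of the observed APERF/MPERF
	 * ratio instead of the instantaneous ratio: an idle cpu then
	 * keeps reporting (roughly) what it could do, rather than the
	 * ~0 it is currently doing.
	 */
	static unsigned long update_capacity_estimate(unsigned long *max_ratio,
						      u64 aperf_delta,
						      u64 mperf_delta)
	{
		unsigned long ratio;

		if (mperf_delta) {
			ratio = div64_u64(aperf_delta * SCHED_LOAD_SCALE,
					  mperf_delta);

			/* decay the old maximum, then fold in the new sample */
			*max_ratio -= *max_ratio / 64;
			if (ratio > *max_ratio)
				*max_ratio = ratio;
		}

		return *max_ratio;
	}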

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 17:45           ` Suresh Siddha
  2010-07-08 17:49             ` Peter Zijlstra
@ 2010-07-08 18:16             ` Peter Zijlstra
  2010-07-08 21:53               ` Suresh Siddha
                                 ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-08 18:16 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> >               return;
> >       }
> >  
> > -     update_group_power(sd, this_cpu);
> > +     if (local_group)
> > +             update_group_power(sd, this_cpu);
> 
> if IDLE == CPU_NEWLY_IDLE, then all the cpu's in the local group will do
> > this. Also update_group_power() can be done only on the local cpu,
> i.e., when this_cpu == smp_processor_id() right?

It might make sense to only update_group_power on !CPU_NEWLY_IDLE and
rely on the tick driven cpu_power updates.

No sense in updating them in finer slices I guess.

So how about something like:

---
 kernel/sched_fair.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..2f05679 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2427,14 +2427,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	 * domains. In the newly idle case, we will allow all the cpu's
 	 * to do the newly idle load balance.
 	 */
-	if (idle != CPU_NEWLY_IDLE && local_group &&
-	    balance_cpu != this_cpu) {
-		*balance = 0;
-		return;
+	if (idle != CPU_NEWLY_IDLE && local_group) {
+		if (balance_cpu != this_cpu) {
+			*balance = 0;
+			return;
+		}
+		update_group_power(sd, this_cpu);
 	}
 
-	update_group_power(sd, this_cpu);
-
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 18:16             ` Peter Zijlstra
@ 2010-07-08 21:53               ` Suresh Siddha
  2010-07-09 13:17                 ` Peter Zijlstra
  2010-07-17 11:12                 ` [tip:sched/core] sched: Update rq->clock for nohz balanced cpus tip-bot for Suresh Siddha
  2010-07-12 17:11               ` [PATCH] sched: Call update_group_power only for local_group Venkatesh Pallipadi
  2010-07-17 11:12               ` [tip:sched/core] sched: Reduce update_group_power() calls tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 17+ messages in thread
From: Suresh Siddha @ 2010-07-08 21:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 11:16 -0700, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> > > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> > >               return;
> > >       }
> > >  
> > > -     update_group_power(sd, this_cpu);
> > > +     if (local_group)
> > > +             update_group_power(sd, this_cpu);
> > 
> > if IDLE == CPU_NEWLY_IDLE, then all the cpu's in the local group will do
> > this. Also update_group_power() can be done only on the local cpu,
> > i.e., when this_cpu == smp_processor_id() right?
> 
> It might make sense to only update_group_power on !CPU_NEWLY_IDLE and
> rely on the tick driven cpu_power updates.
> 
> No sense in updating them in finer slices I guess.
> 
> So how about something like:
> 
> ---
>  kernel/sched_fair.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..2f05679 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2427,14 +2427,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>  	 * domains. In the newly idle case, we will allow all the cpu's
>  	 * to do the newly idle load balance.
>  	 */
> -	if (idle != CPU_NEWLY_IDLE && local_group &&
> -	    balance_cpu != this_cpu) {
> -		*balance = 0;
> -		return;
> +	if (idle != CPU_NEWLY_IDLE && local_group) {
> +		if (balance_cpu != this_cpu) {
> +			*balance = 0;
> +			return;
> +		}
> +		update_group_power(sd, this_cpu);
>  	}
>  
> -	update_group_power(sd, this_cpu);
> -
>  	/* Adjust by relative CPU power of the group */
>  	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
>  

I am ok with this patch (barring the currently broken aperf/mperf part).

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>

Also, looking at all this, don't we need to do something like this in
the nohz load balance?

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..ae750e9 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3598,6 +3598,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
 		}
 
 		raw_spin_lock_irq(&this_rq->lock);
+		update_rq_clock(this_rq);
 		update_cpu_load(this_rq);
 		raw_spin_unlock_irq(&this_rq->lock);
 



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 21:53               ` Suresh Siddha
@ 2010-07-09 13:17                 ` Peter Zijlstra
  2010-07-17 11:12                 ` [tip:sched/core] sched: Update rq->clock for nohz balanced cpus tip-bot for Suresh Siddha
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-07-09 13:17 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Venkatesh Pallipadi, LKML, Gautham R Shenoy, Joel Schopp,
	Michael Neuling

On Thu, 2010-07-08 at 14:53 -0700, Suresh Siddha wrote:

> Also, looking at all this, don't we need to do something like this in
> the nohz load balance?

Yes, I think you're right, thanks!

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..ae750e9 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3598,6 +3598,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
>  		}
>  
>  		raw_spin_lock_irq(&this_rq->lock);
> +		update_rq_clock(this_rq);
>  		update_cpu_load(this_rq);
>  		raw_spin_unlock_irq(&this_rq->lock);



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: Call update_group_power only for local_group
  2010-07-08 18:16             ` Peter Zijlstra
  2010-07-08 21:53               ` Suresh Siddha
@ 2010-07-12 17:11               ` Venkatesh Pallipadi
  2010-07-17 11:12               ` [tip:sched/core] sched: Reduce update_group_power() calls tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 17+ messages in thread
From: Venkatesh Pallipadi @ 2010-07-12 17:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Suresh Siddha, LKML, Gautham R Shenoy, Joel Schopp, Michael Neuling

On Thu, Jul 8, 2010 at 11:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
>> > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>> >               return;
>> >       }
>> >
>> > -     update_group_power(sd, this_cpu);
>> > +     if (local_group)
>> > +             update_group_power(sd, this_cpu);
>>
>> if IDLE == CPU_NEWLY_IDLE, then all the cpu's in the local group will do
>> this. Also update_group_power() can be done only on the local cpu,
>> i.e., when this_cpu == smp_processor_id() right?
>
> It might make sense to only update_group_power on !CPU_NEWLY_IDLE and
> rely on the tick driven cpu_power updates.
>
> No sense in updating them in finer slices I guess.
>
> So how about something like:

Yes. This looks good.

Acked-by: Venkatesh Pallipadi <venki@google.com>

>
> ---
>  kernel/sched_fair.c |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..2f05679 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2427,14 +2427,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>         * domains. In the newly idle case, we will allow all the cpu's
>         * to do the newly idle load balance.
>         */
> -       if (idle != CPU_NEWLY_IDLE && local_group &&
> -           balance_cpu != this_cpu) {
> -               *balance = 0;
> -               return;
> +       if (idle != CPU_NEWLY_IDLE && local_group) {
> +               if (balance_cpu != this_cpu) {
> +                       *balance = 0;
> +                       return;
> +               }
> +               update_group_power(sd, this_cpu);
>        }
>
> -       update_group_power(sd, this_cpu);
> -
>        /* Adjust by relative CPU power of the group */
>        sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip:sched/core] sched: Update rq->clock for nohz balanced cpus
  2010-07-08 21:53               ` Suresh Siddha
  2010-07-09 13:17                 ` Peter Zijlstra
@ 2010-07-17 11:12                 ` tip-bot for Suresh Siddha
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Suresh Siddha @ 2010-07-17 11:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, suresh.b.siddha, tglx, mingo

Commit-ID:  5343bdb8fd076f16edc9d113a9e35e2a1d1f4966
Gitweb:     http://git.kernel.org/tip/5343bdb8fd076f16edc9d113a9e35e2a1d1f4966
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Fri, 9 Jul 2010 15:19:54 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sat, 17 Jul 2010 12:02:08 +0200

sched: Update rq->clock for nohz balanced cpus

Suresh spotted that we don't update the rq->clock in the nohz
load-balancer path.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1278626014.2834.74.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index b4da534..e44a591 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3596,6 +3596,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
 		}
 
 		raw_spin_lock_irq(&this_rq->lock);
+		update_rq_clock(this_rq);
 		update_cpu_load(this_rq);
 		raw_spin_unlock_irq(&this_rq->lock);
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip:sched/core] sched: Reduce update_group_power() calls
  2010-07-08 18:16             ` Peter Zijlstra
  2010-07-08 21:53               ` Suresh Siddha
  2010-07-12 17:11               ` [PATCH] sched: Call update_group_power only for local_group Venkatesh Pallipadi
@ 2010-07-17 11:12               ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-07-17 11:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, suresh.b.siddha, tglx,
	venki, mingo

Commit-ID:  bbc8cb5baead9607309583b20873ab0cc8d89eaf
Gitweb:     http://git.kernel.org/tip/bbc8cb5baead9607309583b20873ab0cc8d89eaf
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Fri, 9 Jul 2010 15:15:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sat, 17 Jul 2010 12:05:14 +0200

sched: Reduce update_group_power() calls

Currently we update cpu_power too often; update_group_power() only
updates the local group's cpu_power, but it gets called for all groups.

Furthermore, CPU_NEWLY_IDLE invocations will result in all cpus
calling it, even though a slow update of cpu_power is sufficient.

Therefore move the update under 'idle != CPU_NEWLY_IDLE &&
local_group' to reduce superfluous invocations.

Reported-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1278612989.1900.176.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e44a591..c9ac097 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2425,14 +2425,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	 * domains. In the newly idle case, we will allow all the cpu's
 	 * to do the newly idle load balance.
 	 */
-	if (idle != CPU_NEWLY_IDLE && local_group &&
-	    balance_cpu != this_cpu) {
-		*balance = 0;
-		return;
+	if (idle != CPU_NEWLY_IDLE && local_group) {
+		if (balance_cpu != this_cpu) {
+			*balance = 0;
+			return;
+		}
+		update_group_power(sd, this_cpu);
 	}
 
-	update_group_power(sd, this_cpu);
-
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread
