linux-kernel.vger.kernel.org archive mirror
* [RFC] sched: Ensure cpu_power periodic update
@ 2011-12-12 19:21 Vincent Guittot
  2011-12-15 10:08 ` Peter Zijlstra
  2012-01-28 12:07 ` [tip:sched/core] " tip-bot for Vincent Guittot
  0 siblings, 2 replies; 8+ messages in thread
From: Vincent Guittot @ 2011-12-12 19:21 UTC (permalink / raw)
  To: linux-kernel, linaro-dev, a.p.zijlstra; +Cc: patches, Vincent Guittot

With a lot of small tasks, the sched softirq is nearly never called
when no_hz is enabled. In this case load_balance() is mainly called
with the newly_idle mode, which doesn't update the cpu_power.
Add a next_update field which ensures a maximum update period when
there is short activity.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h |    1 +
 kernel/sched/fair.c   |   24 ++++++++++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 64527c4..7178446 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -903,6 +903,7 @@ struct sched_group_power {
 	 * single CPU.
 	 */
 	unsigned int power, power_orig;
+	unsigned long next_update;
 	/*
 	 * Number of busy cpus in this group.
 	 */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a4d2b7a..809f484 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -215,6 +215,8 @@ calc_delta_mine(unsigned long delta_exec, unsigned long weight,
 
 const struct sched_class fair_sched_class;
 
+static unsigned long __read_mostly max_load_balance_interval = HZ/10;
+
 /**************************************************************
  * CFS operations on generic schedulable entities:
  */
@@ -3786,6 +3788,11 @@ void update_group_power(struct sched_domain *sd, int cpu)
 	struct sched_domain *child = sd->child;
 	struct sched_group *group, *sdg = sd->groups;
 	unsigned long power;
+	unsigned long interval;
+
+	interval = msecs_to_jiffies(sd->balance_interval);
+	interval = clamp(interval, 1UL, max_load_balance_interval);
+	sdg->sgp->next_update = jiffies + interval;
 
 	if (!child) {
 		update_cpu_power(sd, cpu);
@@ -3893,12 +3900,15 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	 * domains. In the newly idle case, we will allow all the cpu's
 	 * to do the newly idle load balance.
 	 */
-	if (idle != CPU_NEWLY_IDLE && local_group) {
-		if (balance_cpu != this_cpu) {
-			*balance = 0;
-			return;
-		}
-		update_group_power(sd, this_cpu);
+	if (local_group) {
+		if (idle != CPU_NEWLY_IDLE) {
+			if (balance_cpu != this_cpu) {
+				*balance = 0;
+				return;
+			}
+			update_group_power(sd, this_cpu);
+		} else if (time_after_eq(jiffies, group->sgp->next_update))
+			update_group_power(sd, this_cpu);
 	}
 
 	/* Adjust by relative CPU power of the group */
@@ -4917,8 +4927,6 @@ void select_nohz_load_balancer(int stop_tick)
 
 static DEFINE_SPINLOCK(balancing);
 
-static unsigned long __read_mostly max_load_balance_interval = HZ/10;
-
 /*
  * Scale the max load_balance interval with the number of CPUs in the system.
  * This trades load-balance latency on larger machines for less cross talk.
-- 
1.7.4.1



* Re: [RFC] sched: Ensure cpu_power periodic update
  2011-12-12 19:21 [RFC] sched: Ensure cpu_power periodic update Vincent Guittot
@ 2011-12-15 10:08 ` Peter Zijlstra
  2011-12-15 13:36   ` Vincent Guittot
  2012-01-28 12:07 ` [tip:sched/core] " tip-bot for Vincent Guittot
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2011-12-15 10:08 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linaro-dev, patches, Ingo Molnar, Paul Turner,
	Suresh Siddha

On Mon, 2011-12-12 at 20:21 +0100, Vincent Guittot wrote:
> With a lot of small tasks, the softirq sched is nearly never called
> when no_hz is enable. In this case the load_balance is mainly called with
> the newly_idle mode which doesn't update the cpu_power.
> Add a next_update field which ensure a maximum update period when
> there is short activity

> +	if (local_group) {
> +		if (idle != CPU_NEWLY_IDLE) {
> +			if (balance_cpu != this_cpu) {
> +				*balance = 0;
> +				return;
> +			}
> +			update_group_power(sd, this_cpu);
> +		} else if (time_after_eq(jiffies, group->sgp->next_update))
> +			update_group_power(sd, this_cpu);
>  	}

Hmm, I would have expected it to be called from the NOHZ balancing path
instead of the new_idle path. Your changelog fails to mentions any
considerations on this..

Then again, its probably easier to keep update_group_power on this_cpu
than to allow a remote update of your cpu_power.

So I'm not opposed to this patch, I'd just like a little extra
clarification.


* Re: [RFC] sched: Ensure cpu_power periodic update
  2011-12-15 10:08 ` Peter Zijlstra
@ 2011-12-15 13:36   ` Vincent Guittot
  2011-12-16  0:58     ` Suresh Siddha
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2011-12-15 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linaro-dev, patches, Ingo Molnar, Paul Turner,
	Suresh Siddha

On 15 December 2011 11:08, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2011-12-12 at 20:21 +0100, Vincent Guittot wrote:
>> With a lot of small tasks, the softirq sched is nearly never called
>> when no_hz is enable. In this case the load_balance is mainly called with
>> the newly_idle mode which doesn't update the cpu_power.
>> Add a next_update field which ensure a maximum update period when
>> there is short activity
>
>> +     if (local_group) {
>> +             if (idle != CPU_NEWLY_IDLE) {
>> +                     if (balance_cpu != this_cpu) {
>> +                             *balance = 0;
>> +                             return;
>> +                     }
>> +                     update_group_power(sd, this_cpu);
>> +             } else if (time_after_eq(jiffies, group->sgp->next_update))
>> +                     update_group_power(sd, this_cpu);
>>       }
>
> Hmm, I would have expected it to be called from the NOHZ balancing path
> instead of the new_idle path. Your changelog fails to mentions any
> considerations on this..
>

Unluckily, the small tasks mainly run between ticks and the timer
interrupt doesn't fire, which implies that neither rebalance_domains()
on the cpu nor nohz_balancer_kick() is called. We get a lot of calls
to idle_balance() when cpus become idle and very few calls to
rebalance_domains() or nohz_idle_balance(). If some tasks are rt
tasks, the cpu_power should be updated regularly to reflect the
current use of the cpu by the rt scheduler.

I'm using cyclictest to easily reproduce the problem on my dual Cortex-A9.

> Then again, its probably easier to keep update_group_power on this_cpu
> than to allow a remote update of your cpu_power.
>

This additional path for updating the cpu_power will only be used by
this_cpu because it is called from idle_balance(). But we still have a
call to update_group_power() from a remote cpu when nohz_idle_balance()
runs.

> So I'm not opposed to this patch, I'd just like a little extra
> clarification.

Vincent


* Re: [RFC] sched: Ensure cpu_power periodic update
  2011-12-15 13:36   ` Vincent Guittot
@ 2011-12-16  0:58     ` Suresh Siddha
  2011-12-16  8:23       ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Suresh Siddha @ 2011-12-16  0:58 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, linaro-dev, patches, Ingo Molnar,
	Paul Turner

On Thu, 2011-12-15 at 05:36 -0800, Vincent Guittot wrote:
> I'm using cyclictest to easily reproduce the problem on my dual cortex-A9

So does cyclictest itself exhibit the problem, or does running cyclictest
with another workload show it? In other words, what numbers of the
workload did you see change with this patch?

> 
> > Then again, its probably easier to keep update_group_power on this_cpu
> > than to allow a remote update of your cpu_power.
> >
> 
> This additional path for updating the cpu_power will only be used by
> this_cpu because it is called by idle_balance. But we still have a
> call to update_group_power by a remote cpu when nohz_idle_balance is
> called.

As Vincent mentioned, the current mainline kernel already updates the
remote cpu's group power in the nohz idle load balancing path.

Also, with all the recent nohz idle load balancing using kicks, on a
dual-core system there may not be any nohz idle load balancing if
multiple tasks wake up, run for a short time and go back to idle before
the next tick. We rely on the wakeup balance to get it right in this case.

thanks,
suresh



* Re: [RFC] sched: Ensure cpu_power periodic update
  2011-12-16  0:58     ` Suresh Siddha
@ 2011-12-16  8:23       ` Vincent Guittot
  2012-01-04  8:22         ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2011-12-16  8:23 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Peter Zijlstra, linux-kernel, linaro-dev, patches, Ingo Molnar,
	Paul Turner

On 16 December 2011 01:58, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> On Thu, 2011-12-15 at 05:36 -0800, Vincent Guittot wrote:
>> I'm using cyclictest to easily reproduce the problem on my dual cortex-A9
>
> So does the cyclictest itself exhibit the problem or running cyclictest
> with another workload showed the problem? In other words, what numbers
> of the workload did you see change with this patch?
>

Running cyclictest -q -t 5 -D 4 on my dual Cortex-A9 shows that the
timer and sched softirqs are not called very often and that the
cpu_power is nearly never updated.

I have also used the following sequence:

cyclictest -q -t 5 -D 4 &
sleep 2
cyclictest -q -t 3 --affinity=0 -p 99 -D 2

The cpu_power of cpu0 should start to decrease when the rt threads are
started. Without the patch, we must wait for the next sched softirq
before the cpu_power starts to be updated, and we have no guarantee on
the maximum interval. With the patch, the cpu_power is updated
regularly, using the balance_interval value.

Vincent

>>
>> > Then again, its probably easier to keep update_group_power on this_cpu
>> > than to allow a remote update of your cpu_power.
>> >
>>
>> This additional path for updating the cpu_power will only be used by
>> this_cpu because it is called by idle_balance. But we still have a
>> call to update_group_power by a remote cpu when nohz_idle_balance is
>> called.
>
> As Vincent mentioned, the current mainline kernel already updates the
> remote cpu's group_power in the nohz idle load balancing patch.
>
> Also with all the recent nohz idle load balancing using kick, on a
> dual-core system there may not be any nohz idle load balancing if
> multiple tasks wakeup, run for short time and go back to idle before the
> next tick. We rely on the wakeup balance to get it right in this case.
>
> thanks,
> suresh
>


* Re: [RFC] sched: Ensure cpu_power periodic update
  2011-12-16  8:23       ` Vincent Guittot
@ 2012-01-04  8:22         ` Vincent Guittot
  2012-01-06 12:10           ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2012-01-04  8:22 UTC (permalink / raw)
  To: Suresh Siddha, Peter Zijlstra
  Cc: linux-kernel, linaro-dev, patches, Ingo Molnar, Paul Turner

Hi Peter,

Does it sound good to you if I update the changelog of this patch with
the explanations from the previous mails, or do you need more
information?

Thanks,
Vincent

On 16 December 2011 09:23, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 16 December 2011 01:58, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>> On Thu, 2011-12-15 at 05:36 -0800, Vincent Guittot wrote:
>>> I'm using cyclictest to easily reproduce the problem on my dual cortex-A9
>>
>> So does the cyclictest itself exhibit the problem or running cyclictest
>> with another workload showed the problem? In other words, what numbers
>> of the workload did you see change with this patch?
>>
>
> Using a cyclictest -q -t 5 -D 4 on my dual cortex-A9 shows the fact
> that the softirqs timer and sched are not called very often and the
> cpu_power is nearly never updated.
>
> I have also used the following sequence :
>
> cyclictest -q -t 5 -D 4 &
> sleep 2
> cyclictest -q -t 3 --affinity=0 -p 99 -D 2
>
> The cpu_power of cpu0 should start to decrease when the rt threads are
> started. Without the patch, we must wait for the next sched softirq
> for starting to update the cpu_power and we have no guarantee of the
> maximum interval. With the patch, the cpu_power is updated regularly
> using the balance_interval value.
>
> Vincent
>
>>>
>>> > Then again, its probably easier to keep update_group_power on this_cpu
>>> > than to allow a remote update of your cpu_power.
>>> >
>>>
>>> This additional path for updating the cpu_power will only be used by
>>> this_cpu because it is called by idle_balance. But we still have a
>>> call to update_group_power by a remote cpu when nohz_idle_balance is
>>> called.
>>
>> As Vincent mentioned, the current mainline kernel already updates the
>> remote cpu's group_power in the nohz idle load balancing patch.
>>
>> Also with all the recent nohz idle load balancing using kick, on a
>> dual-core system there may not be any nohz idle load balancing if
>> multiple tasks wakeup, run for short time and go back to idle before the
>> next tick. We rely on the wakeup balance to get it right in this case.
>>
>> thanks,
>> suresh
>>


* Re: [RFC] sched: Ensure cpu_power periodic update
  2012-01-04  8:22         ` Vincent Guittot
@ 2012-01-06 12:10           ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2012-01-06 12:10 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Suresh Siddha, linux-kernel, linaro-dev, patches, Ingo Molnar,
	Paul Turner

On Wed, 2012-01-04 at 09:22 +0100, Vincent Guittot wrote:
> Does it sound good for you if I update the comment of this patch with
> the explanation of the previous mails or do you need more information
> ?
> 
No, looks ok. Just got stuck in the mailbox + extra holidays delay.

Got it queued now, Thanks!


* [tip:sched/core] sched: Ensure cpu_power periodic update
  2011-12-12 19:21 [RFC] sched: Ensure cpu_power periodic update Vincent Guittot
  2011-12-15 10:08 ` Peter Zijlstra
@ 2012-01-28 12:07 ` tip-bot for Vincent Guittot
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot for Vincent Guittot @ 2012-01-28 12:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, vincent.guittot, tglx

Commit-ID:  4ec4412e1e91f44a3dcb97b6c9172a13fc78bac9
Gitweb:     http://git.kernel.org/tip/4ec4412e1e91f44a3dcb97b6c9172a13fc78bac9
Author:     Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Mon, 12 Dec 2011 20:21:08 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 27 Jan 2012 13:28:49 +0100

sched: Ensure cpu_power periodic update

With a lot of small tasks, the softirq sched is nearly never called
when no_hz is enabled. In this case load_balance() is mainly called
with the newly_idle mode which doesn't update the cpu_power.

Add a next_update field which ensures a maximum update period when
there is short activity.

Having stale cpu_power information can skew the load-balancing
decisions, this is cured by the guaranteed update.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323717668-2143-1-git-send-email-vincent.guittot@linaro.org
---
 include/linux/sched.h |    1 +
 kernel/sched/fair.c   |   24 ++++++++++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0e19595..92313a3f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -905,6 +905,7 @@ struct sched_group_power {
 	 * single CPU.
 	 */
 	unsigned int power, power_orig;
+	unsigned long next_update;
 	/*
 	 * Number of busy cpus in this group.
 	 */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7c6414f..8e77a6b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -215,6 +215,8 @@ calc_delta_mine(unsigned long delta_exec, unsigned long weight,
 
 const struct sched_class fair_sched_class;
 
+static unsigned long __read_mostly max_load_balance_interval = HZ/10;
+
 /**************************************************************
  * CFS operations on generic schedulable entities:
  */
@@ -3776,6 +3778,11 @@ void update_group_power(struct sched_domain *sd, int cpu)
 	struct sched_domain *child = sd->child;
 	struct sched_group *group, *sdg = sd->groups;
 	unsigned long power;
+	unsigned long interval;
+
+	interval = msecs_to_jiffies(sd->balance_interval);
+	interval = clamp(interval, 1UL, max_load_balance_interval);
+	sdg->sgp->next_update = jiffies + interval;
 
 	if (!child) {
 		update_cpu_power(sd, cpu);
@@ -3883,12 +3890,15 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	 * domains. In the newly idle case, we will allow all the cpu's
 	 * to do the newly idle load balance.
 	 */
-	if (idle != CPU_NEWLY_IDLE && local_group) {
-		if (balance_cpu != this_cpu) {
-			*balance = 0;
-			return;
-		}
-		update_group_power(sd, this_cpu);
+	if (local_group) {
+		if (idle != CPU_NEWLY_IDLE) {
+			if (balance_cpu != this_cpu) {
+				*balance = 0;
+				return;
+			}
+			update_group_power(sd, this_cpu);
+		} else if (time_after_eq(jiffies, group->sgp->next_update))
+			update_group_power(sd, this_cpu);
 	}
 
 	/* Adjust by relative CPU power of the group */
@@ -4945,8 +4955,6 @@ static int __cpuinit sched_ilb_notifier(struct notifier_block *nfb,
 
 static DEFINE_SPINLOCK(balancing);
 
-static unsigned long __read_mostly max_load_balance_interval = HZ/10;
-
 /*
  * Scale the max load_balance interval with the number of CPUs in the system.
  * This trades load-balance latency on larger machines for less cross talk.

