* [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
@ 2018-09-25 17:35 Valentin Schneider
  2018-09-26  8:13 ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2018-09-25 17:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, Dietmar.Eggemann

When load_balance() fails to move some load because of task affinity,
we end up increasing sd->balance_interval to delay the next periodic
balance in the hopes that next time we look, that annoying pinned
task(s) will be gone.

However, idle_balance() pays no attention to sd->balance_interval, yet
it will still lead to an increase in balance_interval in case of
pinned tasks.

If we're going through several newidle balances (e.g. we have a
periodic task), this can lead to a huge increase of the
balance_interval in a very small amount of time.

To prevent that, don't increase the balance interval when going
through a newidle balance.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6bd142d..620910d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8782,13 +8782,22 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	sd->nr_balance_failed = 0;
 
 out_one_pinned:
+	ld_moved = 0;
+
+	/*
+	 * idle_balance() disregards balance intervals, so we could repeatedly
+	 * reach this code, which would lead to balance_interval skyrocketing
+	 * in a short amount of time. Skip the balance_interval increase logic
+	 * to avoid that.
+	 */
+	if (env.idle == CPU_NEWLY_IDLE)
+		goto out;
+
 	/* tune up the balancing interval */
-	if (((env.flags & LBF_ALL_PINNED) &&
-			sd->balance_interval < MAX_PINNED_INTERVAL) ||
-			(sd->balance_interval < sd->max_interval))
+	if ((env.flags & LBF_ALL_PINNED &&
+	     sd->balance_interval < MAX_PINNED_INTERVAL) ||
+	    (sd->balance_interval < sd->max_interval))
 		sd->balance_interval *= 2;
-
-	ld_moved = 0;
 out:
 	return ld_moved;
 }
--
2.7.4



* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-25 17:35 [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance Valentin Schneider
@ 2018-09-26  8:13 ` Vincent Guittot
  2018-09-26  9:35   ` Valentin Schneider
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2018-09-26  8:13 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann

On Tue, 25 Sep 2018 at 19:38, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> When load_balance() fails to move some load because of task affinity,
> we end up increasing sd->balance_interval to delay the next periodic
> balance in the hopes that next time we look, that annoying pinned
> task(s) will be gone.

It's not about hope but about minimizing useless periodic load balancing, as
there is no possibility to balance anything.

>
> However, idle_balance() pays no attention to sd->balance_interval, yet
> it will still lead to an increase in balance_interval in case of
> pinned tasks.

I would say that this makes sense: as there is no way to pull the
pinned task onto this rq, running periodic load balance (busy or idle)
is a waste of time and resources.

>
> If we're going through several newidle balances (e.g. we have a
> periodic task), this can lead to a huge increase of the

This only happens when overload is set, which means that there are tasks
on another CPU but this one can't pull any of them.

Can you give us details about the use case that you care about ?

Also, if the interval reaches a value that becomes a problem for you,
why don't you reduce the max_interval with sysfs? This max interval
can be reached on several other occasions.
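
(Just to illustrate what I mean, a minimal userspace sketch; it assumes a
kernel built with CONFIG_SCHED_DEBUG, where these knobs are typically
exposed under /proc/sys/kernel/sched_domain/; the exact path and the
128ms value below are only examples, and a shell redirect into the file
does the same job.)

    #include <stdio.h>

    int main(void)
    {
        /* assumed path, depends on kernel version and CPU topology */
        const char *knob =
            "/proc/sys/kernel/sched_domain/cpu0/domain0/max_interval";
        FILE *f = fopen(knob, "w");

        if (!f) {
            perror(knob);
            return 1;
        }
        fprintf(f, "128\n");    /* cap the balance interval at 128ms */
        return fclose(f) ? 1 : 0;
    }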

> balance_interval in a very small amount of time.
>
> To prevent that, don't increase the balance interval when going
> through a newidle balance.
>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/fair.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6bd142d..620910d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8782,13 +8782,22 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>         sd->nr_balance_failed = 0;
>
>  out_one_pinned:
> +       ld_moved = 0;
> +
> +       /*
> +        * idle_balance() disregards balance intervals, so we could repeatedly
> +        * reach this code, which would lead to balance_interval skyrocketing
> +        * in a short amount of time. Skip the balance_interval increase logic
> +        * to avoid that.
> +        */
> +       if (env.idle == CPU_NEWLY_IDLE)
> +               goto out;
> +
>         /* tune up the balancing interval */
> -       if (((env.flags & LBF_ALL_PINNED) &&
> -                       sd->balance_interval < MAX_PINNED_INTERVAL) ||
> -                       (sd->balance_interval < sd->max_interval))
> +       if ((env.flags & LBF_ALL_PINNED &&
> +            sd->balance_interval < MAX_PINNED_INTERVAL) ||
> +           (sd->balance_interval < sd->max_interval))

This looks like a code cleanup that is not related to the subject

>                 sd->balance_interval *= 2;
> -
> -       ld_moved = 0;
>  out:
>         return ld_moved;
>  }
> --
> 2.7.4
>


* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-26  8:13 ` Vincent Guittot
@ 2018-09-26  9:35   ` Valentin Schneider
  2018-09-26 10:33     ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2018-09-26  9:35 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann

Hi,

On 26/09/18 09:13, Vincent Guittot wrote:
> On Tue, 25 Sep 2018 at 19:38, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
>>
>> When load_balance() fails to move some load because of task affinity,
>> we end up increasing sd->balance_interval to delay the next periodic
>> balance in the hopes that next time we look, that annoying pinned
>> task(s) will be gone.
> 
> It's not about hope but about minimizing useless periodic load balancing, as
> there is no possibility to balance anything.
> 
>>
>> However, idle_balance() pays no attention to sd->balance_interval, yet
>> it will still lead to an increase in balance_interval in case of
>> pinned tasks.
> 
> I would say that this makes sense: as there is no way to pull the
> pinned task onto this rq, running periodic load balance (busy or idle)
> is a waste of time and resources.
> 

Increasing the delay for the next periodic balance when there are pinned tasks
makes sense. The issue I have here is that the balance_interval can potentially
be doubled at every failed idle_balance() without waiting for a sufficient
amount of time.

When going through periodic balance, load_balance() is gated by:

    time_after_eq(jiffies, sd->last_balance + interval)

So if we previously ran load_balance() and failed to pull anything because 
of task affinity, we'll wait for at least the new interval before attempting
another load_balance().

On the other hand, idle_balance() will call load_balance() regardless of
balance_interval values. So if the task composition allows it, we can repeatedly
fail to load_balance() and thus double balance_interval at every idle_balance()
"tick". So you can go from 8ms to 512ms in a few successive idle_balance()s,
whereas periodic balance would have to wait for at least the sum of all
intermediate balance_intervals leading to 512 (8, 16, 32, etc).
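
To put numbers on that, here's a tiny standalone illustration (plain
userspace C, not scheduler code; the 8ms/512ms figures are simply the
ones from the example above):

    #include <stdio.h>

    int main(void)
    {
        unsigned int interval = 8;      /* starting balance_interval, in ms */
        unsigned int target = 512;      /* interval reached in the example */
        unsigned int periodic_wait = 0; /* min. waiting for the periodic path */
        unsigned int doublings = 0;

        while (interval < target) {
            /*
             * The periodic path can only retry after waiting out the
             * current interval; newidle retries as soon as the CPU goes
             * newly idle, i.e. effectively back to back.
             */
            periodic_wait += interval;
            interval *= 2;
            doublings++;
        }

        printf("doublings: %u\n", doublings);
        printf("periodic path: at least %u ms of waiting\n", periodic_wait);
        printf("newidle path: %u back-to-back failed balances\n", doublings);
        return 0;
    }

That prints 6 doublings and at least 504ms of waiting for the periodic
path, versus just 6 back-to-back failed newidle balances.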

>>
>> If we're going through several newidle balances (e.g. we have a
>> periodic task), this can lead to a huge increase of the
> 
> This only happens when overload is set, which means that there are tasks
> on another CPU but this one can't pull any of them.
> 
> Can you give us details about the use case that you care about ?
> 

It's the same as I presented last week - devlib (some python target communication
library I use) has some phase where it spawns a lot of tasks at once to do
some setup (busybox, shutils, bash...). Some of those tasks are pinned to a
particular CPU, and that can lead to failed load_balance() - and to make things
worse, there's a lot of idle_balance() in there.

Eventually when I start running my actual workload a few ~100ms later, it's
impacted by that balance_interval increase.

Admittedly that's a specific use-case, but I don't think this quick increase
is something that was intended.

> Also, if the interval reaches a value that becomes a problem for you,
> why don't you reduce the max_interval with sysfs? This max interval
> can be reached on several other occasions.
> 

True, although I saw that as an unwanted behaviour rather than a tuning problem.

>> balance_interval in a very small amount of time.
>>
>> To prevent that, don't increase the balance interval when going
>> through a newidle balance.
>>
>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>> ---
>>  kernel/sched/fair.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6bd142d..620910d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8782,13 +8782,22 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>         sd->nr_balance_failed = 0;
>>
>>  out_one_pinned:
>> +       ld_moved = 0;
>> +
>> +       /*
>> +        * idle_balance() disregards balance intervals, so we could repeatedly
>> +        * reach this code, which would lead to balance_interval skyrocketing
>> +        * in a short amount of time. Skip the balance_interval increase logic
>> +        * to avoid that.
>> +        */
>> +       if (env.idle == CPU_NEWLY_IDLE)
>> +               goto out;
>> +
>>         /* tune up the balancing interval */
>> -       if (((env.flags & LBF_ALL_PINNED) &&
>> -                       sd->balance_interval < MAX_PINNED_INTERVAL) ||
>> -                       (sd->balance_interval < sd->max_interval))
>> +       if ((env.flags & LBF_ALL_PINNED &&
>> +            sd->balance_interval < MAX_PINNED_INTERVAL) ||
>> +           (sd->balance_interval < sd->max_interval))
> 
> This looks like a code cleanup that is not related to the subject
> 

Absolutely, I figured I could get away with it but if that's preferred I can
split that more clearly

>>                 sd->balance_interval *= 2;
>> -
>> -       ld_moved = 0;
>>  out:
>>         return ld_moved;
>>  }
>> --
>> 2.7.4
>>


* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-26  9:35   ` Valentin Schneider
@ 2018-09-26 10:33     ` Vincent Guittot
  2018-09-26 13:16       ` Valentin Schneider
  2018-09-26 13:17       ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Vincent Guittot @ 2018-09-26 10:33 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann

On Wed, 26 Sep 2018 at 11:35, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> Hi,
>
> On 26/09/18 09:13, Vincent Guittot wrote:
> > On Tue, 25 Sep 2018 at 19:38, Valentin Schneider
> > <valentin.schneider@arm.com> wrote:
> >>
> >> When load_balance() fails to move some load because of task affinity,
> >> we end up increasing sd->balance_interval to delay the next periodic
> >> balance in the hopes that next time we look, that annoying pinned
> >> task(s) will be gone.
> >
> > It's not about hope but about minimizing useless periodic load balancing, as
> > there is no possibility to balance anything.
> >
> >>
> >> However, idle_balance() pays no attention to sd->balance_interval, yet
> >> it will still lead to an increase in balance_interval in case of
> >> pinned tasks.
> >
> > I would say that this makes sense: as there is no way to pull the
> > pinned task onto this rq, running periodic load balance (busy or idle)
> > is a waste of time and resources.
> >
>
> Increasing the delay for the next periodic balance when there are pinned tasks
> makes sense. The issue I have here is that the balance_interval can potentially
> be doubled at every failed idle_balance() without waiting for a sufficient
> amount of time.
>
> When going through periodic balance, load_balance() is gated by:
>
>     time_after_eq(jiffies, sd->last_balance + interval)
>
> So if we previously ran load_balance() and failed to pull anything because
> of task affinity, we'll wait for at least the new interval before attempting
> another load_balance().
>
> On the other hand, idle_balance() will call load_balance() regardless of
> balance_interval values. So if the task composition allows it, we can repeatedly
> fail to load_balance() and thus double balance_interval at every idle_balance()
> "tick". So you can go from 8ms to 512ms in a few successive idle_balance()s,
> whereas periodic balance would have to wait for at least the sum of all
> intermediate balance_intervals leading to 512 (8, 16, 32, etc).
>
> >>
> >> If we're going through several newidle balances (e.g. we have a
> >> periodic task), this can lead to a huge increase of the
> >
> > This only happens when overload is set, which means that there are tasks
> > on another CPU but this one can't pull any of them.
> >
> > Can you give us details about the use case that you care about ?
> >
>
> It's the same as I presented last week - devlib (some python target communication

Ok, you mean at Linaro Connect.

> library I use) has some phase where it spawns a lot of tasks at once to do
> some setup (busybox, shutils, bash...). Some of those tasks are pinned to a
> particular CPU, and that can lead to failed load_balance() - and to make things
> worse, there's a lot of idle_balance() in there.
>
> Eventually when I start running my actual workload a few ~100ms later, it's
> impacted by that balance_interval increase.
>
> Admittedly that's a specific use-case, but I don't think this quick increase
> is something that was intended.

Yes, this really sounds like a specific use-case. Unluckily you found a
way to reach the max interval quite easily/every time with your test
set-up, but keep in mind that this can also happen in real system life,
and without using the newly idle path.
So if it's a problem to have an interval at the max value for your unitary
test, it probably means that it's a problem for the system and the max
value is too high.

Taking advantage of all load_balance events to update the interval
makes sense to me. It seems that you care about a short and regular
balance interval more than minimizing the overhead of load balancing.
On the other hand, I'm sure that you don't complain if newly idle load
balance resets the interval to the min value and overwrites what the
periodic load balance set up previously :-)

>
> > Also, if the interval reaches a value that becomes a problem for you,
> > why don't you reduce the max_interval with sysfs? This max interval
> > can be reached on several other occasions.
> >
>
> True, although I saw that as an unwanted behaviour rather than a tuning problem.
>
> >> balance_interval in a very small amount of time.
> >>
> >> To prevent that, don't increase the balance interval when going
> >> through a newidle balance.
> >>
> >> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> >> ---
> >>  kernel/sched/fair.c | 19 ++++++++++++++-----
> >>  1 file changed, 14 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 6bd142d..620910d 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -8782,13 +8782,22 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >>         sd->nr_balance_failed = 0;
> >>
> >>  out_one_pinned:
> >> +       ld_moved = 0;
> >> +
> >> +       /*
> >> +        * idle_balance() disregards balance intervals, so we could repeatedly
> >> +        * reach this code, which would lead to balance_interval skyrocketing
> >> +        * in a short amount of time. Skip the balance_interval increase logic
> >> +        * to avoid that.
> >> +        */
> >> +       if (env.idle == CPU_NEWLY_IDLE)
> >> +               goto out;
> >> +
> >>         /* tune up the balancing interval */
> >> -       if (((env.flags & LBF_ALL_PINNED) &&
> >> -                       sd->balance_interval < MAX_PINNED_INTERVAL) ||
> >> -                       (sd->balance_interval < sd->max_interval))
> >> +       if ((env.flags & LBF_ALL_PINNED &&
> >> +            sd->balance_interval < MAX_PINNED_INTERVAL) ||
> >> +           (sd->balance_interval < sd->max_interval))
> >
> > This looks like a code cleanup that is not related to the subject
> >
>
> Absolutely, I figured I could get away with it but if that's preferred I can
> split that more clearly

That would make it easier to review separately.

>
> >>                 sd->balance_interval *= 2;
> >> -
> >> -       ld_moved = 0;
> >>  out:
> >>         return ld_moved;
> >>  }
> >> --
> >> 2.7.4
> >>


* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-26 10:33     ` Vincent Guittot
@ 2018-09-26 13:16       ` Valentin Schneider
  2018-09-26 13:56         ` Vincent Guittot
  2018-09-26 13:17       ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Valentin Schneider @ 2018-09-26 13:16 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann,
	Patrick Bellasi

On 26/09/18 11:33, Vincent Guittot wrote:
> On Wed, 26 Sep 2018 at 11:35, Valentin Schneider
> [...]
>>> Can you give us details about the use case that you care about ?
>>>
>>
>> It's the same as I presented last week - devlib (some python target communication
> 
> Ok, you mean at Linaro Connect.
> 

Yeah, sorry.

>> library I use) has some phase where it spawns a lot of tasks at once to do
>> some setup (busybox, shutils, bash...). Some of those tasks are pinned to a
>> particular CPU, and that can lead to failed load_balance() - and to make things
>> worse, there's a lot of idle_balance() in there.
>>
>> Eventually when I start running my actual workload a few ~100ms later, it's
>> impacted by that balance_interval increase.
>>
>> Admittedly that's a specific use-case, but I don't think this quick increase
>> is something that was intended.
> 
> Yes, this really sounds like a specific use-case. Unluckily you found a
> way to reach the max interval quite easily/every time with your test
> set-up, but keep in mind that this can also happen in real system life,
> and without using the newly idle path.
> So if it's a problem to have an interval at the max value for your unitary
> test, it probably means that it's a problem for the system and the max
> value is too high.
> 

Limiting the max value for those tests is actually a good point, and I think
I'll give it a shot. However...

> Taking advantage of all load_balance events to update the interval
> makes sense to me. It seems that you care about a short and regular
> balance interval more than minimizing the overhead of load balancing.
> On the other hand, I'm sure that you don't complain if newly idle load
> balance resets the interval to the min value and overwrites what the
> periodic load balance set up previously :-)
> 

...My concern is more about increasing balance_interval faster than we should.
The proposed "fix" is to prevent any balance_interval increase when going
through idle_balance(), but Patrick (added in cc) suggested offline that
we could simply limit the rate at which we do these increases, so that they
match what we do in rebalance_domains().
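
As a rough sketch of what that could look like (a standalone toy model in
userspace C, not an actual patch; the 2ms spacing of the newidle events
and all the numbers are made-up assumptions, purely for illustration):

    #include <stdio.h>

    int main(void)
    {
        unsigned long now_ms = 0, last_increase = 0;
        unsigned int ungated = 8, gated = 8;    /* balance_interval, in ms */
        const unsigned int max_interval = 512;
        int i;

        /* 50 failed newidle balances, one every 2ms */
        for (i = 0; i < 50; i++) {
            now_ms += 2;

            /* current behaviour: double on every failure */
            if (ungated < max_interval)
                ungated *= 2;

            /* rate-limited: double only once the current interval elapsed */
            if (gated < max_interval &&
                now_ms >= last_increase + gated) {
                gated *= 2;
                last_increase = now_ms;
            }
        }

        printf("after %lu ms: ungated = %u ms, rate-limited = %u ms\n",
               now_ms, ungated, gated);
        return 0;
    }

With those made-up numbers this ends at 512ms for the ungated case versus
64ms for the rate-limited one after 100ms of newidle churn.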

We'd still increase balance_interval on failed newidle load_balance(), but
we wouldn't increase from min to max in e.g. 50ms. Would that work better
for you?

>> [...]


* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-26 10:33     ` Vincent Guittot
  2018-09-26 13:16       ` Valentin Schneider
@ 2018-09-26 13:17       ` Peter Zijlstra
  2018-09-26 13:58         ` Vincent Guittot
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2018-09-26 13:17 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Valentin Schneider, linux-kernel, Ingo Molnar, Dietmar Eggemann

On Wed, Sep 26, 2018 at 12:33:25PM +0200, Vincent Guittot wrote:
> On Wed, 26 Sep 2018 at 11:35, Valentin Schneider

> > library I use) has some phase where it spawns a lot of tasks at once to do
> > some setup (busybox, shutils, bash...). Some of those tasks are pinned to a
> > particular CPU, and that can lead to failed load_balance() - and to make things
> > worse, there's a lot of idle_balance() in there.
> >
> > Eventually when I start running my actual workload a few ~100ms later, it's
> > impacted by that balance_interval increase.
> >
> > Admittedly that's a specific use-case, but I don't think this quick increase
> > is something that was intended.
> 
> Yes, this really sounds like a specific use-case. Unluckily you found a
> way to reach the max interval quite easily/every time with your test
> set-up, but keep in mind that this can also happen in real system life,
> and without using the newly idle path.
> So if it's a problem to have an interval at the max value for your unitary
> test, it probably means that it's a problem for the system and the max
> value is too high.
> 
> Taking advantage of all load_balance events to update the interval
> makes sense to me. It seems that you care about a short and regular
> balance interval more than minimizing the overhead of load balancing.
> On the other hand, I'm sure that you don't complain if newly idle load
> balance resets the interval to the min value and overwrites what the
> periodic load balance set up previously :-)

Well, we've excluded newidle balance from updating such stats before. So
in that respect the patch proposed by Valentin isn't weird.

Consider for example:

  58b26c4c0257 ("sched: Increment cache_nice_tries only on periodic lb")

In general I think it makes perfect sense to exclude newidle balance
from such stats; you get much more stable results from the regular
balance.


* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-26 13:16       ` Valentin Schneider
@ 2018-09-26 13:56         ` Vincent Guittot
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2018-09-26 13:56 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann,
	Patrick Bellasi

On Wed, 26 Sep 2018 at 15:16, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 26/09/18 11:33, Vincent Guittot wrote:
> > On Wed, 26 Sep 2018 at 11:35, Valentin Schneider
> > [...]
> >>> Can you give us details about the use case that you care about ?
> >>>
> >>
> >> It's the same as I presented last week - devlib (some python target communication
> >
> > Ok, you mean at Linaro Connect.
> >
>
> Yeah, sorry.
>
> >> library I use) has some phase where it spawns a lot of tasks at once to do
> >> some setup (busybox, shutils, bash...). Some of those tasks are pinned to a
> >> particular CPU, and that can lead to failed load_balance() - and to make things
> >> worse, there's a lot of idle_balance() in there.
> >>
> >> Eventually when I start running my actual workload a few ~100ms later, it's
> >> impacted by that balance_interval increase.
> >>
> >> Admittedly that's a specific use-case, but I don't think this quick increase
> >> is something that was intended.
> >
> > Yes, this really sounds like a specific use-case. Unluckily you found a
> > way to reach the max interval quite easily/every time with your test
> > set-up, but keep in mind that this can also happen in real system life,
> > and without using the newly idle path.
> > So if it's a problem to have an interval at the max value for your unitary
> > test, it probably means that it's a problem for the system and the max
> > value is too high.
> >
>
> Limiting the max value for those tests is actually a good point, and I think
> I'll give it a shot. However...
>
> > Taking advantage of all load_balance events to update the interval
> > makes sense to me. It seems that you care about a short and regular
> > balance interval more than minimizing the overhead of load balancing.
> > On the other hand, I'm sure that you don't complain if newly idle load
> > balance resets the interval to the min value and overwrites what the
> > periodic load balance set up previously :-)
> >
>
> ...My concern is more about increasing balance_interval faster than we should.
> The proposed "fix" is to prevent any balance_interval increase when going
> through idle_balance(), but Patrick (added in cc) suggested offline that
> we could simply limit the rate at which we do these increases, so that they
> match what we do in rebalance_domains().
>
> We'd still increase balance_interval on failed newidle load_balance(), but
> we wouldn't increase from min to max in e.g. 50ms. Would that work better
> for you?

Adding a rate limit would be a reasonable solution.
>
> >> [...]


* Re: [PATCH] sched/fair: Don't increase sd->balance_interval on newidle balance
  2018-09-26 13:17       ` Peter Zijlstra
@ 2018-09-26 13:58         ` Vincent Guittot
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2018-09-26 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, linux-kernel, Ingo Molnar, Dietmar Eggemann

On Wed, 26 Sep 2018 at 15:17, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Sep 26, 2018 at 12:33:25PM +0200, Vincent Guittot wrote:
> > On Wed, 26 Sep 2018 at 11:35, Valentin Schneider
>
> > > library I use) has some phase where it spawns a lot of tasks at once to do
> > > some setup (busybox, shutils, bash...). Some of those tasks are pinned to a
> > > particular CPU, and that can lead to failed load_balance() - and to make things
> > > worse, there's a lot of idle_balance() in there.
> > >
> > > Eventually when I start running my actual workload a few ~100ms later, it's
> > > impacted by that balance_interval increase.
> > >
> > > Admittedly that's a specific use-case, but I don't think this quick increase
> > > is something that was intended.
> >
> > Yes, this really sounds like a specific use-case. Unluckily you found a
> > way to reach the max interval quite easily/every time with your test
> > set-up, but keep in mind that this can also happen in real system life,
> > and without using the newly idle path.
> > So if it's a problem to have an interval at the max value for your unitary
> > test, it probably means that it's a problem for the system and the max
> > value is too high.
> >
> > Taking advantage of all load_balance events to update the interval
> > makes sense to me. It seems that you care about a short and regular
> > balance interval more than minimizing the overhead of load balancing.
> > On the other hand, I'm sure that you don't complain if newly idle load
> > balance resets the interval to the min value and overwrites what the
> > periodic load balance set up previously :-)
>
> Well, we've excluded newidle balance from updating such stats before. So
> in that respect the patch proposed by Valentin isn't weird.
>
> Consider for example:
>
>   58b26c4c0257 ("sched: Increment cache_nice_tries only on periodic lb")
>
> In general I think it makes perfect sense to exclude newidle balance
> from such stats; you get much more stable results from the regular
> balance.

Ok, so in this case we should exclude all updates of the interval
during newly idle balance, and not only some of them.

