* [PATCH] sched/fair: improve spreading of utilization
@ 2020-03-12 16:54 Vincent Guittot
  2020-03-13 10:26 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Vincent Guittot @ 2020-03-12 16:54 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel
  Cc: Vincent Guittot

During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In such a case, the load balance
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull; otherwise
the load balance will fail to pull a task and the spreading of the load
will be delayed.

This situation is quite transient, but it's possible to highlight the
effect with a short run of the sysbench test, where the time taken to
spread the tasks impacts the global result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8  --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:                1.337ms         1.244ms
         max:               21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the values,
which range from 1.350ms to more than 41ms for tip/sched/core and from
1.350ms to 21ms with the patch.

Other factors, like waiting for an idle load balance or cache hotness,
can delay the spreading of the tasks, which explains why we can still
see up to 21ms with the patch.
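
For reference, the new check sits in the migrate_util case of
find_busiest_queue(); a simplified sketch of the surrounding loop is below
(rq iteration details, capacity handling and the other migration_type cases
are elided, field names recalled from kernel/sched/fair.c of this era):

	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
		struct rq *rq = cpu_rq(i);
		unsigned int nr_running = rq->cfs.h_nr_running;
		unsigned long util;

		switch (env->migration_type) {
		case migrate_util:
			util = cpu_util(cpu_of(rq));

			/*
			 * A CPU with a single running task has nothing that
			 * detach_tasks() could pull, so skip it whatever its
			 * utilization.
			 */
			if (nr_running <= 1)
				continue;

			if (busiest_util < util) {
				busiest_util = util;
				busiest = rq;
			}
			break;

		default:
			/* migrate_load, migrate_task, migrate_misfit elided */
			break;
		}
	}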

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c8a379c357e..97a0307312d9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		case migrate_util:
 			util = cpu_util(cpu_of(rq));
 
+			/*
+			 * Don't try to pull utilization from a CPU with one
+			 * running task. Whatever its utilization, we will fail to
+			 * detach the task.
+			 */
+			if (nr_running <= 1)
+				continue;
+
 			if (busiest_util < util) {
 				busiest_util = util;
 				busiest = rq;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-12 16:54 [PATCH] sched/fair: improve spreading of utilization Vincent Guittot
@ 2020-03-13 10:26 ` Peter Zijlstra
  2020-03-13 11:00 ` Valentin Schneider
  2020-03-20 12:58 ` [tip: sched/core] sched/fair: Improve " tip-bot2 for Vincent Guittot
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2020-03-13 10:26 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
	linux-kernel

On Thu, Mar 12, 2020 at 05:54:29PM +0100, Vincent Guittot wrote:
> During load balancing, a group with spare capacity will try to pull some
> utilization from an overloaded group. In such a case, the load balance
> looks for the runqueue with the highest utilization. Nevertheless, it
> should also ensure that there are some pending tasks to pull; otherwise
> the load balance will fail to pull a task and the spreading of the load
> will be delayed.
> 
> This situation is quite transient, but it's possible to highlight the
> effect with a short run of the sysbench test, where the time taken to
> spread the tasks impacts the global result significantly.
> 
> Below are the average results for 15 iterations on an arm64 octo core:
> sysbench --test=cpu --num-threads=8  --max-requests=1000 run
> 
>                            tip/sched/core  +patchset
> total time:                172ms           158ms
> per-request statistics:
>          avg:                1.337ms         1.244ms
>          max:               21.191ms        10.753ms
> 
> The average max doesn't fully reflect the wide spread of the values,
> which range from 1.350ms to more than 41ms for tip/sched/core and from
> 1.350ms to 21ms with the patch.
> 
> Other factors, like waiting for an idle load balance or cache hotness,
> can delay the spreading of the tasks, which explains why we can still
> see up to 21ms with the patch.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-12 16:54 [PATCH] sched/fair: improve spreading of utilization Vincent Guittot
  2020-03-13 10:26 ` Peter Zijlstra
@ 2020-03-13 11:00 ` Valentin Schneider
  2020-03-13 11:24   ` Vincent Guittot
  2020-03-20 12:58 ` [tip: sched/core] sched/fair: Improve " tip-bot2 for Vincent Guittot
  2 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2020-03-13 11:00 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, linux-kernel


On Thu, Mar 12 2020, Vincent Guittot wrote:
>  kernel/sched/fair.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3c8a379c357e..97a0307312d9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>               case migrate_util:
>                       util = cpu_util(cpu_of(rq));
>
> +			/*
> +			 * Don't try to pull utilization from a CPU with one
> +			 * running task. Whatever its utilization, we will fail to
> +			 * detach the task.
> +			 */
> +			if (nr_running <= 1)
> +				continue;
> +

Doesn't this break misfit? If the busiest group is group_misfit_task, it
is totally valid for the runqueues to have a single running task -
that's the CPU-bound task we want to upmigrate.

If the busiest rq has only a single running task, we'll skip the
detach_tasks() block and go straight to the active balance bits.
Misfit balancing totally relies on this, and IMO ASYM_PACKING does
too. Looking at voluntary_active_balance(), it seems your change also
goes against the one added by
  1aaf90a4b88a ("sched: Move CFS tasks to CPUs with higher capacity")

The bandaid here would be to gate this 'continue' with checks against the
busiest_group_type, but that's only a loose link wrt
voluntary_active_balance().
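
Roughly, the flow I have in mind is the tail of load_balance() (very
simplified sketch, from memory; most of the checks and locking elided):

	if (busiest->nr_running > 1) {
		/* the normal pull path */
		cur_ld_moved = detach_tasks(&env);
		if (cur_ld_moved)
			attach_tasks(&env);
	}

	if (!ld_moved) {
		/*
		 * Nothing could be detached (e.g. a single running task):
		 * misfit / asym / reduced-capacity cases rely on this
		 * fallback, which pushes the running task away through the
		 * stopper.
		 */
		if (need_active_balance(&env))
			stop_one_cpu_nowait(cpu_of(busiest),
					    active_load_balance_cpu_stop,
					    busiest,
					    &busiest->active_balance_work);
	}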

>                       if (busiest_util < util) {
>                               busiest_util = util;
>                               busiest = rq;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 11:00 ` Valentin Schneider
@ 2020-03-13 11:24   ` Vincent Guittot
  2020-03-13 11:28     ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2020-03-13 11:24 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Fri, 13 Mar 2020 at 12:00, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> On Thu, Mar 12 2020, Vincent Guittot wrote:
> >  kernel/sched/fair.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3c8a379c357e..97a0307312d9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >               case migrate_util:
> >                       util = cpu_util(cpu_of(rq));
> >
> > +                     /*
> > +                      * Don't try to pull utilization from a CPU with one
> > +                      * running task. Whatever its utilization, we will fail to
> > +                      * detach the task.
> > +                      */
> > +                     if (nr_running <= 1)
> > +                             continue;
> > +
>
> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> is totally valid for the runqueues to have a single running task -
> that's the CPU-bound task we want to upmigrate.

 group_misfit_task has its dedicated migrate_misfit case
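
i.e. roughly this, in the same switch (simplified sketch, from memory):

		case migrate_misfit:
			/*
			 * For ASYM_CPUCAPACITY domains we simply seek the
			 * "biggest" misfit task; nr_running doesn't matter
			 * here.
			 */
			if (rq->misfit_task_load > busiest_load) {
				busiest_load = rq->misfit_task_load;
				busiest = rq;
			}
			break;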

>
> If the busiest rq has only a single running task, we'll skip the
> detach_tasks() block and go straight to the active balance bits.
> Misfit balancing totally relies on this, and IMO ASYM_PACKING does
> too. Looking at voluntary_active_balance(), it seems your change also
> goes against the one added by
>   1aaf90a4b88a ("sched: Move CFS tasks to CPUs with higher capacity")
>
> The bandaid here would be gate this 'continue' with checks against the
> busiest_group_type, but that's only a loose link wrt
> voluntary_active_balance().
>
> >                       if (busiest_util < util) {
> >                               busiest_util = util;
> >                               busiest = rq;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 11:24   ` Vincent Guittot
@ 2020-03-13 11:28     ` Valentin Schneider
  2020-03-13 12:42       ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2020-03-13 11:28 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	linux-kernel


On Fri, Mar 13 2020, Vincent Guittot wrote:
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 3c8a379c357e..97a0307312d9 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>> >               case migrate_util:
>> >                       util = cpu_util(cpu_of(rq));
>> >
>> > +                     /*
>> > +                      * Don't try to pull utilization from a CPU with one
>> > +                      * running task. Whatever its utilization, we will fail to
>> > +                      * detach the task.
>> > +                      */
>> > +                     if (nr_running <= 1)
>> > +                             continue;
>> > +
>>
>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
>> is totally valid for the runqueues to have a single running task -
>> that's the CPU-bound task we want to upmigrate.
>
>  group_misfit_task has its dedicated migrate_misfit case
>

Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
migration is still relevant, though.

>>
>> If the busiest rq has only a single running task, we'll skip the
>> detach_tasks() block and go straight to the active balance bits.
>> Misfit balancing totally relies on this, and IMO ASYM_PACKING does
>> too. Looking at voluntary_active_balance(), it seems your change also
>> goes against the one added by
>>   1aaf90a4b88a ("sched: Move CFS tasks to CPUs with higher capacity")
>>
>> The bandaid here would be gate this 'continue' with checks against the
>> busiest_group_type, but that's only a loose link wrt
>> voluntary_active_balance().
>>
>> >                       if (busiest_util < util) {
>> >                               busiest_util = util;
>> >                               busiest = rq;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 11:28     ` Valentin Schneider
@ 2020-03-13 12:42       ` Valentin Schneider
  2020-03-13 12:55         ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2020-03-13 12:42 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	linux-kernel


On Fri, Mar 13 2020, Valentin Schneider wrote:
> On Fri, Mar 13 2020, Vincent Guittot wrote:
>>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> > index 3c8a379c357e..97a0307312d9 100644
>>> > --- a/kernel/sched/fair.c
>>> > +++ b/kernel/sched/fair.c
>>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>>> >               case migrate_util:
>>> >                       util = cpu_util(cpu_of(rq));
>>> >
>>> > +                     /*
>>> > +                      * Don't try to pull utilization from a CPU with one
>>> > +                      * running task. Whatever its utilization, we will fail to
>>> > +                      * detach the task.
>>> > +                      */
>>> > +                     if (nr_running <= 1)
>>> > +                             continue;
>>> > +
>>>
>>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
>>> is totally valid for the runqueues to have a single running task -
>>> that's the CPU-bound task we want to upmigrate.
>>
>>  group_misfit_task has its dedicated migrate_misfit case
>>
>
> Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> migration is still relevant, though.
>

And with more coffee, that's another Doh: ASYM_PACKING would end up as
migrate_task. So this only affects the reduced capacity migration, which
might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 12:42       ` Valentin Schneider
@ 2020-03-13 12:55         ` Vincent Guittot
  2020-03-13 14:26           ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2020-03-13 12:55 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Fri, 13 Mar 2020 at 13:42, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> On Fri, Mar 13 2020, Valentin Schneider wrote:
> > On Fri, Mar 13 2020, Vincent Guittot wrote:
> >>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> > index 3c8a379c357e..97a0307312d9 100644
> >>> > --- a/kernel/sched/fair.c
> >>> > +++ b/kernel/sched/fair.c
> >>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >>> >               case migrate_util:
> >>> >                       util = cpu_util(cpu_of(rq));
> >>> >
> >>> > +                     /*
> >>> > +                      * Don't try to pull utilization from a CPU with one
> >>> > +                      * running task. Whatever its utilization, we will fail to
> >>> > +                      * detach the task.
> >>> > +                      */
> >>> > +                     if (nr_running <= 1)
> >>> > +                             continue;
> >>> > +
> >>>
> >>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> >>> is totally valid for the runqueues to have a single running task -
> >>> that's the CPU-bound task we want to upmigrate.
> >>
> >>  group_misfit_task has its dedicated migrate_misfit case
> >>
> >
> > Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> > migration is still relevant, though.
> >
>
> And with more coffee that's another Doh, ASYM_PACKING would end up as
> migrate_task. So this only affects the reduced capacity migration, which

Yes, ASYM_PACKING uses migrate_task, and the case of reduced capacity
would use it too and would not be impacted by this patch. I say
"would" because the original rework of load balance got rid of this
case. I'm going to prepare a separate fix for this.

> might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 12:55         ` Vincent Guittot
@ 2020-03-13 14:26           ` Vincent Guittot
  2020-03-13 15:47             ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2020-03-13 14:26 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Fri, 13 Mar 2020 at 13:55, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Fri, 13 Mar 2020 at 13:42, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
> >
> >
> > On Fri, Mar 13 2020, Valentin Schneider wrote:
> > > On Fri, Mar 13 2020, Vincent Guittot wrote:
> > >>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> > index 3c8a379c357e..97a0307312d9 100644
> > >>> > --- a/kernel/sched/fair.c
> > >>> > +++ b/kernel/sched/fair.c
> > >>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > >>> >               case migrate_util:
> > >>> >                       util = cpu_util(cpu_of(rq));
> > >>> >
> > >>> > +                     /*
> > >>> > +                      * Don't try to pull utilization from a CPU with one
> > >>> > +                      * running task. Whatever its utilization, we will fail to
> > >>> > +                      * detach the task.
> > >>> > +                      */
> > >>> > +                     if (nr_running <= 1)
> > >>> > +                             continue;
> > >>> > +
> > >>>
> > >>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> > >>> is totally valid for the runqueues to have a single running task -
> > >>> that's the CPU-bound task we want to upmigrate.
> > >>
> > >>  group_misfit_task has its dedicated migrate_misfit case
> > >>
> > >
> > > Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> > > migration is still relevant, though.
> > >
> >
> > And with more coffee that's another Doh, ASYM_PACKING would end up as
> > migrate_task. So this only affects the reduced capacity migration, which
>
> yes  ASYM_PACKING uses migrate_task and the case of reduced capacity
> would use it too and would not be impacted by this patch. I say
> "would" because the original rework of load balance got rid of this
> case. I'm going to prepare a separate fix  for this

After more thought, I think that we are safe for reduced capacity too,
because this is handled in the migrate_load case. In my previous
reply, I was thinking of the case where the rq is not overloaded but
the CPU has reduced capacity, which is not handled. But in such a
case, we don't have to force the migration of the task because there
is still enough capacity; otherwise the rq would be overloaded and we
would be back to the case already handled.
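
For reference, the migrate_load case I'm referring to looks roughly like
this (simplified sketch, from memory):

		case migrate_load:
			load = cpu_load(rq);

			/*
			 * A CPU with a single task and more load than the
			 * imbalance is only worth picking when its capacity
			 * is actually reduced; otherwise leave the running
			 * task where it is.
			 */
			if (nr_running == 1 && load > env->imbalance &&
			    !check_cpu_capacity(rq, env->sd))
				break;

			/* otherwise compare the load scaled by capacity */
			if (load * busiest_capacity > busiest_load * capacity) {
				busiest_load = load;
				busiest_capacity = capacity;
				busiest = rq;
			}
			break;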

>
> > might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 14:26           ` Vincent Guittot
@ 2020-03-13 15:47             ` Valentin Schneider
  2020-03-13 16:09               ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2020-03-13 15:47 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel


On Fri, Mar 13 2020, Vincent Guittot wrote:
>> > And with more coffee that's another Doh, ASYM_PACKING would end up as
>> > migrate_task. So this only affects the reduced capacity migration, which
>>
>> yes  ASYM_PACKING uses migrate_task and the case of reduced capacity
>> would use it too and would not be impacted by this patch. I say
>> "would" because the original rework of load balance got rid of this
>> case. I'm going to prepare a separate fix  for this
>
> After more thought, I think that we are safe for reduced capacity too
> because this is handled in the migrate_load case. In my previous
> reply, I was thinking of  the case where rq is not overloaded but cpu
> has reduced capacity which is not handled. But in such case, we don't
> have to force the migration of the task because there is still enough
> capacity otherwise rq would be overloaded and we are back to the case
> already handled
>

Good point on the capacity reduction vs group_is_overloaded.

That said, can't we also reach this with migrate_task? Say the local
group is entirely idle, and the busiest group has a few non-idle CPUs
but they all have at most 1 running task. AFAICT we would still go to
calculate_imbalance(), and try to balance out the number of idle CPUs.

If the migration_type is migrate_util, that can't happen because of this
change. Since we have this progressive balancing strategy (tasks -> util
-> load), it's a bit odd to have this "gap" in the middle where we get
one less possibility to trigger active balance, don't you think? That
is, providing I didn't say nonsense again :)

It's not a super big deal, but I think it's nice if we can maintain a
consistent / gradual migration policy.

>>
>> > might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 15:47             ` Valentin Schneider
@ 2020-03-13 16:09               ` Vincent Guittot
  2020-03-13 16:57                 ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2020-03-13 16:09 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Fri, 13 Mar 2020 at 16:47, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> On Fri, Mar 13 2020, Vincent Guittot wrote:
> >> > And with more coffee that's another Doh, ASYM_PACKING would end up as
> >> > migrate_task. So this only affects the reduced capacity migration, which
> >>
> >> yes  ASYM_PACKING uses migrate_task and the case of reduced capacity
> >> would use it too and would not be impacted by this patch. I say
> >> "would" because the original rework of load balance got rid of this
> >> case. I'm going to prepare a separate fix  for this
> >
> > After more thought, I think that we are safe for reduced capacity too
> > because this is handled in the migrate_load case. In my previous
> > reply, I was thinking of  the case where rq is not overloaded but cpu
> > has reduced capacity which is not handled. But in such case, we don't
> > have to force the migration of the task because there is still enough
> > capacity otherwise rq would be overloaded and we are back to the case
> > already handled
> >
>
> Good point on the capacity reduction vs group_is_overloaded.
>
> That said, can't we also reach this with migrate_task? Say the local

The test has only been added for migrate_util so migrate_task is not impacted

> group is entirely idle, and the busiest group has a few non-idle CPUs
> but they all have at most 1 running task. AFAICT we would still go to
> calculate_imbalance(), and try to balance out the number of idle CPUs.

Such a case is handled by migrate_task, when we try to even out the number
of tasks between groups.

>
> If the migration_type is migrate_util, that can't happen because of this
> change. Since we have this progressive balancing strategy (tasks -> util
> -> load), it's a bit odd to have this "gap" in the middle where we get
> one less possibility to trigger active balance, don't you think? That
> is, providing I didn't say nonsense again :)

Right now, I can't think of a use case that could trigger such a
situation, because we use migrate_util when the source is overloaded,
which means that there is at least one waiting task, and we favor
pulling that waiting task first.
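
That link is visible in the group classification, which is roughly
(simplified sketch of group_is_overloaded(), from memory):

static inline bool
group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
{
	/* more runnable tasks than CPUs: at least one task is waiting */
	if (sgs->sum_nr_running <= sgs->group_weight)
		return false;

	if ((sgs->group_capacity * 100) <
	    (sgs->group_util * imbalance_pct))
		return true;

	return false;
}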

>
> It's not a super big deal, but I think it's nice if we can maintain a
> consistent / gradual migration policy.
>
> >>
> >> > might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 16:09               ` Vincent Guittot
@ 2020-03-13 16:57                 ` Valentin Schneider
  2020-03-13 17:12                   ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2020-03-13 16:57 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel


On Fri, Mar 13 2020, Vincent Guittot wrote:

>> Good point on the capacity reduction vs group_is_overloaded.
>>
>> That said, can't we also reach this with migrate_task? Say the local
>
> The test has only been added for migrate_util so migrate_task is not impacted
>
>> group is entirely idle, and the busiest group has a few non-idle CPUs
>> but they all have at most 1 running task. AFAICT we would still go to
>> calculate_imbalance(), and try to balance out the number of idle CPUs.
>
> such case is handled by migrate_task when we try to even the number of
> tasks between groups
>
>>
>> If the migration_type is migrate_util, that can't happen because of this
>> change. Since we have this progressive balancing strategy (tasks -> util
>> -> load), it's a bit odd to have this "gap" in the middle where we get
>> one less possibility to trigger active balance, don't you think? That
>> is, providing I didn't say nonsense again :)
>
> Right now, I can't think of a use case that could trigger such
> situation because we use migrate_util when source is overloaded which
> means that there is at least one waiting task and we favor this task
> in priority
>

Right, what I was trying to say is that AIUI migration_type ==
migrate_task with <= 1 running task per CPU in the busiest group can
*currently* lead to a balance attempt, and thus a potential active
balance.

Consider a local group of 4 idle CPUs, and a busiest group of 3 busy CPUs
and 1 idle CPU, each busy CPU having only 1 running task. That busiest group
would be group_has_spare, so we would compute an imbalance of
(4-1) / 2 == 1 task to move. We'll proceed with the load balance, but
we'll only move things if we go through an active_balance.
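
The imbalance there comes from the group_has_spare branch of
calculate_imbalance(), which is roughly (simplified sketch, from memory):

	if (busiest->group_type == group_has_spare) {
		/*
		 * No overload: just try to even out the number of idle
		 * CPUs, i.e. (4 - 1) >> 1 == 1 task to move in the
		 * example above.
		 */
		env->migration_type = migrate_task;
		env->imbalance = max_t(long, 0,
				       (local->idle_cpus - busiest->idle_cpus) >> 1);
		return;
	}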

My point is that if we prevent this for migrate_util, it would make
sense to prevent it for migrate_task, but it's not straightforward since
we have things like ASYM_PACKING.

>>
>> It's not a super big deal, but I think it's nice if we can maintain a
>> consistent / gradual migration policy.
>>
>> >>
>> >> > might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 16:57                 ` Valentin Schneider
@ 2020-03-13 17:12                   ` Vincent Guittot
  2020-03-13 17:34                     ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2020-03-13 17:12 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Fri, 13 Mar 2020 at 17:57, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> On Fri, Mar 13 2020, Vincent Guittot wrote:
>
> >> Good point on the capacity reduction vs group_is_overloaded.
> >>
> >> That said, can't we also reach this with migrate_task? Say the local
> >
> > The test has only been added for migrate_util so migrate_task is not impacted
> >
> >> group is entirely idle, and the busiest group has a few non-idle CPUs
> >> but they all have at most 1 running task. AFAICT we would still go to
> >> calculate_imbalance(), and try to balance out the number of idle CPUs.
> >
> > such case is handled by migrate_task when we try to even the number of
> > tasks between groups
> >
> >>
> >> If the migration_type is migrate_util, that can't happen because of this
> >> change. Since we have this progressive balancing strategy (tasks -> util
> >> -> load), it's a bit odd to have this "gap" in the middle where we get
> >> one less possibility to trigger active balance, don't you think? That
> >> is, providing I didn't say nonsense again :)
> >
> > Right now, I can't think of a use case that could trigger such
> > situation because we use migrate_util when source is overloaded which
> > means that there is at least one waiting task and we favor this task
> > in priority
> >
>
> Right, what I was trying to say is that AIUI migration_type ==
> migrate_task with <= 1 running task per CPU in the busiest group can
> *currently* lead to a balance attempt, and thus a potential active
> balance.
>
> Consider a local group of 4 idle CPUs, and a busiest group of 3 busy 1
> idle CPUs, each busy having only 1 running task. That busiest group
> would be group_has_spare, so we would compute an imbalance of
> (4-1) / 2 == 1 task to move. We'll proceed with the load balance, but
> we'll only move things if we go through an active_balance.

Yes, because we want to even out the number of tasks per group as much as possible.

>
> My point is that if we prevent this for migrate_util, it would make
> sense to prevent it for migrate_task, but it's not straightforward since

Hmm, but we don't want to prevent this active balance for migrate_task,
because of cases like the one you mentioned above.

We might consider finally selecting a CPU with only 1 running task
with migrate_util if there is no other CPU with more than 1 task. But
this would complicate the code, and I don't think the situation can
arise, because migrate_util is used to pull some utilization from an
overloaded group, which must have a CPU with a waiting task to be
overloaded.
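
Just to illustrate the kind of complexity I mean, it would be something
like this (purely hypothetical sketch, not something I'm proposing;
fallback/fallback_util don't exist in the current code):

		case migrate_util:
			util = cpu_util(cpu_of(rq));

			if (nr_running <= 1) {
				/* hypothetical: remember the best 1-task CPU */
				if (fallback_util < util) {
					fallback_util = util;
					fallback = rq;
				}
				continue;
			}

			if (busiest_util < util) {
				busiest_util = util;
				busiest = rq;
			}
			break;

	/* ... and after the loop: if (!busiest) busiest = fallback; */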

> we have things like ASYM_PACKING.
>
> >>
> >> It's not a super big deal, but I think it's nice if we can maintain a
> >> consistent / gradual migration policy.
> >>
> >> >>
> >> >> > might be hard to notice in benchmarks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sched/fair: improve spreading of utilization
  2020-03-13 17:12                   ` Vincent Guittot
@ 2020-03-13 17:34                     ` Valentin Schneider
  0 siblings, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2020-03-13 17:34 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel


On Fri, Mar 13 2020, Vincent Guittot wrote:
>> My point is that if we prevent this for migrate_util, it would make
>> sense to prevent it for migrate_task, but it's not straightforward since
>
> hmm but we don't want to prevent this active balance for migrate_task
> because of cases like the one you mentioned above.
>
> we might consider to finally select a CPU with only 1 running task
> with migrate_util if there is no other CPU with more than 1 task. But
> this would complexify the code and I don't think it's possible because
> migrate_util is used to pull some utilizations from an overloaded
> group which must have a CPU with a waiting task to be overloaded.
>

OK, so what we may want in the future is a tighter link between
find_busiest_queue() and voluntary_active_balance(). I don't see a neat
way of doing this right now; I'll ponder over it.

Thanks for keeping up with my rambling.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip: sched/core] sched/fair: Improve spreading of utilization
  2020-03-12 16:54 [PATCH] sched/fair: improve spreading of utilization Vincent Guittot
  2020-03-13 10:26 ` Peter Zijlstra
  2020-03-13 11:00 ` Valentin Schneider
@ 2020-03-20 12:58 ` tip-bot2 for Vincent Guittot
  2 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2020-03-20 12:58 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Vincent Guittot, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c32b4308295aaaaedd5beae56cb42e205ae63e58
Gitweb:        https://git.kernel.org/tip/c32b4308295aaaaedd5beae56cb42e205ae63e58
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Thu, 12 Mar 2020 17:54:29 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 20 Mar 2020 13:06:20 +01:00

sched/fair: Improve spreading of utilization

During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In such a case, the load balance
looks for the runqueue with the highest utilization. Nevertheless, it
should also ensure that there are some pending tasks to pull; otherwise
the load balance will fail to pull a task and the spreading of the load
will be delayed.

This situation is quite transient, but it's possible to highlight the
effect with a short run of the sysbench test, where the time taken to
spread the tasks impacts the global result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8  --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:                1.337ms         1.244ms
         max:               21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the values,
which range from 1.350ms to more than 41ms for tip/sched/core and from
1.350ms to 21ms with the patch.

Other factors, like waiting for an idle load balance or cache hotness,
can delay the spreading of the tasks, which explains why we can still
see up to 21ms with the patch.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7aaae2..783356f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9313,6 +9313,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		case migrate_util:
 			util = cpu_util(cpu_of(rq));
 
+			/*
+			 * Don't try to pull utilization from a CPU with one
+			 * running task. Whatever its utilization, we will fail to
+			 * detach the task.
+			 */
+			if (nr_running <= 1)
+				continue;
+
 			if (busiest_util < util) {
 				busiest_util = util;
 				busiest = rq;

^ permalink raw reply related	[flat|nested] 14+ messages in thread
