* [PATCH v3 0/3] sched/fair: some fixes for asym_packing
@ 2018-12-20  7:55 Vincent Guittot
  2018-12-20  7:55 ` [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20  7:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot

During the review of the misfit task patchset, Morten and Valentin raised
some problems with the use of the SD_ASYM_PACKING flag on asymmetric systems
like the hikey960 arm64 big.LITTLE platform. The study of the use cases has
shown problems that can happen on any system that uses the flag.

The 3 patches fix the problems raised for lmbench and for the rt-app UC that
creates 2 tasks which start as small tasks and then suddenly become
always-running tasks. (I can provide the rt-app json if needed.)
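The rt-app json is not reproduced here; as a rough stand-in, the described
pattern (two tasks that behave as small tasks for a while, then suddenly
become always-running) can be approximated with a plain pthread program along
these lines. The phase lengths and iteration counts are illustrative
assumptions, not the exact parameters used:

/*
 * Illustrative only: two threads that run ~1ms every ~10ms for a while,
 * then spin forever. Build with: gcc -O2 -o uc uc.c -lpthread
 */
#include <pthread.h>
#include <time.h>

#define NSEC_PER_MSEC 1000000L

/* Busy-spin for roughly 'ms' milliseconds. */
static void burn_ms(long ms)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000L +
		 (now.tv_nsec - start.tv_nsec) / NSEC_PER_MSEC < ms);
}

static void *task(void *arg)
{
	struct timespec sleep_ts = { .tv_sec = 0, .tv_nsec = 10 * NSEC_PER_MSEC };
	int i;

	(void)arg;

	/* Phase 1: small task, ~10% duty cycle. */
	for (i = 0; i < 1000; i++) {
		burn_ms(1);
		nanosleep(&sleep_ts, NULL);
	}

	/* Phase 2: suddenly become an always-running task. */
	for (;;)
		burn_ms(100);

	return NULL;
}

int main(void)
{
	pthread_t t[2];
	int i;

	for (i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, task, NULL);
	for (i = 0; i < 2; i++)
		pthread_join(&t[i], NULL);
	return 0;
}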

Changes since v2:
- include other active balance reasons
- set imbalance to avg_load as suggested by Valentin

Changes since v1:
- rebase on tip/sched/core
- changes asym_active_balance() as suggested by Peter

Vincent Guittot (3):
  sched/fair: fix rounding issue for asym packing
  sched/fair: trigger asym_packing during idle load balance
  sched/fair: fix unnecessary increase of balance interval

 kernel/sched/fair.c | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

-- 
2.7.4



* [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing
  2018-12-20  7:55 [PATCH v3 0/3] sched/fair: some fixes for asym_packing Vincent Guittot
@ 2018-12-20  7:55 ` Vincent Guittot
  2018-12-20 11:16   ` Valentin Schneider
  2018-12-20 14:54   ` [PATCH v4 " Vincent Guittot
  2018-12-20  7:55 ` [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance Vincent Guittot
  2018-12-20  7:55 ` [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval Vincent Guittot
  2 siblings, 2 replies; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20  7:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot

When check_asym_packing() is triggered, the imbalance is set to:
  busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE
But busiest_stat.avg_load equals:
  sgs->group_load * SCHED_CAPACITY_SCALE / sgs->group_capacity
These divisions can introduce a rounding error that makes the imbalance
slightly lower than the weighted load of the cfs_rq.
But this is enough to skip the rq in find_busiest_queue() and prevent the
asym migration from happening.

Directly set imbalance to sgs->group_load to remove the rounding.
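To make the rounding concrete, here is a throwaway userspace illustration;
the load and capacity values are invented to expose the effect (they are not
taken from a trace), and the macro only mimics the kernel's
DIV_ROUND_CLOSEST():

#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
#define DIV_ROUND_CLOSEST(x, d)	(((x) + (d) / 2) / (d))

int main(void)
{
	unsigned long group_load = 1023;	/* assumed weighted load */
	unsigned long group_capacity = 950;	/* assumed group capacity */

	/* avg_load as derived from group_load and group_capacity */
	unsigned long avg_load = group_load * SCHED_CAPACITY_SCALE / group_capacity;

	/* imbalance as the old check_asym_packing() derived it */
	unsigned long imbalance = DIV_ROUND_CLOSEST(avg_load * group_capacity,
						    SCHED_CAPACITY_SCALE);

	/* prints: group_load=1023 avg_load=1102 imbalance=1022 */
	printf("group_load=%lu avg_load=%lu imbalance=%lu\n",
	       group_load, avg_load, imbalance);
	return 0;
}

The computed imbalance ends up one unit below the group load, which is enough
for find_busiest_queue() to skip the rq.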

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ca46964..9b31247 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8476,9 +8476,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
-	env->imbalance = DIV_ROUND_CLOSEST(
-		sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
-		SCHED_CAPACITY_SCALE);
+	env->imbalance = sds->busiest_stat.avg_load;
 
 	return 1;
 }
-- 
2.7.4



* [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance
  2018-12-20  7:55 [PATCH v3 0/3] sched/fair: some fixes for asym_packing Vincent Guittot
  2018-12-20  7:55 ` [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot
@ 2018-12-20  7:55 ` Vincent Guittot
  2018-12-20 11:19   ` Valentin Schneider
  2018-12-20  7:55 ` [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval Vincent Guittot
  2 siblings, 1 reply; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20  7:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot

A newly idle load balance is not always triggered when a CPU becomes idle.
This prevents the scheduler from getting a chance to migrate tasks for asym
packing. Enable active migration because of asym packing during the idle
load balance too.
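For reference, the idle types involved are reproduced below from memory of
the kernel's enum cpu_idle_type (treat it as a sketch rather than a verbatim
copy):

enum cpu_idle_type {
	CPU_IDLE,		/* periodic load balance run from an idle CPU */
	CPU_NOT_IDLE,		/* periodic load balance run from a busy CPU */
	CPU_NEWLY_IDLE,		/* balance run as a CPU is about to go idle */
	CPU_MAX_IDLE_TYPES
};

/*
 * Old gate: if (env->idle == CPU_NEWLY_IDLE)	-> only the newly-idle path
 * New gate: if (env->idle != CPU_NOT_IDLE)	-> newly-idle and idle paths
 */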

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b31247..487c73e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8853,7 +8853,7 @@ static int need_active_balance(struct lb_env *env)
 {
 	struct sched_domain *sd = env->sd;
 
-	if (env->idle == CPU_NEWLY_IDLE) {
+	if (env->idle != CPU_NOT_IDLE) {
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-- 
2.7.4



* [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-20  7:55 [PATCH v3 0/3] sched/fair: some fixes for asym_packing Vincent Guittot
  2018-12-20  7:55 ` [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot
  2018-12-20  7:55 ` [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance Vincent Guittot
@ 2018-12-20  7:55 ` Vincent Guittot
  2018-12-20 11:22   ` Valentin Schneider
  2 siblings, 1 reply; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20  7:55 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot

In the case of an active balance, we increase the balance interval to cover
pinned tasks cases not covered by the all_pinned logic. Nevertheless, the
active migration triggered by asym packing should be treated as the normal
unbalanced case and reset the interval to the default value; otherwise,
active migration for asym_packing can easily be delayed for hundreds of ms
because of this pinned task detection mechanism.
The same happens for the other conditions tested in need_active_balance(),
like misfit tasks and when the capacity of src_cpu is reduced compared to
dst_cpu (see the comments in need_active_balance() for details).
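The "hundreds of ms" can be sanity-checked with assumed sd_init() defaults
(from memory of ~v4.20 kernel/sched/topology.c: min_interval = weight,
max_interval = 2 * weight, busy_factor = 32) for an 8-CPU domain:

#include <stdio.h>

int main(void)
{
	unsigned int balance_interval = 8;	/* ms, assumed sd->min_interval */
	unsigned int max_interval = 16;		/* ms, assumed sd->max_interval */
	unsigned int busy_factor = 32;		/* assumed sd->busy_factor */

	/* One active balance under the current rule doubles the interval. */
	if (balance_interval < max_interval)
		balance_interval *= 2;

	/*
	 * get_sd_balance_interval() then multiplies by busy_factor for a
	 * busy CPU, so the next periodic attempt is pushed out to ~512 ms.
	 */
	printf("next busy-CPU balance attempt in ~%u ms\n",
	       balance_interval * busy_factor);
	return 0;
}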

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 487c73e..9b1e701 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8849,21 +8849,25 @@ static struct rq *find_busiest_queue(struct lb_env *env,
  */
 #define MAX_PINNED_INTERVAL	512
 
-static int need_active_balance(struct lb_env *env)
+static inline bool
+asym_active_balance(struct lb_env *env)
 {
-	struct sched_domain *sd = env->sd;
+	/*
+	 * ASYM_PACKING needs to force migrate tasks from busy but
+	 * lower priority CPUs in order to pack all tasks in the
+	 * highest priority CPUs.
+	 */
+	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
+	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
+}
 
-	if (env->idle != CPU_NOT_IDLE) {
+static inline bool
+voluntary_active_balance(struct lb_env *env)
+{
+	struct sched_domain *sd = env->sd;
 
-		/*
-		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * lower priority CPUs in order to pack all tasks in the
-		 * highest priority CPUs.
-		 */
-		if ((sd->flags & SD_ASYM_PACKING) &&
-		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
-			return 1;
-	}
+	if (asym_active_balance(env))
+		return 1;
 
 	/*
 	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
@@ -8881,6 +8885,16 @@ static int need_active_balance(struct lb_env *env)
 	if (env->src_grp_type == group_misfit_task)
 		return 1;
 
+	return 0;
+}
+
+static int need_active_balance(struct lb_env *env)
+{
+	struct sched_domain *sd = env->sd;
+
+	if (voluntary_active_balance(env))
+		return 1;
+
 	return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
 }
 
@@ -9142,7 +9156,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	} else
 		sd->nr_balance_failed = 0;
 
-	if (likely(!active_balance)) {
+	if (likely(!active_balance) || voluntary_active_balance(&env)) {
 		/* We were unbalanced, so reset the balancing interval */
 		sd->balance_interval = sd->min_interval;
 	} else {
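For reference, this is roughly how the three functions read once the patch is
applied (reconstructed from the hunks above; the dst_cpu-idle/reduced-capacity
check in the middle of voluntary_active_balance() is hidden by the diff
context and only marked below):

static inline bool
asym_active_balance(struct lb_env *env)
{
	/*
	 * ASYM_PACKING needs to force migrate tasks from busy but
	 * lower priority CPUs in order to pack all tasks in the
	 * highest priority CPUs.
	 */
	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
}

static inline bool
voluntary_active_balance(struct lb_env *env)
{
	struct sched_domain *sd = env->sd;

	if (asym_active_balance(env))
		return 1;

	/*
	 * ... dst_cpu idle / reduced src_cpu capacity check, elided by the
	 * diff context above ...
	 */

	if (env->src_grp_type == group_misfit_task)
		return 1;

	return 0;
}

static int need_active_balance(struct lb_env *env)
{
	struct sched_domain *sd = env->sd;

	if (voluntary_active_balance(env))
		return 1;

	return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
}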
-- 
2.7.4



* Re: [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing
  2018-12-20  7:55 ` [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot
@ 2018-12-20 11:16   ` Valentin Schneider
  2018-12-20 14:22     ` Vincent Guittot
  2018-12-20 14:54   ` [PATCH v4 " Vincent Guittot
  1 sibling, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2018-12-20 11:16 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: Morten.Rasmussen

On 20/12/2018 07:55, Vincent Guittot wrote:
> When check_asym_packing() is triggered, the imbalance is set to:
>   busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE
> But busiest_stat.avg_load equals:
>   sgs->group_load * SCHED_CAPACITY_SCALE / sgs->group_capacity
> These divisions can introduce a rounding error that makes the imbalance
> slightly lower than the weighted load of the cfs_rq.
> But this is enough to skip the rq in find_busiest_queue() and prevent the
> asym migration from happening.
> 
> Directly set imbalance to sgs->group_load to remove the rounding.
                            ^^^^^^^^^^^^^^^
I see where that's coming from, but using 'sgs' here is a tad confusing since
'sds->busiest_stat' is what's actually used.

Maybe just something like 'the busiest's sgs->group_load' would be good
enough to make things explicit.

> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ca46964..9b31247 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8476,9 +8476,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
>  	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
>  		return 0;
>  
> -	env->imbalance = DIV_ROUND_CLOSEST(
> -		sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
> -		SCHED_CAPACITY_SCALE);
> +	env->imbalance = sds->busiest_stat.avg_load;

That should be group_load, not avg_load. With that fixed:

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

>  
>  	return 1;
>  }
> 


* Re: [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance
  2018-12-20  7:55 ` [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance Vincent Guittot
@ 2018-12-20 11:19   ` Valentin Schneider
  2018-12-20 14:33     ` Vincent Guittot
  0 siblings, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2018-12-20 11:19 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: Morten.Rasmussen

On 20/12/2018 07:55, Vincent Guittot wrote:
> A newly idle load balance is not always triggered when a CPU becomes idle.
> This prevents the scheduler from getting a chance to migrate tasks for asym
> packing. Enable active migration because of asym packing during the idle
> load balance too.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b31247..487c73e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8853,7 +8853,7 @@ static int need_active_balance(struct lb_env *env)
>  {
>  	struct sched_domain *sd = env->sd;
>  
> -	if (env->idle == CPU_NEWLY_IDLE) {
> +	if (env->idle != CPU_NOT_IDLE) {
>  
>  		/*
>  		 * ASYM_PACKING needs to force migrate tasks from busy but
> 

Regarding just extending the condition to include idle balance:

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>


As in the previous thread, I'll still argue that if you want to *reliably*
exploit newidle balances to do asym packing active balancing, you should
add some logic to raise rq->rd->overload when we notice some asym packing
could be done, so that it can be leveraged by a higher-priority CPU doing
a newidle balance.

Otherwise the few newidle asym-packing active balances you'll get will be
due to somewhat random luck because we happened to set that overload flag
at some point.


* Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-20  7:55 ` [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval Vincent Guittot
@ 2018-12-20 11:22   ` Valentin Schneider
  2018-12-20 14:50     ` Vincent Guittot
  0 siblings, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2018-12-20 11:22 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel; +Cc: Morten.Rasmussen

On 20/12/2018 07:55, Vincent Guittot wrote:
> In the case of an active balance, we increase the balance interval to cover
> pinned tasks cases not covered by the all_pinned logic. Nevertheless, the
> active migration triggered by asym packing should be treated as the normal
> unbalanced case and reset the interval to the default value; otherwise,
> active migration for asym_packing can easily be delayed for hundreds of ms
> because of this pinned task detection mechanism.
> The same happens for the other conditions tested in need_active_balance(),
> like misfit tasks and when the capacity of src_cpu is reduced compared to
> dst_cpu (see the comments in need_active_balance() for details).
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 40 +++++++++++++++++++++++++++-------------
>  1 file changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 487c73e..9b1e701 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8849,21 +8849,25 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>   */
>  #define MAX_PINNED_INTERVAL	512
>  
> -static int need_active_balance(struct lb_env *env)
> +static inline bool
> +asym_active_balance(struct lb_env *env)
>  {
> -	struct sched_domain *sd = env->sd;
> +	/*
> +	 * ASYM_PACKING needs to force migrate tasks from busy but
> +	 * lower priority CPUs in order to pack all tasks in the
> +	 * highest priority CPUs.
> +	 */
> +	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> +	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +}
>  
> -	if (env->idle != CPU_NOT_IDLE) {
> +static inline bool
> +voluntary_active_balance(struct lb_env *env)
> +{
> +	struct sched_domain *sd = env->sd;
>  
> -		/*
> -		 * ASYM_PACKING needs to force migrate tasks from busy but
> -		 * lower priority CPUs in order to pack all tasks in the
> -		 * highest priority CPUs.
> -		 */
> -		if ((sd->flags & SD_ASYM_PACKING) &&
> -		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
> -			return 1;
> -	}
> +	if (asym_active_balance(env))
> +		return 1;
>  
>  	/*
>  	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> @@ -8881,6 +8885,16 @@ static int need_active_balance(struct lb_env *env)
>  	if (env->src_grp_type == group_misfit_task)
>  		return 1;
>  
> +	return 0;
> +}
> +

Yeah so that's the active balance classification I was afraid of, and I
don't agree with it.

The way I see things, we happen to have some mechanisms that let us know
straight away if we need an active balance (asym packing, misfit, lowered
capacity), and we rely on the sd->nr_balance_failed threshold for the
scenarios where we don't have any more information.

We do happen to have a threshold because we don't want to go overboard with
it, but when it is reached it's a clear sign that we *do* want to active
balance because that's all we can do to try and solve the imbalance.

To me, those are all legitimate reasons to active balance. So they're all
"voluntary" really, we *do* want all of these.

> +static int need_active_balance(struct lb_env *env)
> +{
> +	struct sched_domain *sd = env->sd;
> +
> +	if (voluntary_active_balance(env))
> +		return 1;
> +
>  	return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
>  }
>  
> @@ -9142,7 +9156,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  	} else
>  		sd->nr_balance_failed = 0;
>  
> -	if (likely(!active_balance)) {
> +	if (likely(!active_balance) || voluntary_active_balance(&env)) {

So now we reset the interval for all active balances (except the last active
balance case), even when it is done as a last resort because all other
tasks were pinned.

Arguably the current code isn't much better (always increase the interval
when active balancing), but at least it covers this case. It would be a
waste not to take this into account when we can detect this scenario
(I'll reiterate my LBF_ALL_PINNED suggestion).

>  		/* We were unbalanced, so reset the balancing interval */
>  		sd->balance_interval = sd->min_interval;
>  	} else {
> 


* Re: [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing
  2018-12-20 11:16   ` Valentin Schneider
@ 2018-12-20 14:22     ` Vincent Guittot
  0 siblings, 0 replies; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20 14:22 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On Thu, 20 Dec 2018 at 12:16, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 20/12/2018 07:55, Vincent Guittot wrote:
> > When check_asym_packing() is triggered, the imbalance is set to:
> >   busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE
> > But busiest_stat.avg_load equals:
> >   sgs->group_load * SCHED_CAPACITY_SCALE / sgs->group_capacity
> > These divisions can introduce a rounding error that makes the imbalance
> > slightly lower than the weighted load of the cfs_rq.
> > But this is enough to skip the rq in find_busiest_queue() and prevent the
> > asym migration from happening.
> >
> > Directly set imbalance to sgs->group_load to remove the rounding.
>                             ^^^^^^^^^^^^^^^
> I see where that's coming from, but using 'sgs' here is a tad confusing since
> 'sds->busiest_stat' is what's actually used.
>
> Maybe just something like 'the busiest's sgs->group_load' would be good
> enough to make things explicit.
>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ca46964..9b31247 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8476,9 +8476,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
> >       if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
> >               return 0;
> >
> > -     env->imbalance = DIV_ROUND_CLOSEST(
> > -             sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
> > -             SCHED_CAPACITY_SCALE);
> > +     env->imbalance = sds->busiest_stat.avg_load;
>
> That should be group_load, not avg_load. With that fixed:

ah yes... too quick at changing it

>
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
>
> >
> >       return 1;
> >  }
> >


* Re: [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance
  2018-12-20 11:19   ` Valentin Schneider
@ 2018-12-20 14:33     ` Vincent Guittot
  2018-12-20 16:12       ` Valentin Schneider
  0 siblings, 1 reply; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20 14:33 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On Thu, 20 Dec 2018 at 12:19, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 20/12/2018 07:55, Vincent Guittot wrote:
> > A newly idle load balance is not always triggered when a CPU becomes idle.
> > This prevents the scheduler from getting a chance to migrate tasks for asym
> > packing. Enable active migration because of asym packing during the idle
> > load balance too.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9b31247..487c73e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8853,7 +8853,7 @@ static int need_active_balance(struct lb_env *env)
> >  {
> >       struct sched_domain *sd = env->sd;
> >
> > -     if (env->idle == CPU_NEWLY_IDLE) {
> > +     if (env->idle != CPU_NOT_IDLE) {
> >
> >               /*
> >                * ASYM_PACKING needs to force migrate tasks from busy but
> >
>
> Regarding just extending the condition to include idle balance:
>
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
>
>
> As in the previous thread, I'll still argue that if you want to *reliably*
> exploit newidle balances to do asym packing active balancing, you should
> add some logic to raise rq->rd->overload when we notice some asym packing
> could be done, so that it can be leveraged by a higher-priority CPU doing
> a newidle balance.

The source cpu with the task is never overloaded.
We need to start the load balance to know that it's worth migrating the task.

>
> Otherwise the few newidle asym-packing active balances you'll get will be
> due to somewhat random luck because we happened to set that overload flag
> at some point.


* Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-20 11:22   ` Valentin Schneider
@ 2018-12-20 14:50     ` Vincent Guittot
  2018-12-20 17:24       ` Valentin Schneider
  0 siblings, 1 reply; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20 14:50 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On Thu, 20 Dec 2018 at 12:22, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 20/12/2018 07:55, Vincent Guittot wrote:
> > In the case of an active balance, we increase the balance interval to cover
> > pinned tasks cases not covered by the all_pinned logic. Nevertheless, the
> > active migration triggered by asym packing should be treated as the normal
> > unbalanced case and reset the interval to the default value; otherwise,
> > active migration for asym_packing can easily be delayed for hundreds of ms
> > because of this pinned task detection mechanism.
> > The same happens for the other conditions tested in need_active_balance(),
> > like misfit tasks and when the capacity of src_cpu is reduced compared to
> > dst_cpu (see the comments in need_active_balance() for details).
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 40 +++++++++++++++++++++++++++-------------
> >  1 file changed, 27 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 487c73e..9b1e701 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8849,21 +8849,25 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >   */
> >  #define MAX_PINNED_INTERVAL  512
> >
> > -static int need_active_balance(struct lb_env *env)
> > +static inline bool
> > +asym_active_balance(struct lb_env *env)
> >  {
> > -     struct sched_domain *sd = env->sd;
> > +     /*
> > +      * ASYM_PACKING needs to force migrate tasks from busy but
> > +      * lower priority CPUs in order to pack all tasks in the
> > +      * highest priority CPUs.
> > +      */
> > +     return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> > +            sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +}
> >
> > -     if (env->idle != CPU_NOT_IDLE) {
> > +static inline bool
> > +voluntary_active_balance(struct lb_env *env)
> > +{
> > +     struct sched_domain *sd = env->sd;
> >
> > -             /*
> > -              * ASYM_PACKING needs to force migrate tasks from busy but
> > -              * lower priority CPUs in order to pack all tasks in the
> > -              * highest priority CPUs.
> > -              */
> > -             if ((sd->flags & SD_ASYM_PACKING) &&
> > -                 sched_asym_prefer(env->dst_cpu, env->src_cpu))
> > -                     return 1;
> > -     }
> > +     if (asym_active_balance(env))
> > +             return 1;
> >
> >       /*
> >        * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> > @@ -8881,6 +8885,16 @@ static int need_active_balance(struct lb_env *env)
> >       if (env->src_grp_type == group_misfit_task)
> >               return 1;
> >
> > +     return 0;
> > +}
> > +
>
> Yeah so that's the active balance classification I was afraid of, and I
> don't agree with it.
>
> The way I see things, we happen to have some mechanisms that let us know
> straight away if we need an active balance (asym packing, misfit, lowered
> capacity), and we rely on the sd->nr_balance_failed threshold for the
> scenarios where we don't have any more information.
>
> We do happen to have a threshold because we don't want to go overboard with
> it, but when it is reached it's a clear sign that we *do* want to active
> balance because that's all we can do to try and solve the imbalance.
>
> To me, those are all legitimate reasons to active balance. So they're all
> "voluntary" really, we *do* want all of these.
>
> > +static int need_active_balance(struct lb_env *env)
> > +{
> > +     struct sched_domain *sd = env->sd;
> > +
> > +     if (voluntary_active_balance(env))
> > +             return 1;
> > +
> >       return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
> >  }
> >
> > @@ -9142,7 +9156,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >       } else
> >               sd->nr_balance_failed = 0;
> >
> > -     if (likely(!active_balance)) {
> > +     if (likely(!active_balance) || voluntary_active_balance(&env)) {
>
> >> So now we reset the interval for all active balances (except the last active
> >> balance case), even when it is done as a last resort because all other
> >> tasks were pinned.
>
> Arguably the current code isn't much better (always increase the interval
> when active balancing), but at least it covers this case. It would be a
> waste not to take this into account when we can detect this scenario

So I'd like to recall the $subject of this patchset: fix some known
issues for asym_packing.
While looking at this, we have added a few other voluntary active
balances because it was "obvious" that these active migrations were
voluntary ones. But in fact, we don't have any UC which shows a problem
for those additional UCs so far.

The default behavior for active migration is to increase the interval.
Now you want to extend the exception to other active migration UCs,
whereas it's not clear that we don't fall into the default active
migration UC and that we should not increase the interval.

What you want is changing the behavior of the scheduler for UCs that
haven't raised any problem, whereas asym_packing has a known problem.

Changing the default behavior for active migration is not the subject of
this patchset and should be treated in another one, like the one discussed
with Peter a few days ago.

> (I'll reiterate my LBF_ALL_PINNED suggestion).
>
> >               /* We were unbalanced, so reset the balancing interval */
> >               sd->balance_interval = sd->min_interval;
> >       } else {
> >


* [PATCH v4 1/3] sched/fair: fix rounding issue for asym packing
  2018-12-20  7:55 ` [PATCH v3 1/3] sched/fair: fix rounding issue for asym packing Vincent Guittot
  2018-12-20 11:16   ` Valentin Schneider
@ 2018-12-20 14:54   ` Vincent Guittot
  1 sibling, 0 replies; 16+ messages in thread
From: Vincent Guittot @ 2018-12-20 14:54 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: valentin.schneider, Morten.Rasmussen, Vincent Guittot

When check_asym_packing() is triggered, the imbalance is set to:
  busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE
But busiest_stat.avg_load equals:
  sgs->group_load * SCHED_CAPACITY_SCALE / sgs->group_capacity
These divisions can introduce a rounding error that makes the imbalance
slightly lower than the weighted load of the cfs_rq.
But this is enough to skip the rq in find_busiest_queue() and prevent the
asym migration from happening.

Directly set imbalance to busiest's sgs->group_load to remove the rounding.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ca46964..1e4bed4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8476,9 +8476,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
-	env->imbalance = DIV_ROUND_CLOSEST(
-		sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
-		SCHED_CAPACITY_SCALE);
+	env->imbalance = sds->busiest_stat.group_load;
 
 	return 1;
 }
-- 
2.7.4



* Re: [PATCH 2/3] sched/fair: trigger asym_packing during idle load balance
  2018-12-20 14:33     ` Vincent Guittot
@ 2018-12-20 16:12       ` Valentin Schneider
  0 siblings, 0 replies; 16+ messages in thread
From: Valentin Schneider @ 2018-12-20 16:12 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On 20/12/2018 14:33, Vincent Guittot wrote:
[...]
>> As in the previous thread, I'll still argue that if you want to *reliably*
>> exploit newidle balances to do asym packing active balancing, you should
>> add some logic to raise rq->rd->overload when we notice some asym packing
>> could be done, so that it can be leveraged by a higher-priority CPU doing
>> a newidle balance.
> 
> The source cpu with the task is never overloaded.
> We need to start the load balance to know that it's worth migrating the task.
> 

Yep, that's my point exactly:  if you ever want to active balance a task
on a rq with nr_running == 1 when a higher priority CPU goes newidle, you'd
need an asym-packing version of:

  757ffdd705ee ("sched/fair: Set rq->rd->overload when misfit")

It's somewhat orthogonal to this patch, but the reason I'm bringing this up
is that the commit log says

  "newly idle load balance is not always triggered when a cpu becomes idle"

And not having root_domain->overload raised is a reason for that issue.
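For context, the gate in question sits at the top of idle_balance(); very
roughly, and from memory of ~v4.20 fair.c rather than a verbatim quote:

	/*
	 * Rough sketch of the newidle bail-out in idle_balance() (~v4.20,
	 * from memory; field names and surrounding details may differ).
	 * If rd->overload was never raised, the newidle balance exits here
	 * and the asym_packing check in need_active_balance() is never
	 * reached.
	 */
	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
	    !READ_ONCE(this_rq->rd->overload)) {
		/* ... update next_balance and bail out without balancing ... */
		goto out;
	}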

>>
>> Otherwise the few newidle asym-packing active balances you'll get will be
>> due to somewhat random luck because we happened to set that overload flag
>> at some point.


* Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-20 14:50     ` Vincent Guittot
@ 2018-12-20 17:24       ` Valentin Schneider
  2018-12-21 14:49         ` Vincent Guittot
  0 siblings, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2018-12-20 17:24 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On 20/12/2018 14:50, Vincent Guittot wrote:
[...]
>> So now we reset the interval for all active balances (except the last active
>> balance case), even when it is done as a last resort because all other
>> tasks were pinned.
>>
>> Arguably the current code isn't much better (always increase the interval
>> when active balancing), but at least it covers this case. It would be a
>> waste not to take this into account when we can detect this scenario
> 
> So I'd like to recall the $subject of this patchset: fix some known
> issues for asym_packing.
> While looking at this, we have added a few other voluntary active
> balances because it was "obvious" that these active migrations were
> voluntary ones. But in fact, we don't have any UC which shows a problem
> for those additional UCs so far.
>
> The default behavior for active migration is to increase the interval.
> Now you want to extend the exception to other active migration UCs,
> whereas it's not clear that we don't fall into the default active
> migration UC and that we should not increase the interval.
 
Well if we stick to the rule of only increasing balance_interval when
pinned tasks get in the way, it's clear to me that this use case shouldn't
be segregated from the others.

I do agree however that it's not entirely clear if that balance_interval
increase was also intended to slow down the nr_balance_failed migrations.

I had a look at the history and stumbled on:

	8102679447da ("[PATCH] sched: improve load balancing pinned tasks")

Which explains the reasoning behind the active_balance balance_interval
increase:

	"""
	this one attempts to work out whether the balancing failure has
	been due to too many tasks pinned on the runqueue.  This allows it
	to be basically invisible to the regular blancing paths (ie. when
	there are no pinned tasks).
	"""

So AFAICT that is indeed the rule we should be following, and I would only
increase the balance_interval when there are pinned tasks, not because
of active_balance categories. So here it's a matter of fixing that
condition into what it was meant to be doing.
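As a rough sketch of what that could look like in load_balance() (an
illustration only; the exact form and which LBF_* flag to key on is open):

	/*
	 * Sketch: only let an active balance grow the interval when pinned
	 * tasks actually got in the way of the regular migration; the
	 * LBF_SOME_PINNED/LBF_ALL_PINNED flags already track that.
	 */
	if (likely(!active_balance) || !(env.flags & LBF_SOME_PINNED)) {
		/* We were unbalanced, so reset the balancing interval */
		sd->balance_interval = sd->min_interval;
	} else {
		/* pinned tasks forced the active balance: back off */
		if (sd->balance_interval < sd->max_interval)
			sd->balance_interval *= 2;
	}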

> > What you want is changing the behavior of the scheduler for UCs that
> > haven't raised any problem, whereas asym_packing has a known problem.
> >
> > Changing the default behavior for active migration is not the subject of
> > this patchset and should be treated in another one, like the one discussed
> > with Peter a few days ago.
> >> (I'll reiterate my LBF_ALL_PINNED suggestion).
>>
>>>               /* We were unbalanced, so reset the balancing interval */
>>>               sd->balance_interval = sd->min_interval;
>>>       } else {
>>>


* Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-20 17:24       ` Valentin Schneider
@ 2018-12-21 14:49         ` Vincent Guittot
  2018-12-21 17:15           ` Valentin Schneider
  0 siblings, 1 reply; 16+ messages in thread
From: Vincent Guittot @ 2018-12-21 14:49 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On Thu, 20 Dec 2018 at 18:24, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 20/12/2018 14:50, Vincent Guittot wrote:
> [...]
> >> So now we reset the interval for all active balances (except the last active
> >> balance case), even when it is done as a last resort because all other
> >> tasks were pinned.
> >>
> >> Arguably the current code isn't much better (always increase the interval
> >> when active balancing), but at least it covers this case. It would be a
> >> waste not to take this into account when we can detect this scenario
> >
> > So I'd like to recall the $subject of this patchset: fix some known
> > issues for asym_packing.
> > While looking at this, we have added a few other voluntary active
> > balances because it was "obvious" that these active migrations were
> > voluntary ones. But in fact, we don't have any UC which shows a problem
> > for those additional UCs so far.
> >
> > The default behavior for active migration is to increase the interval.
> > Now you want to extend the exception to other active migration UCs,
> > whereas it's not clear that we don't fall into the default active
> > migration UC and that we should not increase the interval.
>
> Well if we stick to the rule of only increasing balance_interval when
> pinned tasks get in the way, it's clear to me that this use case shouldn't
> be segregated from the others.
>
> I do agree however that it's not entirely clear if that balance_interval
> increase was also intended to slow down the nr_balance_failed migrations.
>
> I had a look at the history and stumbled on:
>
>         8102679447da ("[PATCH] sched: improve load balancing pinned tasks")
>
> Which explains the reasoning behind the active_balance balance_interval
> increase:
>
>         """
>         this one attempts to work out whether the balancing failure has
>         been due to too many tasks pinned on the runqueue.  This allows it
>         to be basically invisible to the regular blancing paths (ie. when
>         there are no pinned tasks).
>         """
>
> So AFAICT that is indeed the rule we should be following, and I would only
> increase the balance_interval when there are pinned tasks, not because
> of active_balance categories. So here it's a matter of fixing that
> condition into what it was meant to be doing.

After looking at sched.c at this sha1, (sd->nr_balance_failed >
sd->cache_nice_tries+2) was the only condition for doing active
migration and, as a result, it was the only reason for doubling
sd->balance_interval.
My patch keeps exactly the same behavior for this condition
(sd->nr_balance_failed > sd->cache_nice_tries+2). And I'm even more
convinced to exclude the (sd->nr_balance_failed > sd->cache_nice_tries+2)
condition because it's the condition that introduced the doubling
of the interval.

As said previously, you can argue that this behavior is not optimal
and discuss its validity, but the sha1 that you mentioned above
introduced the current policy for the (sd->nr_balance_failed >
sd->cache_nice_tries+2) condition.
Reverting such behavior would need more studies, tests and care, which
are out of the scope of this patchset and more related to a whole
refactoring of load_balance() and calculate_imbalance(); FYI, I have
submitted a topic on the subject for the next OSPM.

>
> > What you want is changing the behavior of the scheduler for UCs that
> > haven't raised any problem, whereas asym_packing has a known problem.
> >
> > Changing the default behavior for active migration is not the subject of
> > this patchset and should be treated in another one, like the one discussed
> > with Peter a few days ago.
> > >> (I'll reiterate my LBF_ALL_PINNED suggestion).
> >>
> >>>               /* We were unbalanced, so reset the balancing interval */
> >>>               sd->balance_interval = sd->min_interval;
> >>>       } else {
> >>>


* Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-21 14:49         ` Vincent Guittot
@ 2018-12-21 17:15           ` Valentin Schneider
  2018-12-21 17:41             ` Vincent Guittot
  0 siblings, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2018-12-21 17:15 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On 21/12/2018 14:49, Vincent Guittot wrote:
[...]
> After looking at sched.c at this sha1, (sd->nr_balance_failed >
> sd->cache_nice_tries+2) was the only condition for doing active
> migration and, as a result, it was the only reason for doubling
> sd->balance_interval.
> My patch keeps exactly the same behavior for this condition
> (sd->nr_balance_failed > sd->cache_nice_tries+2). And I'm even more
> convinced to exclude the (sd->nr_balance_failed > sd->cache_nice_tries+2)
> condition because it's the condition that introduced the doubling
> of the interval.
>
> As said previously, you can argue that this behavior is not optimal
> and discuss its validity, but the sha1 that you mentioned above
> introduced the current policy for the (sd->nr_balance_failed >
> sd->cache_nice_tries+2) condition.
> Reverting such behavior would need more studies, tests and care

I agree with you on that, those are valid concerns.

What I'm arguing for is instead of doing this in two steps (reset interval
only for some active balance types, then experiment only increasing it for
"active balance as a last resort"), I'd prefer doing it in one step.

Mostly because I think the intermediate step adds an active balance
categorization that can easily become confusing. Furthermore, that 2005
commit explicitly states it wants to cater to pinned tasks, but we didn't
have those LBF_* flags back then, so if we are to do something about it
we should be improving upon the original intent.

In the end it's not for me to decide, I just happen to find doing it that
way more elegant (from a functionality & code readability PoV).

> which
> are out of the scope of this patchset and more related to a whole
> refactoring of load_balance() and calculate_imbalance(); FYI, I have
> submitted a topic on the subject for the next OSPM.



* Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval
  2018-12-21 17:15           ` Valentin Schneider
@ 2018-12-21 17:41             ` Vincent Guittot
  0 siblings, 0 replies; 16+ messages in thread
From: Vincent Guittot @ 2018-12-21 17:41 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen

On Fri, 21 Dec 2018 at 18:15, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> On 21/12/2018 14:49, Vincent Guittot wrote:
> [...]
> > After looking at sched.c at this sha1, (sd->nr_balance_failed >
> > sd->cache_nice_tries+2) was the only condition for doing active
> > migration and, as a result, it was the only reason for doubling
> > sd->balance_interval.
> > My patch keeps exactly the same behavior for this condition
> > (sd->nr_balance_failed > sd->cache_nice_tries+2). And I'm even more
> > convinced to exclude the (sd->nr_balance_failed > sd->cache_nice_tries+2)
> > condition because it's the condition that introduced the doubling
> > of the interval.
> >
> > As said previously, you can argue that this behavior is not optimal
> > and discuss its validity, but the sha1 that you mentioned above
> > introduced the current policy for the (sd->nr_balance_failed >
> > sd->cache_nice_tries+2) condition.
> > Reverting such behavior would need more studies, tests and care
>
> I agree with you on that, those are valid concerns.
>
> What I'm arguing for is instead of doing this in two steps (reset interval
> only for some active balance types, then experiment only increasing it for
> "active balance as a last resort"), I'd prefer doing it in one step.

Doing it in 2 steps has the advantage of not delaying the current fix and
gives enough time for a complete study of the other step.

>
> Mostly because I think the intermediate step adds an active balance
> categorization that can easily become confusing. Furthermore, that 2005
> commit explicitly states it wants to cater to pinned tasks, but we didn't
> have those LBF_* flags back then, so if we are to do something about it
> we should be improving upon the original intent.
>
> In the end it's not for me to decide, I just happen to find doing it that
> way more elegant (from a functionality & code readability PoV).
>
> > which
> > are out of the scope of this patchset and more related to a whole
> > refactoring of load_balance() and calculate_imbalance(); FYI, I have
> > submitted a topic on the subject for the next OSPM.
>

