* [PATCH 0/3] sched/fair: Active balancer RT/DL preemption fix
@ 2019-08-07 17:40 Valentin Schneider
  2019-08-07 17:40 ` [PATCH 1/3] sched/fair: Move active balance logic to its own function Valentin Schneider
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Valentin Schneider @ 2019-08-07 17:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot

Vincent's load balance rework [1] got me thinking about how and where we
use rq.nr_running vs rq.cfs.h_nr_running checks, and this led me to
stare intently at the active load balancer.

I haven't seen it happen (yet), but from reading the code it really looks
like we can have some scenarios where the cpu_stopper ends up preempting
a > CFS class task.
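
Roughly, the path I'm worried about (simplified sketch, function names
as in kernel/sched/fair.c):

  load_balance()
      detach_tasks()                  <- moves nothing
      need_active_balance()           <- returns true
      stop_one_cpu_nowait(active_load_balance_cpu_stop)
          -> the stopper preempts whatever is running on the busiest
             CPU, RT and DL tasks included, even though all we can ever
             migrate from there is a CFS task.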

- Patch 1 is a preparatory code move
- Patch 2 is the actual fix
- Patch 3 is a related fix for the cpu_stopper function

This is based on top of today's tip/sched/core:
  a1dc0446d649 ("sched/core: Silence a warning in sched_init()")

@Vincent: I don't think this should conflict too badly with your rework,
but if you have any issues I'll try to give you a version rebased on top
of the rework.

[1]: https://lore.kernel.org/lkml/1564670424-26023-1-git-send-email-vincent.guittot@linaro.org/

Valentin Schneider (3):
  sched/fair: Move active balance logic to its own function
  sched/fair: Prevent active LB from preempting higher sched classes
  sched/fair: Check for CFS tasks in active_load_balance_cpu_stop()

 kernel/sched/fair.c | 130 +++++++++++++++++++++++++++-----------------
 1 file changed, 79 insertions(+), 51 deletions(-)

--
2.22.0



* [PATCH 1/3] sched/fair: Move active balance logic to its own function
  2019-08-07 17:40 [PATCH 0/3] sched/fair: Active balancer RT/DL preemption fix Valentin Schneider
@ 2019-08-07 17:40 ` Valentin Schneider
  2019-08-07 17:40 ` [PATCH 2/3] sched/fair: Prevent active LB from preempting higher sched classes Valentin Schneider
  2019-08-07 17:40 ` [PATCH 3/3] sched/fair: Check for CFS tasks in active_load_balance_cpu_stop() Valentin Schneider
  2 siblings, 0 replies; 6+ messages in thread
From: Valentin Schneider @ 2019-08-07 17:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot

The logic to trigger an active balance is already quite lengthy, and
as I'm about to add yet another unlock condition it's probably better
to give it its own corner.

The only annoying requirement is that we need to branch to
out_one_pinned when the active balance is cancelled due to the running
task's affinity. Something like < 0, == 0 and > 0 return values could
suffice, but I went for an enum for clarity.

No change in functionality intended, but do note I had to move the
nr_balance_failed update into the same if block as the
stop_one_cpu_nowait() call so that it isn't hit on the task-pinned path.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 122 ++++++++++++++++++++++++++------------------
 1 file changed, 73 insertions(+), 49 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 52834cba8ca8..b56b8edee3d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8804,6 +8804,72 @@ static int need_active_balance(struct lb_env *env)
 
 static int active_load_balance_cpu_stop(void *data);
 
+/* Active load balance */
+enum alb_status {
+	cancelled = 0,
+	cancelled_affinity,
+	started
+};
+
+/* Attempt to move a remote rq's running task to env's dst_cpu */
+static inline enum alb_status active_load_balance(struct lb_env *env)
+{
+	enum alb_status status = cancelled;
+	struct sched_domain *sd = env->sd;
+	struct rq *busiest = env->src_rq;
+	unsigned long flags;
+
+	schedstat_inc(sd->lb_failed[env->idle]);
+	/*
+	 * Increment the failure counter only on periodic balance.
+	 * We do not want newidle balance, which can be very frequent, pollute
+	 * the failure counter causing excessive cache_hot migrations and
+	 * active balances.
+	 */
+	if (env->idle != CPU_NEWLY_IDLE)
+		sd->nr_balance_failed++;
+
+	if (!need_active_balance(env))
+		goto out;
+
+	raw_spin_lock_irqsave(&busiest->lock, flags);
+
+	/*
+	 * Don't kick the active_load_balance_cpu_stop, if the curr task on
+	 * busiest CPU can't be moved to dst_cpu:
+	 */
+	if (!cpumask_test_cpu(env->dst_cpu, busiest->curr->cpus_ptr)) {
+		env->flags |= LBF_ALL_PINNED;
+		status = cancelled_affinity;
+		goto unlock;
+	}
+
+	/*
+	 * ->active_balance synchronizes accesses to ->active_balance_work.
+	 * Once set, it's cleared only after active load balance is finished.
+	 */
+	if (!busiest->active_balance) {
+		busiest->active_balance = 1;
+		busiest->push_cpu = env->dst_cpu;
+		status = started;
+	}
+
+unlock:
+	raw_spin_unlock_irqrestore(&busiest->lock, flags);
+
+	if (status == started) {
+		stop_one_cpu_nowait(cpu_of(busiest),
+				    active_load_balance_cpu_stop, busiest,
+				    &busiest->active_balance_work);
+
+		/* We've kicked active balancing, force task migration. */
+		sd->nr_balance_failed = sd->cache_nice_tries+1;
+	}
+
+out:
+	return status;
+}
+
 static int should_we_balance(struct lb_env *env)
 {
 	struct sched_group *sg = env->sd->groups;
@@ -8850,7 +8916,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 			struct sched_domain *sd, enum cpu_idle_type idle,
 			int *continue_balancing)
 {
-	int ld_moved, cur_ld_moved, active_balance = 0;
+	enum alb_status active_balance;
+	int ld_moved, cur_ld_moved;
 	struct sched_domain *sd_parent = sd->parent;
 	struct sched_group *group;
 	struct rq *busiest;
@@ -9009,56 +9076,13 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	}
 
 	if (!ld_moved) {
-		schedstat_inc(sd->lb_failed[idle]);
-		/*
-		 * Increment the failure counter only on periodic balance.
-		 * We do not want newidle balance, which can be very
-		 * frequent, pollute the failure counter causing
-		 * excessive cache_hot migrations and active balances.
-		 */
-		if (idle != CPU_NEWLY_IDLE)
-			sd->nr_balance_failed++;
-
-		if (need_active_balance(&env)) {
-			unsigned long flags;
-
-			raw_spin_lock_irqsave(&busiest->lock, flags);
+		active_balance = active_load_balance(&env);
+		if (active_balance == cancelled_affinity)
+			goto out_one_pinned;
 
-			/*
-			 * Don't kick the active_load_balance_cpu_stop,
-			 * if the curr task on busiest CPU can't be
-			 * moved to this_cpu:
-			 */
-			if (!cpumask_test_cpu(this_cpu, busiest->curr->cpus_ptr)) {
-				raw_spin_unlock_irqrestore(&busiest->lock,
-							    flags);
-				env.flags |= LBF_ALL_PINNED;
-				goto out_one_pinned;
-			}
-
-			/*
-			 * ->active_balance synchronizes accesses to
-			 * ->active_balance_work.  Once set, it's cleared
-			 * only after active load balance is finished.
-			 */
-			if (!busiest->active_balance) {
-				busiest->active_balance = 1;
-				busiest->push_cpu = this_cpu;
-				active_balance = 1;
-			}
-			raw_spin_unlock_irqrestore(&busiest->lock, flags);
-
-			if (active_balance) {
-				stop_one_cpu_nowait(cpu_of(busiest),
-					active_load_balance_cpu_stop, busiest,
-					&busiest->active_balance_work);
-			}
-
-			/* We've kicked active balancing, force task migration. */
-			sd->nr_balance_failed = sd->cache_nice_tries+1;
-		}
-	} else
+	} else {
 		sd->nr_balance_failed = 0;
+	}
 
 	if (likely(!active_balance) || voluntary_active_balance(&env)) {
 		/* We were unbalanced, so reset the balancing interval */
-- 
2.22.0



* [PATCH 2/3] sched/fair: Prevent active LB from preempting higher sched classes
  2019-08-07 17:40 [PATCH 0/3] sched/fair: Active balancer RT/DL preemption fix Valentin Schneider
  2019-08-07 17:40 ` [PATCH 1/3] sched/fair: Move active balance logic to its own function Valentin Schneider
@ 2019-08-07 17:40 ` Valentin Schneider
  2019-08-08  9:24   ` Qais Yousef
  2019-08-07 17:40 ` [PATCH 3/3] sched/fair: Check for CFS tasks in active_load_balance_cpu_stop() Valentin Schneider
  2 siblings, 1 reply; 6+ messages in thread
From: Valentin Schneider @ 2019-08-07 17:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot

The CFS load balancer can cause the cpu_stopper to run a function to
try and steal a rq's currently running task. However, it so happens
that while only CFS tasks will ever be migrated by that function, we
can end up preempting higher sched class tasks, since it is executed
by the cpu_stopper.

I don't expect this to be exceedingly common: we still need to have
found a busiest group in the first place, which requires

  busiest->sum_nr_running != 0

and that is a sum over cfs.h_nr_running, so there should be something
to pull. But if we fail to pull anything and the remote rq is running
an RT/DL task, we can hit this.
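
To sketch the kind of window I have in mind (hypothetical timeline):

  CPU0 (load_balance)               CPU1 (busiest rq)
  ---------------------             ---------------------
  detach_tasks() moves nothing      curr is a CFS task
                                    RT/DL task wakes up, preempts curr
  need_active_balance() -> true
  stop_one_cpu_nowait()             the stopper preempts the RT/DL task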

Add an extra check to not trigger the cpu_stopper if the remote
rq's running task isn't CFS.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b56b8edee3d3..79bd6ead589c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8834,6 +8834,10 @@ static inline enum alb_status active_load_balance(struct lb_env *env)
 
 	raw_spin_lock_irqsave(&busiest->lock, flags);
 
+	/* Make sure we're not about to stop a task from a higher sched class */
+	if (busiest->curr->sched_class != &fair_sched_class)
+		goto unlock;
+
 	/*
 	 * Don't kick the active_load_balance_cpu_stop, if the curr task on
 	 * busiest CPU can't be moved to dst_cpu:
--
2.22.0



* [PATCH 3/3] sched/fair: Check for CFS tasks in active_load_balance_cpu_stop()
  2019-08-07 17:40 [PATCH 0/3] sched/fair: Active balancer RT/DL preemption fix Valentin Schneider
  2019-08-07 17:40 ` [PATCH 1/3] sched/fair: Move active balance logic to its own function Valentin Schneider
  2019-08-07 17:40 ` [PATCH 2/3] sched/fair: Prevent active LB from preempting higher sched classes Valentin Schneider
@ 2019-08-07 17:40 ` Valentin Schneider
  2 siblings, 0 replies; 6+ messages in thread
From: Valentin Schneider @ 2019-08-07 17:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot

We should really only be looking for CFS tasks, since that's all
detach_one_task() can ever pull.

Replace the rq.nr_running check with a rq.cfs.h_nr_running one to do
just that. This also turns the '<= 1' into a '< 1': by the time the
stopper runs, the task we want to migrate has been preempted but is
still enqueued on the CFS runqueue, so a single runnable CFS task is
exactly what we're after.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 79bd6ead589c..099aad1930bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9206,8 +9206,8 @@ static int active_load_balance_cpu_stop(void *data)
 		     !busiest_rq->active_balance))
 		goto out_unlock;
 
-	/* Is there any task to move? */
-	if (busiest_rq->nr_running <= 1)
+	/* Is there any CFS task to move? */
+	if (busiest_rq->cfs.h_nr_running < 1)
 		goto out_unlock;
 
 	/*
--
2.22.0



* Re: [PATCH 2/3] sched/fair: Prevent active LB from preempting higher sched classes
  2019-08-07 17:40 ` [PATCH 2/3] sched/fair: Prevent active LB from preempting higher sched classes Valentin Schneider
@ 2019-08-08  9:24   ` Qais Yousef
  2019-08-08 10:46     ` Valentin Schneider
  0 siblings, 1 reply; 6+ messages in thread
From: Qais Yousef @ 2019-08-08  9:24 UTC (permalink / raw)
  To: Valentin Schneider; +Cc: linux-kernel, mingo, peterz, vincent.guittot

On 08/07/19 18:40, Valentin Schneider wrote:
> The CFS load balancer can cause the cpu_stopper to run a function to
> try and steal a rq's currently running task. However, it so happens
> that while only CFS tasks will ever be migrated by that function, we
> can end up preempting higher sched class tasks, since it is executed
> by the cpu_stopper.
> 
> I don't expect this to be exceedingly common: we still need to have
> found a busiest group in the first place, which requires
> 
>   busiest->sum_nr_running != 0
> 
> and that is a sum over cfs.h_nr_running, so there should be something
> to pull. But if we fail to pull anything and the remote rq is running
> an RT/DL task, we can hit this.
> 
> Add an extra check to not trigger the cpu_stopper if the remote
> rq's running task isn't CFS.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b56b8edee3d3..79bd6ead589c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8834,6 +8834,10 @@ static inline enum alb_status active_load_balance(struct lb_env *env)
>  
>  	raw_spin_lock_irqsave(&busiest->lock, flags);
>  
> +	/* Make sure we're not about to stop a task from a higher sched class */
> +	if (busiest->curr->sched_class != &fair_sched_class)
> +		goto unlock;
> +

This looks correct to me, but I wonder if this check is something that belongs
in CONFIG_PREEMPT_RT land. This will give preference to not disrupting the
RT/DL tasks, which is certainly the desired behavior there, but maybe in a
non-PREEMPT_RT world balancing CFS tasks is more important? Hmmm

--
Qais Yousef

>  	/*
>  	 * Don't kick the active_load_balance_cpu_stop, if the curr task on
>  	 * busiest CPU can't be moved to dst_cpu:
> --
> 2.22.0
> 


* Re: [PATCH 2/3] sched/fair: Prevent active LB from preempting higher sched classes
  2019-08-08  9:24   ` Qais Yousef
@ 2019-08-08 10:46     ` Valentin Schneider
  0 siblings, 0 replies; 6+ messages in thread
From: Valentin Schneider @ 2019-08-08 10:46 UTC (permalink / raw)
  To: Qais Yousef; +Cc: linux-kernel, mingo, peterz, vincent.guittot

On 08/08/2019 10:24, Qais Yousef wrote:
>> @@ -8834,6 +8834,10 @@ static inline enum alb_status active_load_balance(struct lb_env *env)
>>  
>>  	raw_spin_lock_irqsave(&busiest->lock, flags);
>>  
>> +	/* Make sure we're not about to stop a task from a higher sched class */
>> +	if (busiest->curr->sched_class != &fair_sched_class)
>> +		goto unlock;
>> +
> 
> This looks correct to me, but I wonder if this check is something that belongs
> to the CONFIG_PREEMPT_RT land. This will give a preference to not disrupt the
> RT/DL tasks which is certainly the desired behavior there, but maybe in none
> PREEMPT_RT world balancing CFS tasks is more important? Hmmm
> 

My take on this is that if the running task isn't CFS, there is no point in
running the cpu_stopper there (PREEMPT_RT or not). We can still try other
things though.

It could be that the running task had been > CFS all along, so if we
failed to move any load then we just couldn't pull any CFS task and should
bail out of load balance at this point.

If the running task was CFS but got preempted by a > CFS task in the
meantime (e.g. after detach_tasks() failed to pull anything), the best we
could do is run detach_one_task() (locally - no need for any cpu_stopper)
to try and nab the now not-running CFS task. Otherwise we'll have to wait
for another round of load_balance().
Not sure how much we care about this case - I think it's extremely unlikely
that we'd repeatedly want to pull a currently-running CFS task only to have
it preempted by a > CFS task every time we get to active_load_balance().

Let me try and see if I can come up with something sensible with that
detach_one_task() thingy.
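
Roughly what I have in mind - a completely untested sketch on top of
this series, handwaving the status handling:

	/* Make sure we're not about to stop a task from a higher sched class */
	if (busiest->curr->sched_class != &fair_sched_class) {
		struct task_struct *p;

		/* We hold busiest->lock, which detach_one_task() expects */
		p = detach_one_task(env);
		raw_spin_unlock_irqrestore(&busiest->lock, flags);

		if (p)
			/* attach_one_task() grabs the dst rq lock itself */
			attach_one_task(env->dst_rq, p);

		/* This probably wants its own alb_status value */
		goto out;
	}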

> --
> Qais Yousef
> 
>>  	/*
>>  	 * Don't kick the active_load_balance_cpu_stop, if the curr task on
>>  	 * busiest CPU can't be moved to dst_cpu:
>> --
>> 2.22.0
>>

