* [PATCH 0/3] RT Capacity Awareness Improvements
@ 2020-02-14 16:39 Qais Yousef
  2020-02-14 16:39 ` [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case Qais Yousef
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-14 16:39 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Dietmar Eggemann
  Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel, Qais Yousef

Pavan pointed out an oversight in the first implementation: we don't fall back
to another unfitting CPU if we are already running on an unfitting one.

This also stirred a discussion around handling downmigration from a fitting to
an unfitting CPU.

https://lore.kernel.org/lkml/20200203111451.0d1da58f@oasis.local.home/

Patch 1 adds the missing fallback when a fitting CPU wasn't found, reported
by Pavan.

Patch 2 allows downmigration in the pull case and marks the CPU as overloaded
as suggested by Steve.

Patch 3 fixes the condition in select_task_rq_rt() for the case where the
new_cpu's and the task's priorities are the *same*.

The series is based on Linus/master v5.6-rc1.

I ran the following test cases, the results of which can be found in [1].
Each test was run 3 times to demonstrate repeatability of the result.

The tests were run on Juno-r2, which has 6 CPUs. CPUs [0, 3-5] are Littles and
CPUs [1-2] are Bigs.

By default RT tasks are boosted to max capacity, so no work was done to modify
that, i.e. rt_task->uclamp_min = 1024 for all running tests. (A rough sketch
of such a test task is included below, after the results link.)

	1. 6 RT Tasks
		* 2 Tasks always end up on the big cores
		* The rest end up on the little cores with migration among them
		  happening (it didn't before)
	2. 2 RT Tasks
		* We can see they always end up on the 2 big cores
	3. 4 RT Tasks
		* Results similar to 1
	4. 4 RT Tasks affined to little cores
		* The tasks run on the little cores with some migration as
		  expected

[1] https://gist.github.com/qais-yousef/bb99bdd912628489408a5edae33f85e1
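
For reference, here is a minimal sketch of how one of the periodic RT test
tasks above could be written. This is NOT the actual test code used for [1];
the priority, busy time and period below are made-up values, purely to
illustrate the setup described above:

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NSEC_PER_MSEC	1000000L

/* Spin for roughly @ns nanoseconds to emulate the task's busy time */
static void busy_wait(long ns)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000L +
		 (now.tv_nsec - start.tv_nsec) < ns);
}

int main(void)
{
	struct sched_param param = { .sched_priority = 10 };
	struct timespec sleep_ts = { .tv_sec = 0, .tv_nsec = 8 * NSEC_PER_MSEC };
	int i;

	/* Needs root (or CAP_SYS_NICE) to switch to SCHED_FIFO */
	if (sched_setscheduler(0, SCHED_FIFO, &param)) {
		perror("sched_setscheduler");
		return EXIT_FAILURE;
	}

	/*
	 * RT tasks get uclamp_min = 1024 by default, so nothing to set up
	 * here. Run ~8ms of work then ~8ms of sleep, for roughly 1 second.
	 */
	for (i = 0; i < 60; i++) {
		busy_wait(8 * NSEC_PER_MSEC);
		clock_nanosleep(CLOCK_MONOTONIC, 0, &sleep_ts, NULL);
	}

	return EXIT_SUCCESS;
}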

Qais Yousef (3):
  sched/rt: cpupri_find: implement fallback mechanism for !fit case
  sched/rt: allow pulling unfitting task
  sched/rt: fix pushing unfit tasks to a better CPU

 kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
 kernel/sched/rt.c     |  50 ++++++++++----
 2 files changed, 139 insertions(+), 68 deletions(-)

-- 
2.17.1



* [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-14 16:39 [PATCH 0/3] RT Capacity Awareness Improvements Qais Yousef
@ 2020-02-14 16:39 ` Qais Yousef
  2020-02-17 17:07   ` Valentin Schneider
  2020-02-17 19:09   ` Dietmar Eggemann
  2020-02-14 16:39 ` [PATCH 2/3] sched/rt: allow pulling unfitting task Qais Yousef
  2020-02-14 16:39 ` [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU Qais Yousef
  2 siblings, 2 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-14 16:39 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Dietmar Eggemann
  Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel, Qais Yousef

When searching for the best lowest_mask with a fitness_fn passed, make
sure we record the lowest_level that returns a valid lowest_mask so that
we can use that as a fallback in case we fail to find a fitting CPU at
all levels.

The intention in the original patch was not to allow a down migration to
an unfitting CPU. But this missed the case where we are already running
on an unfitting one.

With this change, RT tasks can still move between unfitting CPUs when
they're already running on such a CPU.

And as Steve suggested, to adhere to the strict priority rules of RT: if
a task is already running on a fitting CPU but, due to priority, can't
run on it, allow it to downmigrate to an unfitting CPU so it can run.

Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
 1 file changed, 101 insertions(+), 56 deletions(-)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index 1a2719e1350a..1bcfa1995550 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -41,6 +41,59 @@ static int convert_prio(int prio)
 	return cpupri;
 }
 
+static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
+				struct cpumask *lowest_mask, int idx)
+{
+	struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
+	int skip = 0;
+
+	if (!atomic_read(&(vec)->count))
+		skip = 1;
+	/*
+	 * When looking at the vector, we need to read the counter,
+	 * do a memory barrier, then read the mask.
+	 *
+	 * Note: This is still all racey, but we can deal with it.
+	 *  Ideally, we only want to look at masks that are set.
+	 *
+	 *  If a mask is not set, then the only thing wrong is that we
+	 *  did a little more work than necessary.
+	 *
+	 *  If we read a zero count but the mask is set, because of the
+	 *  memory barriers, that can only happen when the highest prio
+	 *  task for a run queue has left the run queue, in which case,
+	 *  it will be followed by a pull. If the task we are processing
+	 *  fails to find a proper place to go, that pull request will
+	 *  pull this task if the run queue is running at a lower
+	 *  priority.
+	 */
+	smp_rmb();
+
+	/* Need to do the rmb for every iteration */
+	if (skip)
+		return 0;
+
+	if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
+		return 0;
+
+	if (lowest_mask) {
+		cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
+
+		/*
+		 * We have to ensure that we have at least one bit
+		 * still set in the array, since the map could have
+		 * been concurrently emptied between the first and
+		 * second reads of vec->mask.  If we hit this
+		 * condition, simply act as though we never hit this
+		 * priority level and continue on.
+		 */
+		if (cpumask_empty(lowest_mask))
+			return 0;
+	}
+
+	return 1;
+}
+
 /**
  * cpupri_find - find the best (lowest-pri) CPU in the system
  * @cp: The cpupri context
@@ -62,80 +115,72 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		struct cpumask *lowest_mask,
 		bool (*fitness_fn)(struct task_struct *p, int cpu))
 {
-	int idx = 0;
 	int task_pri = convert_prio(p->prio);
+	int best_unfit_idx = -1;
+	int idx = 0, cpu;
 
 	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
 
 	for (idx = 0; idx < task_pri; idx++) {
-		struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
-		int skip = 0;
 
-		if (!atomic_read(&(vec)->count))
-			skip = 1;
-		/*
-		 * When looking at the vector, we need to read the counter,
-		 * do a memory barrier, then read the mask.
-		 *
-		 * Note: This is still all racey, but we can deal with it.
-		 *  Ideally, we only want to look at masks that are set.
-		 *
-		 *  If a mask is not set, then the only thing wrong is that we
-		 *  did a little more work than necessary.
-		 *
-		 *  If we read a zero count but the mask is set, because of the
-		 *  memory barriers, that can only happen when the highest prio
-		 *  task for a run queue has left the run queue, in which case,
-		 *  it will be followed by a pull. If the task we are processing
-		 *  fails to find a proper place to go, that pull request will
-		 *  pull this task if the run queue is running at a lower
-		 *  priority.
-		 */
-		smp_rmb();
-
-		/* Need to do the rmb for every iteration */
-		if (skip)
-			continue;
-
-		if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
+		if (!__cpupri_find(cp, p, lowest_mask, idx))
 			continue;
 
-		if (lowest_mask) {
-			int cpu;
+		if (!lowest_mask || !fitness_fn)
+			return 1;
 
-			cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
+		/* Ensure the capacity of the CPUs fit the task */
+		for_each_cpu(cpu, lowest_mask) {
+			if (!fitness_fn(p, cpu))
+				cpumask_clear_cpu(cpu, lowest_mask);
+		}
 
+		/*
+		 * If no CPU at the current priority can fit the task
+		 * continue looking
+		 */
+		if (cpumask_empty(lowest_mask)) {
 			/*
-			 * We have to ensure that we have at least one bit
-			 * still set in the array, since the map could have
-			 * been concurrently emptied between the first and
-			 * second reads of vec->mask.  If we hit this
-			 * condition, simply act as though we never hit this
-			 * priority level and continue on.
+			 * Store our fallback priority in case we
+			 * didn't find a fitting CPU
 			 */
-			if (cpumask_empty(lowest_mask))
-				continue;
+			if (best_unfit_idx == -1)
+				best_unfit_idx = idx;
 
-			if (!fitness_fn)
-				return 1;
-
-			/* Ensure the capacity of the CPUs fit the task */
-			for_each_cpu(cpu, lowest_mask) {
-				if (!fitness_fn(p, cpu))
-					cpumask_clear_cpu(cpu, lowest_mask);
-			}
-
-			/*
-			 * If no CPU at the current priority can fit the task
-			 * continue looking
-			 */
-			if (cpumask_empty(lowest_mask))
-				continue;
+			continue;
 		}
 
 		return 1;
 	}
 
+	/*
+	 * If we failed to find a fitting lowest_mask, make sure we fall back
+	 * to the last known unfitting lowest_mask.
+	 *
+	 * Note that the map of the recorded idx might have changed since then,
+	 * so we must ensure to do the full dance to make sure that level still
+	 * holds a valid lowest_mask.
+	 *
+	 * As per above, the map could have been concurrently emptied while we
+	 * were busy searching for a fitting lowest_mask at the other priority
+	 * levels.
+	 *
+	 * This rule favours honouring priority over fitting the task in the
+	 * correct CPU (Capacity Awareness being the only user now).
+	 * The idea is that if a higher priority task can run, then it should
+	 * run even if this ends up being on unfitting CPU.
+	 *
+	 * The cost of this trade-off is not entirely clear and will probably
+	 * be good for some workloads and bad for others.
+	 *
+	 * The main idea here is that if some CPUs were overcommitted, we try
+	 * to spread which is what the scheduler traditionally did. Sys admins
+	 * must do proper RT planning to avoid overloading the system if they
+	 * really care.
+	 */
+	if (best_unfit_idx != -1)
+		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);
+
 	return 0;
 }
 
-- 
2.17.1



* [PATCH 2/3] sched/rt: allow pulling unfitting task
  2020-02-14 16:39 [PATCH 0/3] RT Capacity Awareness Improvements Qais Yousef
  2020-02-14 16:39 ` [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case Qais Yousef
@ 2020-02-14 16:39 ` Qais Yousef
  2020-02-17  9:10   ` Pavan Kondeti
  2020-02-14 16:39 ` [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU Qais Yousef
  2 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-14 16:39 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Dietmar Eggemann
  Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel, Qais Yousef

When RT Capacity Awareness was implemented, the logic was done such that
if a task was running on a fitting CPU, then it was sticky and we would
try our best to keep it there.

But as Steve suggested, to adhere to the strict priority rules of the RT
class, allow pulling an RT task to an unfitting CPU to ensure it gets a
chance to run ASAP. When doing so, mark the queue as overloaded to give
the system a chance to push the task to a better fitting CPU when a
chance arises.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/sched/rt.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4043abe45459..0c8bac134d3a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 
 static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
 {
-	if (!task_running(rq, p) &&
-	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
-	    rt_task_fits_capacity(p, cpu))
+	if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
+
+		/*
+		 * If the CPU doesn't fit the task, allow pulling but mark the
+		 * rq as overloaded so that we can push it again to a more
+		 * suitable CPU ASAP.
+		 */
+		if (!rt_task_fits_capacity(p, cpu)) {
+			rt_set_overload(rq);
+			rq->rt.overloaded = 1;
+		}
+
 		return 1;
+	}
 
 	return 0;
 }
-- 
2.17.1



* [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-14 16:39 [PATCH 0/3] RT Capacity Awareness Improvements Qais Yousef
  2020-02-14 16:39 ` [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case Qais Yousef
  2020-02-14 16:39 ` [PATCH 2/3] sched/rt: allow pulling unfitting task Qais Yousef
@ 2020-02-14 16:39 ` Qais Yousef
  2020-02-17  9:23   ` Pavan Kondeti
  2 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-14 16:39 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Dietmar Eggemann
  Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel, Qais Yousef

If a task was running on an unfit CPU, we could end up not migrating it if the
priority level of the new fitting CPU is the *same* as the unfit one's.

Add an extra check to select_task_rq_rt() to allow the push in case:

	* old_cpu.highest_priority == new_cpu.highest_priority
	* task_fits(p, new_cpu)

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---

I was sometimes seeing delays in migrating a task to a big CPU although it was
free, and I think this fixes it.

TBH, I fail to see how the check of

	p->prio < cpu_rq(target)->rt.highest_prio.curr

is necessary as find_lowest_rq() surely implies the above condition by
definition?

If we're fighting a race condition here, where the rt_rq priority has changed
between the time we selected the lowest_rq and the decision to migrate, then
this makes sense.


 kernel/sched/rt.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0c8bac134d3a..5ea235f2cfe8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 {
 	struct task_struct *curr;
 	struct rq *rq;
-	bool test;
+	bool test, fit;
 
 	/* For anything but wake ups, just return the task_cpu */
 	if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
@@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 	       unlikely(rt_task(curr)) &&
 	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
 
-	if (test || !rt_task_fits_capacity(p, cpu)) {
+	fit = rt_task_fits_capacity(p, cpu);
+
+	if (test || !fit) {
 		int target = find_lowest_rq(p);
 
-		/*
-		 * Don't bother moving it if the destination CPU is
-		 * not running a lower priority task.
-		 */
-		if (target != -1 &&
-		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
-			cpu = target;
+		if (target != -1) {
+			/*
+			 * Don't bother moving it if the destination CPU is
+			 * not running a lower priority task.
+			 */
+			if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
+
+				cpu = target;
+
+			} else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
+
+				/*
+				 * If the priority is the same and the new CPU
+				 * is a better fit, then move, otherwise don't
+				 * bother here either.
+				 */
+				fit = rt_task_fits_capacity(p, target);
+				if (fit)
+					cpu = target;
+			}
+		}
 	}
 	rcu_read_unlock();
 
-- 
2.17.1



* Re: [PATCH 2/3] sched/rt: allow pulling unfitting task
  2020-02-14 16:39 ` [PATCH 2/3] sched/rt: allow pulling unfitting task Qais Yousef
@ 2020-02-17  9:10   ` Pavan Kondeti
  2020-02-17 11:20     ` Qais Yousef
  2020-02-19 13:43     ` Qais Yousef
  0 siblings, 2 replies; 29+ messages in thread
From: Pavan Kondeti @ 2020-02-17  9:10 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

Hi Qais,

On Fri, Feb 14, 2020 at 04:39:48PM +0000, Qais Yousef wrote:
> When RT Capacity Awareness was implemented, the logic was done such that
> if a task was running on a fitting CPU, then it was sticky and we would
> try our best to keep it there.
> 
> But as Steve suggested, to adhere to the strict priority rules of the RT
> class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> chance to run ASAP. When doing so, mark the queue as overloaded to give
> the system a chance to push the task to a better fitting CPU when a
> chance arises.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  kernel/sched/rt.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4043abe45459..0c8bac134d3a 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
>  
>  static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
>  {
> -	if (!task_running(rq, p) &&
> -	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
> -	    rt_task_fits_capacity(p, cpu))
> +	if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +
> +		/*
> +		 * If the CPU doesn't fit the task, allow pulling but mark the
> +		 * rq as overloaded so that we can push it again to a more
> +		 * suitable CPU ASAP.
> +		 */
> +		if (!rt_task_fits_capacity(p, cpu)) {
> +			rt_set_overload(rq);
> +			rq->rt.overloaded = 1;
> +		}
> +

Here rq is the source rq from which the task is being pulled. I can't
understand how marking the overload condition on the source rq helps, because
the overload condition gets cleared in the task dequeue path, i.e.
dec_rt_tasks() -> dec_rt_migration() -> update_rt_migration().
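
For reference, update_rt_migration() looks roughly like this (quoted from
memory, so treat it as a sketch of the logic rather than the exact v5.6 code):

static void update_rt_migration(struct rt_rq *rt_rq)
{
	/* Overloaded means: more than one task, at least one of them migratory */
	if (rt_rq->rt_nr_migratory && rt_rq->rt_nr_total > 1) {
		if (!rt_rq->overloaded) {
			rt_set_overload(rq_of_rt_rq(rt_rq));
			rt_rq->overloaded = 1;
		}
	} else if (rt_rq->overloaded) {
		rt_clear_overload(rq_of_rt_rq(rt_rq));
		rt_rq->overloaded = 0;
	}
}

i.e. the overload state is recomputed from the migratory/total counts on every
enqueue/dequeue, so a manual rt_set_overload() from pick_rt_task() won't stick.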

Also, the overload condition with nr_running=1 may not work as expected unless
we track this overload condition (due to unfit) separately, because a task
can be pushed only when it is NOT running. So a task running on a silver
(little) CPU will continue to run there until it wakes up next time or another
high prio task gets queued there (due to affinity).

BTW, are you testing this path by disabling the RT_PUSH_IPI feature? I ask
because this feature gets turned on by default on our b.L platforms and
RT task migrations happen via the busy CPU pushing the tasks. Or are there
any cases where we can run into pick_rt_task() even when RT_PUSH_IPI is
enabled?

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-14 16:39 ` [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU Qais Yousef
@ 2020-02-17  9:23   ` Pavan Kondeti
  2020-02-17 13:53     ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: Pavan Kondeti @ 2020-02-17  9:23 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

Hi Qais,

On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:

[...]

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 0c8bac134d3a..5ea235f2cfe8 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
>  {
>  	struct task_struct *curr;
>  	struct rq *rq;
> -	bool test;
> +	bool test, fit;
>  
>  	/* For anything but wake ups, just return the task_cpu */
>  	if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
>  	       unlikely(rt_task(curr)) &&
>  	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
>  
> -	if (test || !rt_task_fits_capacity(p, cpu)) {
> +	fit = rt_task_fits_capacity(p, cpu);
> +
> +	if (test || !fit) {
>  		int target = find_lowest_rq(p);
>  
> -		/*
> -		 * Don't bother moving it if the destination CPU is
> -		 * not running a lower priority task.
> -		 */
> -		if (target != -1 &&
> -		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
> -			cpu = target;
> +		if (target != -1) {
> +			/*
> +			 * Don't bother moving it if the destination CPU is
> +			 * not running a lower priority task.
> +			 */
> +			if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> +
> +				cpu = target;
> +
> +			} else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> +
> +				/*
> +				 * If the priority is the same and the new CPU
> +				 * is a better fit, then move, otherwise don't
> +				 * bother here either.
> +				 */
> +				fit = rt_task_fits_capacity(p, target);
> +				if (fit)
> +					cpu = target;
> +			}
> +		}

I understand that we are opting for the migration when priorities are tied but
the task can fit on the new CPU. But there is no guarantee that this task
stays there, because any CPU that drops RT prio can pull the task. Then why
not leave it to the balancer?

I notice a case where tasks would migrate for no reason (happens without this
patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
task can go to *any* little CPU. There is no bias towards its previous CPU.
I don't know if it makes any difference but I see RT task placement is too
keen on reducing the migrations unless it is absolutely needed.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH 2/3] sched/rt: allow pulling unfitting task
  2020-02-17  9:10   ` Pavan Kondeti
@ 2020-02-17 11:20     ` Qais Yousef
  2020-02-19 13:43     ` Qais Yousef
  1 sibling, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-17 11:20 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/17/20 14:40, Pavan Kondeti wrote:
> Hi Qais,
> 
> On Fri, Feb 14, 2020 at 04:39:48PM +0000, Qais Yousef wrote:
> > When RT Capacity Awareness was implemented, the logic was done such that
> > if a task was running on a fitting CPU, then it was sticky and we would
> > try our best to keep it there.
> > 
> > But as Steve suggested, to adhere to the strict priority rules of the RT
> > class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> > chance to run ASAP. When doing so, mark the queue as overloaded to give
> > the system a chance to push the task to a better fitting CPU when a
> > chance arises.
> > 
> > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >  kernel/sched/rt.c | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 4043abe45459..0c8bac134d3a 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> >  
> >  static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> >  {
> > -	if (!task_running(rq, p) &&
> > -	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > -	    rt_task_fits_capacity(p, cpu))
> > +	if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> > +
> > +		/*
> > +		 * If the CPU doesn't fit the task, allow pulling but mark the
> > +		 * rq as overloaded so that we can push it again to a more
> > +		 * suitable CPU ASAP.
> > +		 */
> > +		if (!rt_task_fits_capacity(p, cpu)) {
> > +			rt_set_overload(rq);
> > +			rq->rt.overloaded = 1;
> > +		}
> > +
> 
> Here rq is the source rq from which the task is being pulled. I can't
> understand how marking the overload condition on the source rq helps, because
> the overload condition gets cleared in the task dequeue path, i.e.
> dec_rt_tasks() -> dec_rt_migration() -> update_rt_migration().

Err yes indeed I rushed this patch. Let me try again.

Thanks

--
Qais Yousef


* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-17  9:23   ` Pavan Kondeti
@ 2020-02-17 13:53     ` Qais Yousef
  2020-02-18  4:16       ` Pavan Kondeti
  2020-02-19 14:02       ` Qais Yousef
  0 siblings, 2 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-17 13:53 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/17/20 14:53, Pavan Kondeti wrote:
> Hi Qais,
> 
> On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:
> 
> [...]
> 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 0c8bac134d3a..5ea235f2cfe8 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> >  {
> >  	struct task_struct *curr;
> >  	struct rq *rq;
> > -	bool test;
> > +	bool test, fit;
> >  
> >  	/* For anything but wake ups, just return the task_cpu */
> >  	if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> > @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> >  	       unlikely(rt_task(curr)) &&
> >  	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> >  
> > -	if (test || !rt_task_fits_capacity(p, cpu)) {
> > +	fit = rt_task_fits_capacity(p, cpu);
> > +
> > +	if (test || !fit) {
> >  		int target = find_lowest_rq(p);
> >  
> > -		/*
> > -		 * Don't bother moving it if the destination CPU is
> > -		 * not running a lower priority task.
> > -		 */
> > -		if (target != -1 &&
> > -		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
> > -			cpu = target;
> > +		if (target != -1) {
> > +			/*
> > +			 * Don't bother moving it if the destination CPU is
> > +			 * not running a lower priority task.
> > +			 */
> > +			if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> > +
> > +				cpu = target;
> > +
> > +			} else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> > +
> > +				/*
> > +				 * If the priority is the same and the new CPU
> > +				 * is a better fit, then move, otherwise don't
> > +				 * bother here either.
> > +				 */
> > +				fit = rt_task_fits_capacity(p, target);
> > +				if (fit)
> > +					cpu = target;
> > +			}
> > +		}
> 
> I understand that we are opting for the migration when priorities are tied but
> the task can fit on the new CPU. But there is no guarantee that this task
> stays there, because any CPU that drops RT prio can pull the task. Then why
> not leave it to the balancer?

This patch does help in the 2 RT task test case. Without it I can see a big
delay for the task to migrate from a little CPU to a big one, although the big
is free.

Maybe my test is too short (1 second). The delay I've seen is 0.5-0.7s..

https://imgur.com/a/qKJk4w4

Maybe I missed the real root cause. Let me dig more.

> 
> I notice a case where tasks would migrate for no reason (happens without this
> patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> task can go to *any* little CPU. There is no bias towards its previous CPU.
> I don't know if it makes any difference but I see RT task placement is too
> keen on reducing the migrations unless it is absolutely needed.

In find_lowest_rq() there's a check whether task_cpu(p) is in the lowest_mask,
and we prefer it if it is.
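
That check is roughly this part of find_lowest_rq() (quoted from memory, so
only a sketch):

	/*
	 * We prioritize the last CPU that the task executed on since
	 * it is most likely cache-hot in that location.
	 * (cpu here is task_cpu(task) from earlier in the function)
	 */
	if (cpumask_test_cpu(cpu, lowest_mask))
		return cpu;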

But yeah I see it happening too

https://imgur.com/a/FYqLIko

Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
show that.

I don't think I changed anything that should affect this bias. Do you think
it's something I introduced?

It's something maybe worth digging into though. I'll try to have a look.

Thanks

--
Qais Yousef


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-14 16:39 ` [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case Qais Yousef
@ 2020-02-17 17:07   ` Valentin Schneider
  2020-02-17 23:34     ` Qais Yousef
  2020-02-17 19:09   ` Dietmar Eggemann
  1 sibling, 1 reply; 29+ messages in thread
From: Valentin Schneider @ 2020-02-17 17:07 UTC (permalink / raw)
  To: Qais Yousef, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Pavan Kondeti, Dietmar Eggemann
  Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman, linux-kernel

On 2/14/20 4:39 PM, Qais Yousef wrote:
> When searching for the best lowest_mask with a fitness_fn passed, make
> sure we record the lowest_level that returns a valid lowest_mask so that
> we can use that as a fallback in case we fail to find a fitting CPU at
> all levels.
> 
> The intention in the original patch was not to allow a down migration to
> an unfitting CPU. But this missed the case where we are already running
> on an unfitting one.
> 
> With this change, RT tasks can still move between unfitting CPUs when
> they're already running on such a CPU.
> 
> And as Steve suggested, to adhere to the strict priority rules of RT: if
> a task is already running on a fitting CPU but, due to priority, can't
> run on it, allow it to downmigrate to an unfitting CPU so it can run.
> 
> Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
>  1 file changed, 101 insertions(+), 56 deletions(-)
> 
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index 1a2719e1350a..1bcfa1995550 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -41,6 +41,59 @@ static int convert_prio(int prio)
>  	return cpupri;
>  }
>  
> +static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
> +				struct cpumask *lowest_mask, int idx)
> +{
> +	struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
> +	int skip = 0;
> +
> +	if (!atomic_read(&(vec)->count))
> +		skip = 1;
> +	/*
> +	 * When looking at the vector, we need to read the counter,
> +	 * do a memory barrier, then read the mask.
> +	 *
> +	 * Note: This is still all racey, but we can deal with it.
> +	 *  Ideally, we only want to look at masks that are set.
> +	 *
> +	 *  If a mask is not set, then the only thing wrong is that we
> +	 *  did a little more work than necessary.
> +	 *
> +	 *  If we read a zero count but the mask is set, because of the
> +	 *  memory barriers, that can only happen when the highest prio
> +	 *  task for a run queue has left the run queue, in which case,
> +	 *  it will be followed by a pull. If the task we are processing
> +	 *  fails to find a proper place to go, that pull request will
> +	 *  pull this task if the run queue is running at a lower
> +	 *  priority.
> +	 */
> +	smp_rmb();
> +
> +	/* Need to do the rmb for every iteration */
> +	if (skip)
> +		return 0;
> +
> +	if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> +		return 0;
> +
> +	if (lowest_mask) {
> +		cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> +
> +		/*
> +		 * We have to ensure that we have at least one bit
> +		 * still set in the array, since the map could have
> +		 * been concurrently emptied between the first and
> +		 * second reads of vec->mask.  If we hit this
> +		 * condition, simply act as though we never hit this
> +		 * priority level and continue on.
> +		 */
> +		if (cpumask_empty(lowest_mask))
> +			return 0;
> +	}
> +
> +	return 1;
> +}
> +

Just a drive-by comment; could you split that code move into its own patch?
It'd make the history a bit easier to read IMO.


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-14 16:39 ` [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case Qais Yousef
  2020-02-17 17:07   ` Valentin Schneider
@ 2020-02-17 19:09   ` Dietmar Eggemann
  2020-02-17 23:45     ` Qais Yousef
  1 sibling, 1 reply; 29+ messages in thread
From: Dietmar Eggemann @ 2020-02-17 19:09 UTC (permalink / raw)
  To: Qais Yousef, Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti
  Cc: Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman, linux-kernel

On 14/02/2020 17:39, Qais Yousef wrote:

[...]

>  /**
>   * cpupri_find - find the best (lowest-pri) CPU in the system
>   * @cp: The cpupri context
> @@ -62,80 +115,72 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
>  		struct cpumask *lowest_mask,
>  		bool (*fitness_fn)(struct task_struct *p, int cpu))
>  {
> -	int idx = 0;
>  	int task_pri = convert_prio(p->prio);
> +	int best_unfit_idx = -1;
> +	int idx = 0, cpu;
>  
>  	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
>  
>  	for (idx = 0; idx < task_pri; idx++) {
> -		struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
> -		int skip = 0;
>  
> -		if (!atomic_read(&(vec)->count))
> -			skip = 1;
> -		/*
> -		 * When looking at the vector, we need to read the counter,
> -		 * do a memory barrier, then read the mask.
> -		 *
> -		 * Note: This is still all racey, but we can deal with it.
> -		 *  Ideally, we only want to look at masks that are set.
> -		 *
> -		 *  If a mask is not set, then the only thing wrong is that we
> -		 *  did a little more work than necessary.
> -		 *
> -		 *  If we read a zero count but the mask is set, because of the
> -		 *  memory barriers, that can only happen when the highest prio
> -		 *  task for a run queue has left the run queue, in which case,
> -		 *  it will be followed by a pull. If the task we are processing
> -		 *  fails to find a proper place to go, that pull request will
> -		 *  pull this task if the run queue is running at a lower
> -		 *  priority.
> -		 */
> -		smp_rmb();
> -
> -		/* Need to do the rmb for every iteration */
> -		if (skip)
> -			continue;
> -
> -		if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> +		if (!__cpupri_find(cp, p, lowest_mask, idx))
>  			continue;
>  
> -		if (lowest_mask) {
> -			int cpu;

Shouldn't we add an extra condition here?

+               if (!static_branch_unlikely(&sched_asym_cpucapacity))
+                       return 1;
+

Otherwise non-heterogeneous systems have to go through this
for_each_cpu(cpu, lowest_mask) further below for no good reason.

> +		if (!lowest_mask || !fitness_fn)
> +			return 1;
>  
> -			cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> +		/* Ensure the capacity of the CPUs fit the task */
> +		for_each_cpu(cpu, lowest_mask) {
> +			if (!fitness_fn(p, cpu))
> +				cpumask_clear_cpu(cpu, lowest_mask);
> +		}

[...]


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-17 17:07   ` Valentin Schneider
@ 2020-02-17 23:34     ` Qais Yousef
  2020-02-18 10:01       ` Valentin Schneider
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-17 23:34 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Dietmar Eggemann, Juri Lelli, Vincent Guittot, Ben Segall,
	Mel Gorman, linux-kernel

On 02/17/20 17:07, Valentin Schneider wrote:
> Just a drive-by comment; could you split that code move into its own patch?
> It'd make the history a bit easier to read IMO.

Hmm I don't see how it would help the history.

In large series with big churn splitting helps to facilitate review, but
I don't think reviewing this patch is hard because of creating the new
function.

And git-blame will have this patch all over the new function, so people who
care to know the reason of the split will land at the right place directly
without any extra level of indirection.

Thanks

--
Qais Yousef


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-17 19:09   ` Dietmar Eggemann
@ 2020-02-17 23:45     ` Qais Yousef
  2020-02-18  9:53       ` Dietmar Eggemann
  2020-02-18 16:46       ` Steven Rostedt
  0 siblings, 2 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-17 23:45 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/17/20 20:09, Dietmar Eggemann wrote:
> On 14/02/2020 17:39, Qais Yousef wrote:
> 
> [...]
> 
> >  /**
> >   * cpupri_find - find the best (lowest-pri) CPU in the system
> >   * @cp: The cpupri context
> > @@ -62,80 +115,72 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
> >  		struct cpumask *lowest_mask,
> >  		bool (*fitness_fn)(struct task_struct *p, int cpu))
> >  {
> > -	int idx = 0;
> >  	int task_pri = convert_prio(p->prio);
> > +	int best_unfit_idx = -1;
> > +	int idx = 0, cpu;
> >  
> >  	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
> >  
> >  	for (idx = 0; idx < task_pri; idx++) {
> > -		struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
> > -		int skip = 0;
> >  
> > -		if (!atomic_read(&(vec)->count))
> > -			skip = 1;
> > -		/*
> > -		 * When looking at the vector, we need to read the counter,
> > -		 * do a memory barrier, then read the mask.
> > -		 *
> > -		 * Note: This is still all racey, but we can deal with it.
> > -		 *  Ideally, we only want to look at masks that are set.
> > -		 *
> > -		 *  If a mask is not set, then the only thing wrong is that we
> > -		 *  did a little more work than necessary.
> > -		 *
> > -		 *  If we read a zero count but the mask is set, because of the
> > -		 *  memory barriers, that can only happen when the highest prio
> > -		 *  task for a run queue has left the run queue, in which case,
> > -		 *  it will be followed by a pull. If the task we are processing
> > -		 *  fails to find a proper place to go, that pull request will
> > -		 *  pull this task if the run queue is running at a lower
> > -		 *  priority.
> > -		 */
> > -		smp_rmb();
> > -
> > -		/* Need to do the rmb for every iteration */
> > -		if (skip)
> > -			continue;
> > -
> > -		if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> > +		if (!__cpupri_find(cp, p, lowest_mask, idx))
> >  			continue;
> >  
> > -		if (lowest_mask) {
> > -			int cpu;
> 
> Shouldn't we add an extra condition here?
> 
> +               if (!static_branch_unlikely(&sched_asym_cpucapacity))
> +                       return 1;
> +
> 
> Otherwise non-heterogeneous systems have to go through this
> for_each_cpu(cpu, lowest_mask) further below for no good reason.

Hmm below is the best solution I can think of at the moment. Works for you?

It's independent of what this patch tries to fix, so I'll add as a separate
patch to the series in the next update.

Thanks

--
Qais Yousef

---

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 5ea235f2cfe8..5f2eaf3affde 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -14,6 +14,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);

 struct rt_bandwidth def_rt_bandwidth;

+typedef bool (*fitness_fn_t)(struct task_struct *p, int cpu);
+
 static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
 {
        struct rt_bandwidth *rt_b =
@@ -1708,6 +1710,7 @@ static int find_lowest_rq(struct task_struct *task)
        struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
        int this_cpu = smp_processor_id();
        int cpu      = task_cpu(task);
+       fitness_fn_t fitness_fn;

        /* Make sure the mask is initialized first */
        if (unlikely(!lowest_mask))
@@ -1716,8 +1719,17 @@ static int find_lowest_rq(struct task_struct *task)
        if (task->nr_cpus_allowed == 1)
                return -1; /* No other targets possible */

+       /*
+        * Help cpupri_find avoid the cost of looking for a fitting CPU when
+        * not really needed.
+        */
+       if (static_branch_unlikely(&sched_asym_cpucapacity))
+               fitness_fn = rt_task_fits_capacity;
+       else
+               fitness_fn = NULL;
+
        if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask,
-                        rt_task_fits_capacity))
+                        fitness_fn))
                return -1; /* No targets found */

        /*


> 
> > +		if (!lowest_mask || !fitness_fn)
> > +			return 1;
> >  
> > -			cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> > +		/* Ensure the capacity of the CPUs fit the task */
> > +		for_each_cpu(cpu, lowest_mask) {
> > +			if (!fitness_fn(p, cpu))
> > +				cpumask_clear_cpu(cpu, lowest_mask);
> > +		}
> 
> [...]


* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-17 13:53     ` Qais Yousef
@ 2020-02-18  4:16       ` Pavan Kondeti
  2020-02-18 17:47         ` Qais Yousef
  2020-02-19 14:02       ` Qais Yousef
  1 sibling, 1 reply; 29+ messages in thread
From: Pavan Kondeti @ 2020-02-18  4:16 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On Mon, Feb 17, 2020 at 01:53:07PM +0000, Qais Yousef wrote:
> On 02/17/20 14:53, Pavan Kondeti wrote:
> > Hi Qais,
> > 
> > On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:
> > 
> > [...]
> > 
> > > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > > index 0c8bac134d3a..5ea235f2cfe8 100644
> > > --- a/kernel/sched/rt.c
> > > +++ b/kernel/sched/rt.c
> > > @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > >  {
> > >  	struct task_struct *curr;
> > >  	struct rq *rq;
> > > -	bool test;
> > > +	bool test, fit;
> > >  
> > >  	/* For anything but wake ups, just return the task_cpu */
> > >  	if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> > > @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > >  	       unlikely(rt_task(curr)) &&
> > >  	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> > >  
> > > -	if (test || !rt_task_fits_capacity(p, cpu)) {
> > > +	fit = rt_task_fits_capacity(p, cpu);
> > > +
> > > +	if (test || !fit) {
> > >  		int target = find_lowest_rq(p);
> > >  
> > > -		/*
> > > -		 * Don't bother moving it if the destination CPU is
> > > -		 * not running a lower priority task.
> > > -		 */
> > > -		if (target != -1 &&
> > > -		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
> > > -			cpu = target;
> > > +		if (target != -1) {
> > > +			/*
> > > +			 * Don't bother moving it if the destination CPU is
> > > +			 * not running a lower priority task.
> > > +			 */
> > > +			if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> > > +
> > > +				cpu = target;
> > > +
> > > +			} else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> > > +
> > > +				/*
> > > +				 * If the priority is the same and the new CPU
> > > +				 * is a better fit, then move, otherwise don't
> > > +				 * bother here either.
> > > +				 */
> > > +				fit = rt_task_fits_capacity(p, target);
> > > +				if (fit)
> > > +					cpu = target;
> > > +			}
> > > +		}
> > 
> > I understand that we are opting for the migration when priorities are tied but
> > the task can fit on the new CPU. But there is no guarantee that this task
> > stays there, because any CPU that drops RT prio can pull the task. Then why
> > not leave it to the balancer?
> 
> This patch does help in the 2 RT task test case. Without it I can see a big
> delay for the task to migrate from a little CPU to a big one, although the big
> is free.
> 
> Maybe my test is too short (1 second). The delay I've seen is 0.5-0.7s..
> 
> https://imgur.com/a/qKJk4w4
> 
> Maybe I missed the real root cause. Let me dig more.
> 
> > 
> > I notice a case where tasks would migrate for no reason (happens without this
> > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > I don't know if it makes any difference but I see RT task placement is too
> > keen on reducing the migrations unless it is absolutely needed.
> 
> In find_lowest_rq() there's a check whether task_cpu(p) is in the lowest_mask,
> and we prefer it if it is.
> 
> But yeah I see it happening too
> 
> https://imgur.com/a/FYqLIko
> 
> Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> show that.
> 
> I don't think I changed anything that should affect this bias. Do you think
> it's something I introduced?
> 
> It's something maybe worth digging into though. I'll try to have a look.
> 

The original RT task placement, i.e. without capacity awareness, places the
task on the previous CPU if the task can preempt the running task. I
interpreted it as: the "higher prio RT" task should get better treatment even
if it results in stopping the lower prio RT execution and migrating it to
another CPU.

Now coming to your patch (merged), we force find_lowest_rq() if the previous
CPU can't fit the task even though this task can run there right away. When
the lowest mask returns an unfit CPU (with your new patch), we have two
choices: either place it on this unfit CPU (which may involve migration) or
place it on the previous CPU to avoid the migration. We are selecting the
first approach.

The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
does not have an RT task. If it is running a lower prio RT task than the
waking task, the lowest_mask may not contain the previous CPU.

I don't know if any workload hurts due to this change in behavior, so I'm not
sure whether we have to restore the original behavior. Something like the
below would do.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4043abe..c80d948 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1475,11 +1475,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 		int target = find_lowest_rq(p);
 
 		/*
-		 * Don't bother moving it if the destination CPU is
-		 * not running a lower priority task.
+		 * Don't bother moving it
+		 *
+		 * - If the destination CPU is not running a lower priority task
+		 * - The task can't fit on the destination CPU and it can run
+		 *   right away on it's previous CPU.
 		 */
-		if (target != -1 &&
-		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
+		if (target != -1 && target != cpu &&
+		    p->prio < cpu_rq(target)->rt.highest_prio.curr &&
+		    (test || rt_task_fits_capacity(p, target)))
 			cpu = target;
 	}
 	rcu_read_unlock();

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-17 23:45     ` Qais Yousef
@ 2020-02-18  9:53       ` Dietmar Eggemann
  2020-02-18 17:28         ` Qais Yousef
  2020-02-18 16:46       ` Steven Rostedt
  1 sibling, 1 reply; 29+ messages in thread
From: Dietmar Eggemann @ 2020-02-18  9:53 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 18/02/2020 00:45, Qais Yousef wrote:
> On 02/17/20 20:09, Dietmar Eggemann wrote:
>> On 14/02/2020 17:39, Qais Yousef wrote:
>>
>> [...]
>>
>>>  /**
>>>   * cpupri_find - find the best (lowest-pri) CPU in the system
>>>   * @cp: The cpupri context
>>> @@ -62,80 +115,72 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
>>>  		struct cpumask *lowest_mask,
>>>  		bool (*fitness_fn)(struct task_struct *p, int cpu))
>>>  {
>>> -	int idx = 0;
>>>  	int task_pri = convert_prio(p->prio);
>>> +	int best_unfit_idx = -1;
>>> +	int idx = 0, cpu;
>>>  
>>>  	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
>>>  
>>>  	for (idx = 0; idx < task_pri; idx++) {
>>> -		struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
>>> -		int skip = 0;
>>>  
>>> -		if (!atomic_read(&(vec)->count))
>>> -			skip = 1;
>>> -		/*
>>> -		 * When looking at the vector, we need to read the counter,
>>> -		 * do a memory barrier, then read the mask.
>>> -		 *
>>> -		 * Note: This is still all racey, but we can deal with it.
>>> -		 *  Ideally, we only want to look at masks that are set.
>>> -		 *
>>> -		 *  If a mask is not set, then the only thing wrong is that we
>>> -		 *  did a little more work than necessary.
>>> -		 *
>>> -		 *  If we read a zero count but the mask is set, because of the
>>> -		 *  memory barriers, that can only happen when the highest prio
>>> -		 *  task for a run queue has left the run queue, in which case,
>>> -		 *  it will be followed by a pull. If the task we are processing
>>> -		 *  fails to find a proper place to go, that pull request will
>>> -		 *  pull this task if the run queue is running at a lower
>>> -		 *  priority.
>>> -		 */
>>> -		smp_rmb();
>>> -
>>> -		/* Need to do the rmb for every iteration */
>>> -		if (skip)
>>> -			continue;
>>> -
>>> -		if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
>>> +		if (!__cpupri_find(cp, p, lowest_mask, idx))
>>>  			continue;
>>>  
>>> -		if (lowest_mask) {
>>> -			int cpu;
>>
>> Shouldn't we add an extra condition here?
>>
>> +               if (!static_branch_unlikely(&sched_asym_cpucapacity))
>> +                       return 1;
>> +
>>
>> Otherwise non-heterogeneous systems have to go through this
>> for_each_cpu(cpu, lowest_mask) further below for no good reason.
> 
> Hmm below is the best solution I can think of at the moment. Works for you?
> 
> It's independent of what this patch tries to fix, so I'll add as a separate
> patch to the series in the next update.

OK.

Since we can't set it as early as init_sched_rt_class()

root@juno:~# dmesg | grep "\*\*\*"
[    0.501697] *** set sched_asym_cpucapacity <-- CPU cap asym by uArch
[    0.505847] *** init_sched_rt_class()
[    1.796706] *** set sched_asym_cpucapacity <-- CPUfreq kicked in

we probably have to do it either by bailing out of cpupri_find() early
with this extra condition (above) or by initializing the func pointer
dynamically (your example).

[...]

> @@ -1708,6 +1710,7 @@ static int find_lowest_rq(struct task_struct *task)
>         struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
>         int this_cpu = smp_processor_id();
>         int cpu      = task_cpu(task);
> +       fitness_fn_t fitness_fn;
> 
>         /* Make sure the mask is initialized first */
>         if (unlikely(!lowest_mask))
> @@ -1716,8 +1719,17 @@ static int find_lowest_rq(struct task_struct *task)
>         if (task->nr_cpus_allowed == 1)
>                 return -1; /* No other targets possible */
> 
> +       /*
> +        * Help cpupri_find avoid the cost of looking for a fitting CPU when
> +        * not really needed.
> +        */

In case the comment is really needed, for me it would work better
logically inverted.

[...]


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-17 23:34     ` Qais Yousef
@ 2020-02-18 10:01       ` Valentin Schneider
  0 siblings, 0 replies; 29+ messages in thread
From: Valentin Schneider @ 2020-02-18 10:01 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Dietmar Eggemann, Juri Lelli, Vincent Guittot, Ben Segall,
	Mel Gorman, linux-kernel



On 2/17/20 11:34 PM, Qais Yousef wrote:
> On 02/17/20 17:07, Valentin Schneider wrote:
>> Just a drive-by comment; could you split that code move into its own patch?
>> It'd make the history a bit easier to read IMO.
> 
> Hmm I don't see how it would help the history.
> 
> In large series with big churn splitting helps to facilitate review, but
> I don't think reviewing this patch is hard because of creating the new
> function.
> 
> And git-blame will have this patch all over the new function, so people who
> care to know the reason of the split will land at the right place directly
> without any extra level of indirection.
> 

As a git-blame addict I yearn for nice clean splits, and this is a code move
plus a new behaviour; having them in the same patch doesn't make for the
prettiest diff there is (also helps for review, yadda yadda). That said, I'm
not going to argue further over it, I'm barely a tenant in this house.

> Thanks
> 
> --
> Qais Yousef
> 


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-17 23:45     ` Qais Yousef
  2020-02-18  9:53       ` Dietmar Eggemann
@ 2020-02-18 16:46       ` Steven Rostedt
  2020-02-18 17:27         ` Qais Yousef
  1 sibling, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2020-02-18 16:46 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On Mon, 17 Feb 2020 23:45:49 +0000
Qais Yousef <qais.yousef@arm.com> wrote:

> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -14,6 +14,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
> 
>  struct rt_bandwidth def_rt_bandwidth;
> 
> +typedef bool (*fitness_fn_t)(struct task_struct *p, int cpu);
> +
>  static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
>  {
>         struct rt_bandwidth *rt_b =
> @@ -1708,6 +1710,7 @@ static int find_lowest_rq(struct task_struct *task)
>         struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
>         int this_cpu = smp_processor_id();
>         int cpu      = task_cpu(task);
> +       fitness_fn_t fitness_fn;
> 
>         /* Make sure the mask is initialized first */
>         if (unlikely(!lowest_mask))
> @@ -1716,8 +1719,17 @@ static int find_lowest_rq(struct task_struct *task)
>         if (task->nr_cpus_allowed == 1)
>                 return -1; /* No other targets possible */
> 
> +       /*
> +        * Help cpupri_find avoid the cost of looking for a fitting CPU when
> +        * not really needed.
> +        */
> +       if (static_branch_unlikely(&sched_asym_cpucapacity))
> +               fitness_fn = rt_task_fits_capacity;
> +       else
> +               fitness_fn = NULL;
> +
>         if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask,
> -                        rt_task_fits_capacity))
> +                        fitness_fn))
>                 return -1; /* No targets found */
> 
>         /*


If we are going to use static branches, then let's just remove the
parameter totally. That is, make two functions (with helpers), where
one needs this fitness function and the other does not.

	if (static_branch_unlikely(&sched_asym_cpu_capacity))
		ret = cpupri_find_fitness(...);
	else
		ret = cpupri_find(...);

	if (!ret)
		return -1;

Something like that?
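
Fleshed out a little, purely for illustration (the names and exact signatures
here are just a sketch, not a settled API):

/* cpupri.c: keep one implementation, let the plain variant pass a NULL fn */
int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p,
			struct cpumask *lowest_mask,
			bool (*fitness_fn)(struct task_struct *p, int cpu));

static inline int cpupri_find(struct cpupri *cp, struct task_struct *p,
			      struct cpumask *lowest_mask)
{
	return cpupri_find_fitness(cp, p, lowest_mask, NULL);
}

/* rt.c: find_lowest_rq() then picks the variant based on the static key */
	if (static_branch_unlikely(&sched_asym_cpucapacity))
		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri, task,
					  lowest_mask, rt_task_fits_capacity);
	else
		ret = cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask);

	if (!ret)
		return -1; /* No targets found */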

-- Steve


* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-18 16:46       ` Steven Rostedt
@ 2020-02-18 17:27         ` Qais Yousef
  2020-02-18 18:03           ` Steven Rostedt
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-18 17:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/18/20 11:46, Steven Rostedt wrote:
> On Mon, 17 Feb 2020 23:45:49 +0000
> Qais Yousef <qais.yousef@arm.com> wrote:
> 
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -14,6 +14,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
> > 
> >  struct rt_bandwidth def_rt_bandwidth;
> > 
> > +typedef bool (*fitness_fn_t)(struct task_struct *p, int cpu);
> > +
> >  static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
> >  {
> >         struct rt_bandwidth *rt_b =
> > @@ -1708,6 +1710,7 @@ static int find_lowest_rq(struct task_struct *task)
> >         struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
> >         int this_cpu = smp_processor_id();
> >         int cpu      = task_cpu(task);
> > +       fitness_fn_t fitness_fn;
> > 
> >         /* Make sure the mask is initialized first */
> >         if (unlikely(!lowest_mask))
> > @@ -1716,8 +1719,17 @@ static int find_lowest_rq(struct task_struct *task)
> >         if (task->nr_cpus_allowed == 1)
> >                 return -1; /* No other targets possible */
> > 
> > +       /*
> > +        * Help cpupri_find avoid the cost of looking for a fitting CPU when
> > +        * not really needed.
> > +        */
> > +       if (static_branch_unlikely(&sched_asym_cpucapacity))
> > +               fitness_fn = rt_task_fits_capacity;
> > +       else
> > +               fitness_fn = NULL;
> > +
> >         if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask,
> > -                        rt_task_fits_capacity))
> > +                        fitness_fn))
> >                 return -1; /* No targets found */
> > 
> >         /*
> 
> 
> If we are going to use static branches, then lets just remove the
> parameter totally. That is, make two functions (with helpers), where
> one needs this fitness function the other does not.
> 
> 	if (static_branch_unlikely(&sched_asym_cpu_capacity))
> 		ret = cpupri_find_fitness(...);
> 	else
> 		ret = cpupri_find(...);
> 
> 	if (!ret)
> 		return -1;
> 
> Something like that?

Is there any implication on code generation here?

I like my flavour better tbh. But I don't mind refactoring the function out if
it does make it more readable.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-18  9:53       ` Dietmar Eggemann
@ 2020-02-18 17:28         ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-18 17:28 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/18/20 10:53, Dietmar Eggemann wrote:
> On 18/02/2020 00:45, Qais Yousef wrote:
> > +       /*
> > +        * Help cpupri_find avoid the cost of looking for a fitting CPU when
> > +        * not really needed.
> > +        */
> 
> In case the comment is really needed, for me it would work better
> logically inverted.

Sure.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-18  4:16       ` Pavan Kondeti
@ 2020-02-18 17:47         ` Qais Yousef
  2020-02-19  2:46           ` Pavan Kondeti
  0 siblings, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-18 17:47 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/18/20 09:46, Pavan Kondeti wrote:
> The original RT task placement i.e without capacity awareness, places the task
> on the previous CPU if the task can preempt the running task. I interpreted it
> as that "higher prio RT" task should get better treatment even if it results
> in stopping the lower prio RT execution and migrating it to another CPU.
> 
> Now coming to your patch (merged), we force find_lowest_rq() if the previous
> CPU can't fit the task though this task can right away run there. When the
> lowest mask returns an unfit CPU (with your new patch), we have two choices,
> either to place it on this unfit CPU (may involve migration) or place it on
> the previous CPU to avoid the migration. We are selecting the first approach.
> 
> The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
> does not have a RT task. If it is running a lower prio RT task than the
> waking task, the lowest_mask may not contain the previous CPU.
> 
> I don't know if any workload hurts due to this change in behavior. So not sure
> if we have to restore the original behavior. Something like below will do.

Is this patch equivalent to yours? If yes, then I got you. If not, then I need
to re-read this again..

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ace9acf9d63c..854a0c9a7be6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1476,6 +1476,13 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
        if (test || !rt_task_fits_capacity(p, cpu)) {
                int target = find_lowest_rq(p);

+               /*
+                * Bail out if we were forcing a migration to find a better
+                * fitting CPU but our search failed.
+                */
+               if (!test && !rt_task_fits_capacity(p, target))
+                       goto out_unlock;
+
                /*
                 * Don't bother moving it if the destination CPU is
                 * not running a lower priority task.
@@ -1484,6 +1491,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
                    p->prio < cpu_rq(target)->rt.highest_prio.curr)
                        cpu = target;
        }
+
+out_unlock:
        rcu_read_unlock();

 out:


> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4043abe..c80d948 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1475,11 +1475,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
>  		int target = find_lowest_rq(p);
>  
>  		/*
> -		 * Don't bother moving it if the destination CPU is
> -		 * not running a lower priority task.
> +		 * Don't bother moving it
> +		 *
> +		 * - If the destination CPU is not running a lower priority task
> +		 * - The task can't fit on the destination CPU and it can run
> +		 *   right away on its previous CPU.
>  		 */
> -		if (target != -1 &&
> -		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
> +		if (target != -1 && target != cpu &&
> +		    p->prio < cpu_rq(target)->rt.highest_prio.curr &&
> +		    (test || rt_task_fits_capacity(p, target)))
>  			cpu = target;
>  	}
>  	rcu_read_unlock();
> 
> Thanks,
> Pavan
> 
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-18 17:27         ` Qais Yousef
@ 2020-02-18 18:03           ` Steven Rostedt
  2020-02-18 18:52             ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: Steven Rostedt @ 2020-02-18 18:03 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On Tue, 18 Feb 2020 17:27:46 +0000
Qais Yousef <qais.yousef@arm.com> wrote:

> > If we are going to use static branches, then lets just remove the
> > parameter totally. That is, make two functions (with helpers), where
> > one needs this fitness function the other does not.
> > 
> > 	if (static_branch_unlikely(&sched_asym_cpu_capacity))
> > 		ret = cpupri_find_fitness(...);
> > 	else
> > 		ret = cpupri_find(...);
> > 
> > 	if (!ret)
> > 		return -1;
> > 
> > Something like that?  
> 
> Is there any implication on code generation here?
> 
> I like my flavour better tbh. But I don't mind refactoring the function out if
> it does make it more readable.

I just figured we remove the passing of the parameter (which does make
an impact on the code generation).

Also, perhaps it would be better to not have to pass functions to the
cpupri_find(). Is there any other function that needs to be passed, or
just this one in this series?

-- Steve

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
  2020-02-18 18:03           ` Steven Rostedt
@ 2020-02-18 18:52             ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-18 18:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Pavan Kondeti,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/18/20 13:03, Steven Rostedt wrote:
> On Tue, 18 Feb 2020 17:27:46 +0000
> Qais Yousef <qais.yousef@arm.com> wrote:
> 
> > > If we are going to use static branches, then lets just remove the
> > > parameter totally. That is, make two functions (with helpers), where
> > > one needs this fitness function the other does not.
> > > 
> > > 	if (static_branch_unlikely(&sched_asym_cpu_capacity))
> > > 		ret = cpupri_find_fitness(...);
> > > 	else
> > > 		ret = cpupri_find(...);
> > > 
> > > 	if (!ret)
> > > 		return -1;
> > > 
> > > Something like that?  
> > 
> > Is there any implication on code generation here?
> > 
> > I like my flavour better tbh. But I don't mind refactoring the function out if
> > it does make it more readable.
> 
> I just figured we remove the passing of the parameter (which does make
> an impact on the code generation).

Ok. I was thinking that protecting the whole function call with the static key
could be better.

> Also, perhaps it would be better to not have to pass functions to the
> cpupri_find(). Is there any other function that needs to be passed, or
> just this one in this series?

I had that discussion in the past with Dietmar [1]

My argument was that this way the code is generic and self-contained and allows for
easy extension for other potential users.

I'm happy to split the function into cpupri_find_fitness() (with a fn ptr) and
cpupri_find() (original) like you suggest above - if you're still okay with
that..

[1] https://lore.kernel.org/lkml/39c08971-5d07-8018-915b-9c6284f89d5d@arm.com/
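
For illustration, one possible shape of that split - a hypothetical, untested
sketch, with find_lowest_rq() choosing between the two under the static key as
you describe:

	/*
	 * Sketch only (assumed names): the current implementation becomes
	 * cpupri_find_fitness() and takes the callback; cpupri_find() keeps
	 * the old signature and simply passes a NULL fitness_fn.
	 */
	static inline int cpupri_find(struct cpupri *cp, struct task_struct *p,
				      struct cpumask *lowest_mask)
	{
		return cpupri_find_fitness(cp, p, lowest_mask, NULL);
	}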

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-18 17:47         ` Qais Yousef
@ 2020-02-19  2:46           ` Pavan Kondeti
  2020-02-19 10:46             ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: Pavan Kondeti @ 2020-02-19  2:46 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On Tue, Feb 18, 2020 at 05:47:19PM +0000, Qais Yousef wrote:
> On 02/18/20 09:46, Pavan Kondeti wrote:
> > The original RT task placement i.e without capacity awareness, places the task
> > on the previous CPU if the task can preempt the running task. I interpreted it
> > as that "higher prio RT" task should get better treatment even if it results
> > in stopping the lower prio RT execution and migrating it to another CPU.
> > 
> > Now coming to your patch (merged), we force find_lowest_rq() if the previous
> > CPU can't fit the task though this task can right away run there. When the
> > lowest mask returns an unfit CPU (with your new patch), we have two choices,
> > either to place it on this unfit CPU (may involve migration) or place it on
> > the previous CPU to avoid the migration. We are selecting the first approach.
> > 
> > The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
> > does not have a RT task. If it is running a lower prio RT task than the
> > waking task, the lowest_mask may not contain the previous CPU.
> > 
> > I don't know if any workload hurts due to this change in behavior. So not sure
> > if we have to restore the original behavior. Something like below will do.
> 
> Is this patch equivalent to yours? If yes, then I got you. If not, then I need
> to re-read this again..
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index ace9acf9d63c..854a0c9a7be6 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1476,6 +1476,13 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
>         if (test || !rt_task_fits_capacity(p, cpu)) {
>                 int target = find_lowest_rq(p);
> 
> +               /*
> +                * Bail out if we were forcing a migration to find a better
> +                * fitting CPU but our search failed.
> +                */
> +               if (!test && !rt_task_fits_capacity(p, target))
> +                       goto out_unlock;
> +

Yes. This is what I was referring to.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-19  2:46           ` Pavan Kondeti
@ 2020-02-19 10:46             ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-19 10:46 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/19/20 08:16, Pavan Kondeti wrote:
> On Tue, Feb 18, 2020 at 05:47:19PM +0000, Qais Yousef wrote:
> > On 02/18/20 09:46, Pavan Kondeti wrote:
> > > The original RT task placement i.e without capacity awareness, places the task
> > > on the previous CPU if the task can preempt the running task. I interpreted it
> > > as that "higher prio RT" task should get better treatment even if it results
> > > in stopping the lower prio RT execution and migrating it to another CPU.
> > > 
> > > Now coming to your patch (merged), we force find_lowest_rq() if the previous
> > > CPU can't fit the task though this task can right away run there. When the
> > > lowest mask returns an unfit CPU (with your new patch), we have two choices,
> > > either to place it on this unfit CPU (may involve migration) or place it on
> > > the previous CPU to avoid the migration. We are selecting the first approach.
> > > 
> > > The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
> > > does not have a RT task. If it is running a lower prio RT task than the
> > > waking task, the lowest_mask may not contain the previous CPU.
> > > 
> > > I don't know if any workload hurts due to this change in behavior. So not sure
> > > if we have to restore the original behavior. Something like below will do.
> > 
> > Is this patch equivalent to yours? If yes, then I got you. If not, then I need
> > to re-read this again..
> > 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index ace9acf9d63c..854a0c9a7be6 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1476,6 +1476,13 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> >         if (test || !rt_task_fits_capacity(p, cpu)) {
> >                 int target = find_lowest_rq(p);
> > 
> > +               /*
> > +                * Bail out if we were forcing a migration to find a better
> > +                * fitting CPU but our search failed.
> > +                */
> > +               if (!test && !rt_task_fits_capacity(p, target))
> > +                       goto out_unlock;
> > +
> 
> Yes. This is what I was referring to.

Cool. I can't see how this could be a problem either, but since, as you say, it'd
preserve the older behavior, I'll add it to the lot with a proper changelog.
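
One detail probably worth double checking when folding it in: find_lowest_rq()
can return -1, so the fitness check likely wants a guard as well. A hedged
sketch of the adjusted condition (assumption, not the final patch):

	if (!test && target != -1 && !rt_task_fits_capacity(p, target))
		goto out_unlock;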

Thanks!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] sched/rt: allow pulling unfitting task
  2020-02-17  9:10   ` Pavan Kondeti
  2020-02-17 11:20     ` Qais Yousef
@ 2020-02-19 13:43     ` Qais Yousef
  2020-02-21  8:07       ` Pavan Kondeti
  1 sibling, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-19 13:43 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/17/20 14:40, Pavan Kondeti wrote:
> Hi Qais,
> 
> On Fri, Feb 14, 2020 at 04:39:48PM +0000, Qais Yousef wrote:
> > When RT Capacity Awareness was implemented, the logic was done such that if
> > a task was running on a fitting CPU, then it was sticky and we would try
> > our best to keep it there.
> > 
> > But as Steve suggested, to adhere to the strict priority rules of RT
> > class; allow pulling an RT task to unfitting CPU to ensure it gets a
> > chance to run ASAP. When doing so, mark the queue as overloaded to give
> > the system a chance to push the task to a better fitting CPU when a
> > chance arises.
> > 
> > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >  kernel/sched/rt.c | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 4043abe45459..0c8bac134d3a 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> >  
> >  static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> >  {
> > -	if (!task_running(rq, p) &&
> > -	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > -	    rt_task_fits_capacity(p, cpu))
> > +	if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> > +
> > +		/*
> > +		 * If the CPU doesn't fit the task, allow pulling but mark the
> > +		 * rq as overloaded so that we can push it again to a more
> > +		 * suitable CPU ASAP.
> > +		 */
> > +		if (!rt_task_fits_capacity(p, cpu)) {
> > +			rt_set_overload(rq);
> > +			rq->rt.overloaded = 1;
> > +		}
> > +
> 
> Here rq is source rq from which the task is being pulled. I can't understand
> how marking overload condition on source_rq help. Because overload condition
> gets cleared in the task dequeue path. i.e dec_rt_tasks->dec_rt_migration->
> update_rt_migration().
> 
> Also, the overload condition with nr_running=1 may not work as expected unless
> we track this overload condition (due to unfit) separately. Because a task
> can be pushed only when it is NOT running. So a task running on silver will
> continue to run there until it wakes up next time or another high prio task
> gets queued here (due to affinity).
> 
> btw, Are you testing this path by disabling RT_PUSH_IPI feature? I ask this
> because, This feature gets turned on by default in our b.L platforms and
> RT task migrations happens by the busy CPU pushing the tasks. Or are there
> any cases where we can run into pick_rt_task() even when RT_PUSH_IPI is
> enabled?

I changed the approach to set the overload at wake up now. I think I got away
without having to encode the reason in rq->rt.overload.

Steve, Pavan, if you can scrutinize this approach I'd appreciate it. It seemed
to work fine with my testing.

I'll push an updated series if this turned out okay.

Thanks

--
Qais Yousef

-->8--


When RT Capacity Awareness was implemented, the logic was done such that if
a task was running on a fitting CPU, then it was sticky and we would try
our best to keep it there.

But as Steve suggested, to adhere to the strict priority rules of RT
class; allow pulling an RT task to unfitting CPU to ensure it gets a
chance to run ASAP.

To better handle the fact that the task is running on an unfit CPU, when it
wakes up mark it as overloaded which will cause it to be pushed to
a fitting CPU when it becomes available.

The latter change requires teaching push_rt_task() how to handle pushing
unfit task.

If the unfit task is the only pushable task, then we only force the push
if we find a fitting CPU. Otherwise we bail out.

Else if the task is higher priority than the current task, then we
reschedule.

Else if the rq has other pushable tasks, then we push the unfitting task
anyway to reduce the pressure on the rq even if the target CPU is unfit
too.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6d959be4bba0..6d92219d5733 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
 {
 	if (!task_running(rq, p) &&
-	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
-	    rt_task_fits_capacity(p, cpu))
+	    cpumask_test_cpu(cpu, p->cpus_ptr))
 		return 1;
 
 	return 0;
@@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
 	struct task_struct *next_task;
 	struct rq *lowest_rq;
 	int ret = 0;
+	bool fit;
 
 	if (!rq->rt.overloaded)
 		return 0;
@@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
 	if (WARN_ON(next_task == rq->curr))
 		return 0;
 
+	/*
+	 * The rq could be overloaded because it has an unfitting task. If that's
+	 * the case then we need to try harder to find a better fitting CPU.
+	 */
+	fit = rt_task_fits_capacity(next_task, cpu_of(rq));
+
 	/*
 	 * It's possible that the next_task slipped in of
 	 * higher priority than current. If that's the case
 	 * just reschedule current.
+	 *
+	 * Unless next_task doesn't fit in this cpu, then continue with the
+	 * attempt to push it.
 	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
+	if (unlikely(next_task->prio < rq->curr->prio && fit)) {
 		resched_curr(rq);
 		return 0;
 	}
@@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
 		goto retry;
 	}
 
+	/*
+	 * Bail out if the task doesn't fit on either CPUs.
+	 *
+	 * Unless..
+	 *
+	 * * The priority of next_task is higher than current, then we
+	 *   resched_curr(). We forced skipping this condition above.
+	 *
+	 * * The rq has more tasks to push, then we probably should push anyway
+	 *   to reduce the load on this rq.
+	 */
+	if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
+
+		/* we forced skipping this condition, so re-evaluate it */
+		if (unlikely(next_task->prio < rq->curr->prio)) {
+			resched_curr(rq);
+			goto out_unlock;
+		}
+
+		/*
+		 * If there are more tasks to push, then the rq is overloaded
+		 * with more than just this task, so push anyway.
+		 */
+		if (has_pushable_tasks(rq))
+			goto out_unlock;
+	}
+
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
@@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)
 
 	resched_curr(lowest_rq);
 
+out_unlock:
 	double_unlock_balance(rq, lowest_rq);
 
 out:
@@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
 			    (rq->curr->nr_cpus_allowed < 2 ||
 			     rq->curr->prio <= p->prio);
 
-	if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
+	bool fit = rt_task_fits_capacity(p, cpu_of(rq));
+
+	if (!fit && !rq->rt.overloaded) {
+		rt_set_overload(rq);
+		rq->rt.overloaded = 1;
+	}
+
+	if (need_to_push || !fit)
 		push_rt_tasks(rq);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-17 13:53     ` Qais Yousef
  2020-02-18  4:16       ` Pavan Kondeti
@ 2020-02-19 14:02       ` Qais Yousef
  2020-02-21  8:15         ` Pavan Kondeti
  1 sibling, 1 reply; 29+ messages in thread
From: Qais Yousef @ 2020-02-19 14:02 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/17/20 13:53, Qais Yousef wrote:
> On 02/17/20 14:53, Pavan Kondeti wrote:
> > I notice a case where tasks would migrate for no reason (happens without this
> > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > I don't know if it makes any difference but I see RT task placement is too
> > keen on reducing the migrations unless it is absolutely needed.
> 
> In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> and prefer it if it is.
> 
> But yeah I see it happening too
> 
> https://imgur.com/a/FYqLIko
> 
> Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> show that.
> 
> I shouldn't have changed something to affect this bias. Do you think it's
> something I introduced?
> 
> It's something maybe worth digging into though. I'll try to have a look.

FWIW, I dug a bit into this and I found out we have a thundering herd issue.

Since I just have a set of periodic tasks that all start together,
select_task_rq_rt() ends up selecting the same fitting CPU for all of them
(CPU1). They end up all waking up on CPU1, only to get pushed back out
again with only one surviving.

This reshuffles the task placement ending with some tasks being swapped.

I don't think this problem is specific to my change and could happen without
it.

The problem is caused by the way find_lowest_rq() selects a cpu in the mask

1750                         best_cpu = cpumask_first_and(lowest_mask,
1751                                                      sched_domain_span(sd));
1752                         if (best_cpu < nr_cpu_ids) {
1753                                 rcu_read_unlock();
1754                                 return best_cpu;
1755                         }

It always returns the first CPU in the mask. Or the mask could only contain
a single CPU too. The end result is that we most likely end up herding all the
tasks that wake up simultaneously to the same CPU.

I'm not sure how to fix this problem yet.
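
One direction that might help (illustration only, not a proposed patch) is to
rotate the starting point of the search instead of always taking the first CPU
of the intersection. A hypothetical, untested sketch:

	static DEFINE_PER_CPU(unsigned int, rt_rr_hint);

	static int pick_spread_cpu(const struct cpumask *lowest_mask,
				   const struct cpumask *domain_span)
	{
		unsigned int start = this_cpu_inc_return(rt_rr_hint) % nr_cpu_ids;
		int cpu;

		/* Prefer a CPU at or after the rotating hint... */
		for_each_cpu_and(cpu, lowest_mask, domain_span) {
			if (cpu >= start)
				return cpu;
		}

		/* ...otherwise wrap around to the first CPU in the intersection. */
		cpu = cpumask_first_and(lowest_mask, domain_span);
		return cpu < nr_cpu_ids ? cpu : -1;
	}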

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] sched/rt: allow pulling unfitting task
  2020-02-19 13:43     ` Qais Yousef
@ 2020-02-21  8:07       ` Pavan Kondeti
  2020-02-21 11:08         ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: Pavan Kondeti @ 2020-02-21  8:07 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

Hi Quais,

On Wed, Feb 19, 2020 at 01:43:08PM +0000, Qais Yousef wrote:

[...]

> > 
> > Here rq is source rq from which the task is being pulled. I can't understand
> > how marking overload condition on source_rq help. Because overload condition
> > gets cleared in the task dequeue path. i.e dec_rt_tasks->dec_rt_migration->
> > update_rt_migration().
> > 
> > Also, the overload condition with nr_running=1 may not work as expected unless
> > we track this overload condition (due to unfit) separately. Because a task
> > can be pushed only when it is NOT running. So a task running on silver will
> > continue to run there until it wakes up next time or another high prio task
> > gets queued here (due to affinity).
> > 
> > btw, Are you testing this path by disabling RT_PUSH_IPI feature? I ask this
> > because, This feature gets turned on by default in our b.L platforms and
> > RT task migrations happens by the busy CPU pushing the tasks. Or are there
> > any cases where we can run into pick_rt_task() even when RT_PUSH_IPI is
> > enabled?
> 
> I changed the approach to set the overload at wake up now. I think I got away
> without having to encode the reason in rq->rt.overload.
> 
> Steve, Pavan, if you can scrutinize this approach I'd appreciate it. It seemed
> to work fine with my testing.
> 

The 1st patch in this series added the support to return an unfit CPU in case
all other BIG CPUs are busy with RT tasks. I believe that change itself
solves the RT tasks spreading problem if we just remove
rt_task_fits_capacity() from pick_rt_task(). In fact rt_task_fits_capacity()
can also be removed from switched_to_rt() and task_woken_rt(). Because
an RT task can be pulled/pushed only if it is not currently running. If the
intention is to push/pull a running RT task from an unfit CPU to a BIG CPU,
that won't work as we are not waking any migration/X to carry the migration.

If a CPU has 2 RT tasks (excluding the running RT task), then we have a
choice about which RT task to be pushed based on the destination CPU's
capacity. Are we trying to solve this problem in this patch?

Please see the below comments as well and let me know if we are on the
same page or not.

> 
> 
> When RT Capacity Awareness was implemented, the logic was done such that if
> a task was running on a fitting CPU, then it was sticky and we would try
> our best to keep it there.
> 
> But as Steve suggested, to adhere to the strict priority rules of RT
> class; allow pulling an RT task to unfitting CPU to ensure it gets a
> chance to run ASAP.
> 
> To better handle the fact that the task is running on an unfit CPU, when it
> wakes up mark it as overloaded which will cause it to be pushed to
> a fitting CPU when it becomes available.
> 
> The latter change requires teaching push_rt_task() how to handle pushing
> unfit task.
> 
> If the unfit task is the only pushable task, then we only force the push
> if we find a fitting CPU. Otherwise we bail out.
> 
> Else if the task is higher priority than the current task, then we
> reschedule.
> 
> Else if the rq has other pushable tasks, then we push the unfitting task
> anyway to reduce the pressure on the rq even if the target CPU is unfit
> too.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 6d959be4bba0..6d92219d5733 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
>  static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
>  {
>  	if (!task_running(rq, p) &&
> -	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
> -	    rt_task_fits_capacity(p, cpu))
> +	    cpumask_test_cpu(cpu, p->cpus_ptr))
>  		return 1;
>  

Yes. If we come here it means the rq has more than 1 runnable task, so we prefer
spreading.

>  	return 0;
> @@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
>  	struct task_struct *next_task;
>  	struct rq *lowest_rq;
>  	int ret = 0;
> +	bool fit;
>  
>  	if (!rq->rt.overloaded)
>  		return 0;
> @@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
>  	if (WARN_ON(next_task == rq->curr))
>  		return 0;
>  
> +	/*
> +	 * The rq could be overloaded because it has an unfitting task. If that's
> +	 * the case then we need to try harder to find a better fitting CPU.
> +	 */
> +	fit = rt_task_fits_capacity(next_task, cpu_of(rq));
> +
>  	/*
>  	 * It's possible that the next_task slipped in of
>  	 * higher priority than current. If that's the case
>  	 * just reschedule current.
> +	 *
> +	 * Unless next_task doesn't fit in this cpu, then continue with the
> +	 * attempt to push it.
>  	 */
> -	if (unlikely(next_task->prio < rq->curr->prio)) {
> +	if (unlikely(next_task->prio < rq->curr->prio && fit)) {
>  		resched_curr(rq);
>  		return 0;
>  	}
> @@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
>  		goto retry;
>  	}
>  
> +	/*
> +	 * Bail out if the task doesn't fit on either CPUs.
> +	 *
> +	 * Unless..
> +	 *
> +	 * * The priority of next_task is higher than current, then we
> +	 *   resched_curr(). We forced skipping this condition above.
> +	 *
> +	 * * The rq has more tasks to push, then we probably should push anyway
> +	 *   to reduce the load on this rq.
> +	 */
> +	if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
> +
> +		/* we forced skipping this condition, so re-evaluate it */
> +		if (unlikely(next_task->prio < rq->curr->prio)) {
> +			resched_curr(rq);
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * If there are more tasks to push, then the rq is overloaded
> +		 * with more than just this task, so push anyway.
> +		 */
> +		if (has_pushable_tasks(rq))
> +			goto out_unlock;

The next_task is not yet removed from the pushable_tasks list, so this will
always be true and we skip pushing. Even if you add some logic, what happens
if the other task also does not fit? How would the task finally be pushed
to other CPUs?

> +	}
> +
>  	deactivate_task(rq, next_task, 0);
>  	set_task_cpu(next_task, lowest_rq->cpu);
>  	activate_task(lowest_rq, next_task, 0);
> @@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)
>  
>  	resched_curr(lowest_rq);
>  
> +out_unlock:
>  	double_unlock_balance(rq, lowest_rq);
>  
>  out:
> @@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
>  			    (rq->curr->nr_cpus_allowed < 2 ||
>  			     rq->curr->prio <= p->prio);
>  
> -	if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
> +	bool fit = rt_task_fits_capacity(p, cpu_of(rq));
> +
> +	if (!fit && !rq->rt.overloaded) {
> +		rt_set_overload(rq);
> +		rq->rt.overloaded = 1;
> +	}
> +
> +	if (need_to_push || !fit)
>  		push_rt_tasks(rq);
>  }

If this is the only RT task queued on this CPU and the current task is not an RT
task, why are we calling push_rt_tasks() in the unfit scenario? In most
cases, the task is woken on this rq, because there is no other higher capacity
rq that can take this task for whatever reason.

btw, if we set overload condition with just 1 RT task, we keep receiving the
IPI (irq work) to push the task and we can't do that because a running RT
task can't be pushed.

Thanks,
Pavan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-19 14:02       ` Qais Yousef
@ 2020-02-21  8:15         ` Pavan Kondeti
  2020-02-21 11:12           ` Qais Yousef
  0 siblings, 1 reply; 29+ messages in thread
From: Pavan Kondeti @ 2020-02-21  8:15 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On Wed, Feb 19, 2020 at 02:02:44PM +0000, Qais Yousef wrote:
> On 02/17/20 13:53, Qais Yousef wrote:
> > On 02/17/20 14:53, Pavan Kondeti wrote:
> > > I notice a case where tasks would migrate for no reason (happens without this
> > > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > > I don't know if it makes any difference but I see RT task placement is too
> > > keen on reducing the migrations unless it is absolutely needed.
> > 
> > In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> > and prefer it if it is.
> > 
> > But yeah I see it happening too
> > 
> > https://imgur.com/a/FYqLIko
> > 
> > Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> > show that.
> > 
> > I shouldn't have changed something to affect this bias. Do you think it's
> > something I introduced?
> > 
> > It's something maybe worth digging into though. I'll try to have a look.
> 
> FWIW, I dug a bit into this and I found out we have a thundering herd issue.
> 
> Since I just have a set of periodic tasks that all start together,
> select_task_rq_rt() ends up selecting the same fitting CPU for all of them
> (CPU1). They end up all waking up on CPU1, only to get pushed back out
> again with only one surviving.
> 
> This reshuffles the task placement ending with some tasks being swapped.
> 
> I don't think this problem is specific to my change and could happen without
> it.
> 
> The problem is caused by the way find_lowest_rq() selects a cpu in the mask
> 
> 1750                         best_cpu = cpumask_first_and(lowest_mask,
> 1751                                                      sched_domain_span(sd));
> 1752                         if (best_cpu < nr_cpu_ids) {
> 1753                                 rcu_read_unlock();
> 1754                                 return best_cpu;
> 1755                         }
> 
> It always returns the first CPU in the mask. Or the mask could only contain
> a single CPU too. The end result is that we most likely end up herding all the
> tasks that wake up simultaneously to the same CPU.
> 
> I'm not sure how to fix this problem yet.
> 

Yes, I have seen this problem too. This is not limited to RT; even the fair class
(find_energy_efficient_cpu path) also has the same issue. There is a window
between selecting a CPU for the task and the task actually being queued there. Because of
this, we may select the same CPU for two successive waking tasks. Turning off
TTWU_QUEUE sched feature addresses this up to some extent. At least it would
solve the cases like multiple tasks getting woken up from an interrupt handler.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] sched/rt: allow pulling unfitting task
  2020-02-21  8:07       ` Pavan Kondeti
@ 2020-02-21 11:08         ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-21 11:08 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/21/20 13:37, Pavan Kondeti wrote:
> Hi Quais,

I know my name breaks the English rules, but there's no u after my Q :)

> 
> On Wed, Feb 19, 2020 at 01:43:08PM +0000, Qais Yousef wrote:
> 
> [...]
> 
> > > 
> > > Here rq is source rq from which the task is being pulled. I can't understand
> > > how marking overload condition on source_rq help. Because overload condition
> > > gets cleared in the task dequeue path. i.e dec_rt_tasks->dec_rt_migration->
> > > update_rt_migration().
> > > 
> > > Also, the overload condition with nr_running=1 may not work as expected unless
> > > we track this overload condition (due to unfit) separately. Because a task
> > > can be pushed only when it is NOT running. So a task running on silver will
> > > continue to run there until it wakes up next time or another high prio task
> > > gets queued here (due to affinity).
> > > 
> > > btw, Are you testing this path by disabling RT_PUSH_IPI feature? I ask this
> > > because, This feature gets turned on by default in our b.L platforms and
> > > RT task migrations happens by the busy CPU pushing the tasks. Or are there
> > > any cases where we can run into pick_rt_task() even when RT_PUSH_IPI is
> > > enabled?
> > 
> > I changed the approach to set the overload at wake up now. I think I got away
> > without having to encode the reason in rq->rt.overload.
> > 
> > Steve, Pavan, if you can scrutinize this approach I'd appreciate it. It seemed
> > to work fine with my testing.
> > 
> 
> The 1st patch in this series added the support to return an unfit CPU in case

If you can give your reviewed-by and maybe tested-by for that patch that'd be
great.

> all other BIG CPUs are busy with RT tasks. I believe that change itself
> solves the RT tasks spreading problem if we just remove
> rt_task_fits_capacity() from pick_rt_task(). In fact rt_task_fits_capacity()

Yes I will be removing this.

> can also be removed from switched_to_rt() and task_woken_rt(). Because
> an RT task can be pulled/pushed only if it is not currently running. If the

I did actually consider pushing a patch that does just that.

The issue is that I have seen delays in migrating the task to the big CPU,
although the CPU was free. It was hard to hit, but I had runs where the
migration didn't happen at all during the 1s test duration.

I need to use the same overloaded trick in switched_to_rt() too though.

> intention is to push/pull a running RT task from an unfit CPU to a BIG CPU,
> that won't work as we are not waking any migration/X to carry the migration.

Hmm, I thought this should happen as a side effect of the push/pull functions.

> 
> If a CPU has 2 RT tasks (excluding the running RT task), then we have a
> choice about which RT task to be pushed based on the destination CPU's
> capacity. Are we trying to solve this problem in this patch?

If there are 2 tasks on the same CPU (rq), then it's overloaded and the push
should be triggered anyway, no?

What I'm trying to solve is if a task has woken up on the wrong CPU, or has
just switched to RT on the wrong CPU, then we want to push ASAP.

We can rely on select_task_rq_rt() to do the work, but I have seen big delays
for this to trigger.

> 
> Please see the below comments as well and let me know if we are on the
> same page or not.
> 
> > 
> > 
> > When RT Capacity Awareness was implemented, the logic was done such that if
> > a task was running on a fitting CPU, then it was sticky and we would try
> > our best to keep it there.
> > 
> > But as Steve suggested, to adhere to the strict priority rules of RT
> > class; allow pulling an RT task to unfitting CPU to ensure it gets a
> > chance to run ASAP.
> > 
> > To better handle the fact that the task is running on an unfit CPU, when it
> > wakes up mark it as overloaded which will cause it to be pushed to
> > a fitting CPU when it becomes available.
> > 
> > The latter change requires teaching push_rt_task() how to handle pushing
> > unfit task.
> > 
> > If the unfit task is the only pushable task, then we only force the push
> > if we find a fitting CPU. Otherwise we bail out.
> > 
> > Else if the task is higher priority than the current task, then we
> > reschedule.
> > 
> > Else if the rq has other pushable tasks, then we push the unfitting task
> > anyway to reduce the pressure on the rq even if the target CPU is unfit
> > too.
> > 
> > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >  kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 48 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 6d959be4bba0..6d92219d5733 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> >  static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> >  {
> >  	if (!task_running(rq, p) &&
> > -	    cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > -	    rt_task_fits_capacity(p, cpu))
> > +	    cpumask_test_cpu(cpu, p->cpus_ptr))
> >  		return 1;
> >  
> 
> Yes. If we come here it means the rq has more than 1 runnable task, so we prefer
> spreading.

I might put only this change in this patch and send the fixes so they can go
in before 5.6 is released. Unless the current patch ends up needing only a few
tweaks.

> 
> >  	return 0;
> > @@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
> >  	struct task_struct *next_task;
> >  	struct rq *lowest_rq;
> >  	int ret = 0;
> > +	bool fit;
> >  
> >  	if (!rq->rt.overloaded)
> >  		return 0;
> > @@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
> >  	if (WARN_ON(next_task == rq->curr))
> >  		return 0;
> >  
> > +	/*
> > +	 * The rq could be overloaded because it has an unfitting task. If that's
> > +	 * the case then we need to try harder to find a better fitting CPU.
> > +	 */
> > +	fit = rt_task_fits_capacity(next_task, cpu_of(rq));
> > +
> >  	/*
> >  	 * It's possible that the next_task slipped in of
> >  	 * higher priority than current. If that's the case
> >  	 * just reschedule current.
> > +	 *
> > +	 * Unless next_task doesn't fit in this cpu, then continue with the
> > +	 * attempt to push it.
> >  	 */
> > -	if (unlikely(next_task->prio < rq->curr->prio)) {
> > +	if (unlikely(next_task->prio < rq->curr->prio && fit)) {
> >  		resched_curr(rq);
> >  		return 0;
> >  	}
> > @@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
> >  		goto retry;
> >  	}
> >  
> > +	/*
> > +	 * Bail out if the task doesn't fit on either CPUs.
> > +	 *
> > +	 * Unless..
> > +	 *
> > +	 * * The priority of next_task is higher than current, then we
> > +	 *   resched_curr(). We forced skipping this condition above.
> > +	 *
> > +	 * * The rq has more tasks to push, then we probably should push anyway
> > +	 *   to reduce the load on this rq.
> > +	 */
> > +	if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
> > +
> > +		/* we forced skipping this condition, so re-evaluate it */
> > +		if (unlikely(next_task->prio < rq->curr->prio)) {
> > +			resched_curr(rq);
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * If there are more tasks to push, then the rq is overloaded
> > +		 * with more than just this task, so push anyway.
> > +		 */
> > +		if (has_pushable_tasks(rq))
> > +			goto out_unlock;

Hmm, I accidentally inverted the logic too before posting the patch. It should
have been (!has_pushable_tasks).

> 
> The next_task is not yet removed from the pushable_tasks list, so this will
> always be true and we skip pushing. Even if you add some logic, what happens

I saw the loop in push_rt_task[s]() and thought pick_next_pushable_task() is
what dequeues..

Is there another way to know how many tasks are pushable? Or do I need a new
helper?
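
If a new helper is needed, something along these lines could do it
(hypothetical, not an existing API), given that next_task is still queued on
the pushable_tasks plist at that point:

	static bool has_other_pushable_tasks(struct rq *rq,
					     struct task_struct *next_task)
	{
		struct task_struct *p;

		/* next_task is still on the list, so skip it explicitly. */
		plist_for_each_entry(p, &rq->rt.pushable_tasks, pushable_tasks) {
			if (p != next_task)
				return true;
		}

		return false;
	}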

> if the other task also does not fit? How would the task finally be pushed
> to other CPUs?

If we had 2 unfitting tasks, we'll push one and keep one. If we had 2 tasks,
one unfitting and one fitting, then we'll still push 1. I think the dequeue
should have cleared the overloaded state after pushing the first task anyway.

Does this answer your question or were you trying to point out something else?

> 
> > +	}
> > +
> >  	deactivate_task(rq, next_task, 0);
> >  	set_task_cpu(next_task, lowest_rq->cpu);
> >  	activate_task(lowest_rq, next_task, 0);
> > @@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)
> >  
> >  	resched_curr(lowest_rq);
> >  
> > +out_unlock:
> >  	double_unlock_balance(rq, lowest_rq);
> >  
> >  out:
> > @@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
> >  			    (rq->curr->nr_cpus_allowed < 2 ||
> >  			     rq->curr->prio <= p->prio);
> >  
> > -	if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
> > +	bool fit = rt_task_fits_capacity(p, cpu_of(rq));
> > +
> > +	if (!fit && !rq->rt.overloaded) {
> > +		rt_set_overload(rq);
> > +		rq->rt.overloaded = 1;
> > +	}
> > +
> > +	if (need_to_push || !fit)
> >  		push_rt_tasks(rq);
> >  }
> 
> If this is the only RT task queued on this CPU and the current task is not an RT
> task, why are we calling push_rt_tasks() in the unfit scenario? In most
> cases, the task is woken on this rq, because there is no other higher capacity
> rq that can take this task for whatever reason.

But what if the big CPU is now free for it to run?

I need to add a similar condition to switched_to_rt() otherwise the
push_rt_tasks() would be a NOP.
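
For illustration, one possible shape of that condition (hypothetical and
untested; CONFIG_SMP guards dropped for brevity):

	static void switched_to_rt(struct rq *rq, struct task_struct *p)
	{
		if (task_on_rq_queued(p) && rq->curr != p) {
			bool fit = rt_task_fits_capacity(p, cpu_of(rq));

			/* Mark the rq so the unfit task can be pushed out later. */
			if (!fit && !rq->rt.overloaded) {
				rt_set_overload(rq);
				rq->rt.overloaded = 1;
			}

			/* The !fit case is now covered by the overload marking. */
			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
				rt_queue_push_tasks(rq);

			if (p->prio < rq->curr->prio)
				resched_curr(rq);
		}
	}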

> 
> btw, if we set overload condition with just 1 RT task, we keep receiving the
> IPI (irq work) to push the task and we can't do that because a running RT
> task can't be pushed.

Good point.

We can teach the logic not to send IPIs in this case?

Given all of that, I might still agree this might be unnecessary work. So I'll
split it out into a separate patch anyway and send an updated series of the
fixes we agreed on to go into 5.6.

Thanks!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
  2020-02-21  8:15         ` Pavan Kondeti
@ 2020-02-21 11:12           ` Qais Yousef
  0 siblings, 0 replies; 29+ messages in thread
From: Qais Yousef @ 2020-02-21 11:12 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
	Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
	linux-kernel

On 02/21/20 13:45, Pavan Kondeti wrote:
> On Wed, Feb 19, 2020 at 02:02:44PM +0000, Qais Yousef wrote:
> > On 02/17/20 13:53, Qais Yousef wrote:
> > > On 02/17/20 14:53, Pavan Kondeti wrote:
> > > > I notice a case where tasks would migrate for no reason (happens without this
> > > > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > > > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > > > I don't know if it makes any difference but I see RT task placement is too
> > > > keen on reducing the migrations unless it is absolutely needed.
> > > 
> > > In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> > > and prefer it if it is.
> > > 
> > > But yeah I see it happening too
> > > 
> > > https://imgur.com/a/FYqLIko
> > > 
> > > Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> > > show that.
> > > 
> > > I shouldn't have changed something to affect this bias. Do you think it's
> > > something I introduced?
> > > 
> > > It's something maybe worth digging into though. I'll try to have a look.
> > 
> > FWIW, I dug a bit into this and I found out we have a thundering herd issue.
> > 
> > Since I just have a set of periodic tasks that all start together,
> > select_task_rq_rt() ends up selecting the same fitting CPU for all of them
> > (CPU1). They end up all waking up on CPU1, only to get pushed back out
> > again with only one surviving.
> > 
> > This reshuffles the task placement ending with some tasks being swapped.
> > 
> > I don't think this problem is specific to my change and could happen without
> > it.
> > 
> > The problem is caused by the way find_lowest_rq() selects a cpu in the mask
> > 
> > 1750                         best_cpu = cpumask_first_and(lowest_mask,
> > 1751                                                      sched_domain_span(sd));
> > 1752                         if (best_cpu < nr_cpu_ids) {
> > 1753                                 rcu_read_unlock();
> > 1754                                 return best_cpu;
> > 1755                         }
> > 
> > It always returns the first CPU in the mask. Or the mask could only contain
> > a single CPU too. The end result is that we most likely end up herding all the
> > tasks that wake up simultaneously to the same CPU.
> > 
> > I'm not sure how to fix this problem yet.
> > 
> 
> Yes, I have seen this problem too. This is not limited to RT; even the fair class
> (find_energy_efficient_cpu path) also has the same issue. There is a window
> between selecting a CPU for the task and the task actually being queued there. Because of
> this, we may select the same CPU for two successive waking tasks. Turning off
> TTWU_QUEUE sched feature addresses this up to some extent. At least it would
> solve the cases like multiple tasks getting woken up from an interrupt handler.

Oh, handy. Let me try this out.

I added it to my to-do to investigate it when I have time anyway.

In modern systems where L3 is spanning all CPUs, the migration isn't that
costly, but it'd still be unnecessary wakeup latency that can add up.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-02-21 11:12 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-14 16:39 [PATCH 0/3] RT Capacity Awareness Improvements Qais Yousef
2020-02-14 16:39 ` [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case Qais Yousef
2020-02-17 17:07   ` Valentin Schneider
2020-02-17 23:34     ` Qais Yousef
2020-02-18 10:01       ` Valentin Schneider
2020-02-17 19:09   ` Dietmar Eggemann
2020-02-17 23:45     ` Qais Yousef
2020-02-18  9:53       ` Dietmar Eggemann
2020-02-18 17:28         ` Qais Yousef
2020-02-18 16:46       ` Steven Rostedt
2020-02-18 17:27         ` Qais Yousef
2020-02-18 18:03           ` Steven Rostedt
2020-02-18 18:52             ` Qais Yousef
2020-02-14 16:39 ` [PATCH 2/3] sched/rt: allow pulling unfitting task Qais Yousef
2020-02-17  9:10   ` Pavan Kondeti
2020-02-17 11:20     ` Qais Yousef
2020-02-19 13:43     ` Qais Yousef
2020-02-21  8:07       ` Pavan Kondeti
2020-02-21 11:08         ` Qais Yousef
2020-02-14 16:39 ` [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU Qais Yousef
2020-02-17  9:23   ` Pavan Kondeti
2020-02-17 13:53     ` Qais Yousef
2020-02-18  4:16       ` Pavan Kondeti
2020-02-18 17:47         ` Qais Yousef
2020-02-19  2:46           ` Pavan Kondeti
2020-02-19 10:46             ` Qais Yousef
2020-02-19 14:02       ` Qais Yousef
2020-02-21  8:15         ` Pavan Kondeti
2020-02-21 11:12           ` Qais Yousef
