linux-kernel.vger.kernel.org archive mirror
* [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
@ 2019-06-26  5:06 Viresh Kumar
  2019-06-26  5:06 ` [PATCH V3 1/2] sched: Start tracking SCHED_IDLE tasks count in cfs_rq Viresh Kumar
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Viresh Kumar @ 2019-06-26  5:06 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Viresh Kumar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

Hi,

We try to find an idle CPU to run the next task, but if we don't find
one it is better, for performance reasons, to pick a CPU which will run
the task the soonest.

A CPU which isn't idle but has only SCHED_IDLE activity queued on it
should be a good target based on this criterion, as any normal fair task
will most likely preempt the currently running SCHED_IDLE task
immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
should give better results, as it can run the task sooner than an idle
CPU (which first needs to be woken up from an idle state).

This patchset updates both fast and slow paths with this optimization.
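
For anyone not familiar with SCHED_IDLE, this is how a task opts into
the policy from userspace (just an illustration, not part of this
series; the rt-app json presumably does the equivalent for its
idle_thread tasks):

#define _GNU_SOURCE		/* SCHED_IDLE needs this with glibc's <sched.h> */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	/* SCHED_IDLE must be requested with static priority 0 */
	struct sched_param param = { .sched_priority = 0 };

	if (sched_setscheduler(0 /* self */, SCHED_IDLE, &param) == -1)
		perror("sched_setscheduler");

	/* ... whatever runs from here on competes as SCHED_IDLE ... */
	return 0;
}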

Testing is done with the help of rt-app for now; here are the
details:

- Tested on Octacore Hikey platform (all CPUs change frequency
  together).

- The rt-app json [1] creates a few tasks and we monitor their
  scheduling latency by looking at the "wu_lat" field (usec).

- The histograms are created using
  https://github.com/adkein/textogram: textogram -a 0 -z 1000 -n 10

- The stats are accumulated using: https://github.com/nferraz/st

- NOTE: The % values shown don't add up; look at the absolute counts
  instead.


Test 1: Create 8 CFS tasks (no SCHED_IDLE tasks) without this patchset:

      0 - 100  : ##################################################   72% (3688)
    100 - 200  : ################                                     24% (1253)
    200 - 300  : ##                                                    2% (149)
    300 - 400  :                                                       0% (22)
    400 - 500  :                                                       0% (1)
    500 - 600  :                                                       0% (3)
    600 - 700  :                                                       0% (1)
    700 - 800  :                                                       0% (1)
    800 - 900  :
    900 - 1000 :                                                       0% (1)
         >1000 : 0% (17)

   N       min     max     sum     mean    stddev
   5136    0       2452    535985  104.358 104.585


Test 2: Create 8 CFS tasks and 5 SCHED_IDLE tasks:

        A. Without sched-idle patchset:

      0 - 100  : ##################################################   88% (3102)
    100 - 200  : ##                                                    4% (148)
    200 - 300  :                                                       1% (41)
    300 - 400  :                                                       0% (27)
    400 - 500  :                                                       0% (33)
    500 - 600  :                                                       0% (32)
    600 - 700  :                                                       1% (36)
    700 - 800  :                                                       0% (27)
    800 - 900  :                                                       0% (19)
    900 - 1000 :                                                       0% (26)
         >1000 : 34% (1218)

   N       min     max     sum             mean    stddev
   4710    0       67664   5.25956e+06     1116.68 2315.09


        B. With sched-idle patchset:

      0 - 100  : ##################################################   99% (5042)
    100 - 200  :                                                       0% (8)
    200 - 300  :
    300 - 400  :
    400 - 500  :                                                       0% (2)
    500 - 600  :                                                       0% (1)
    600 - 700  :
    700 - 800  :                                                       0% (1)
    800 - 900  :                                                       0% (1)
    900 - 1000 :
         >1000 : 0% (40)

   N       min     max     sum     mean    stddev
   5095    0       7773    523170  102.683 475.482


With this patchset, the mean latency dropped to about 10% of its
earlier value and the stddev to around 25%.

@Song: Can you please see if the slow-path changes bring any further
improvements in your test case?

V2->V3:
- Removed a pointless branch from 1/2 (PeterZ).
- Removed the RFC tags as patches are getting in better shape now.
- Updated the slow path as well; earlier versions only covered the
  fast path.
- Rebased over latest tip/master, fixed rebase conflicts.
- Improved commit logs.

--
viresh

[1] https://pastebin.com/TMHGGBxD

Viresh Kumar (2):
  sched: Start tracking SCHED_IDLE tasks count in cfs_rq
  sched/fair: Fallback to sched-idle CPU if idle CPU isn't found

 kernel/sched/fair.c  | 57 ++++++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h |  2 ++
 2 files changed, 47 insertions(+), 12 deletions(-)

-- 
2.21.0.rc0.269.g1a574e7a288b



* [PATCH V3 1/2] sched: Start tracking SCHED_IDLE tasks count in cfs_rq
  2019-06-26  5:06 [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Viresh Kumar
@ 2019-06-26  5:06 ` Viresh Kumar
  2019-07-25 16:15   ` [tip:sched/core] sched/fair: " tip-bot for Viresh Kumar
  2019-06-26  5:06 ` [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found Viresh Kumar
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Viresh Kumar @ 2019-06-26  5:06 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Viresh Kumar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

Track how many tasks with the SCHED_IDLE policy are present in each
cfs_rq. This will be used by later commits.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/fair.c  | 14 ++++++++++++--
 kernel/sched/sched.h |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 036be95a87e9..1277adc3e7ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4500,7 +4500,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
-	long task_delta, dequeue = 1;
+	long task_delta, idle_task_delta, dequeue = 1;
 	bool empty;
 
 	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
@@ -4511,6 +4511,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	rcu_read_unlock();
 
 	task_delta = cfs_rq->h_nr_running;
+	idle_task_delta = cfs_rq->idle_h_nr_running;
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 		/* throttled entity or throttle-on-deactivate */
@@ -4520,6 +4521,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		if (dequeue)
 			dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
 		qcfs_rq->h_nr_running -= task_delta;
+		qcfs_rq->idle_h_nr_running -= idle_task_delta;
 
 		if (qcfs_rq->load.weight)
 			dequeue = 0;
@@ -4559,7 +4561,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
 	int enqueue = 1;
-	long task_delta;
+	long task_delta, idle_task_delta;
 
 	se = cfs_rq->tg->se[cpu_of(rq)];
 
@@ -4579,6 +4581,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		return;
 
 	task_delta = cfs_rq->h_nr_running;
+	idle_task_delta = cfs_rq->idle_h_nr_running;
 	for_each_sched_entity(se) {
 		if (se->on_rq)
 			enqueue = 0;
@@ -4587,6 +4590,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		if (enqueue)
 			enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
 		cfs_rq->h_nr_running += task_delta;
+		cfs_rq->idle_h_nr_running += idle_task_delta;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
@@ -5200,6 +5204,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	int idle_h_nr_running = task_has_idle_policy(p);
 
 	/*
 	 * The code below (indirectly) updates schedutil which looks at
@@ -5232,6 +5237,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 		cfs_rq->h_nr_running++;
+		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
 		flags = ENQUEUE_WAKEUP;
 	}
@@ -5239,6 +5245,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running++;
+		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
@@ -5300,6 +5307,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
 	int task_sleep = flags & DEQUEUE_SLEEP;
+	int idle_h_nr_running = task_has_idle_policy(p);
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
@@ -5314,6 +5322,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 		cfs_rq->h_nr_running--;
+		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
@@ -5333,6 +5342,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running--;
+		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 802b1f3405f2..1f95411f5e61 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -484,6 +484,8 @@ struct cfs_rq {
 	unsigned long		runnable_weight;
 	unsigned int		nr_running;
 	unsigned int		h_nr_running;
+	/* h_nr_running for SCHED_IDLE tasks */
+	unsigned int		idle_h_nr_running;
 
 	u64			exec_clock;
 	u64			min_vruntime;
-- 
2.21.0.rc0.269.g1a574e7a288b



* [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found
  2019-06-26  5:06 [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Viresh Kumar
  2019-06-26  5:06 ` [PATCH V3 1/2] sched: Start tracking SCHED_IDLE tasks count in cfs_rq Viresh Kumar
@ 2019-06-26  5:06 ` Viresh Kumar
  2019-06-29  1:16   ` Subhra Mazumdar
  2019-07-25 16:16   ` [tip:sched/core] sched/fair: Fall back " tip-bot for Viresh Kumar
  2019-07-01 13:43 ` [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Peter Zijlstra
  2019-12-09  3:50 ` Wanpeng Li
  3 siblings, 2 replies; 15+ messages in thread
From: Viresh Kumar @ 2019-06-26  5:06 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Viresh Kumar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

We try to find an idle CPU to run the next task, but if we don't find
one it is better, for performance reasons, to pick a CPU which will run
the task the soonest.

A CPU which isn't idle but has only SCHED_IDLE activity queued on it
should be a good target based on this criterion, as any normal fair task
will most likely preempt the currently running SCHED_IDLE task
immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
should give better results, as it can run the task sooner than an idle
CPU (which first needs to be woken up from an idle state).

This patch updates both fast and slow paths with this optimization.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1277adc3e7ed..2e0527fd468c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5376,6 +5376,15 @@ static struct {
 
 #endif /* CONFIG_NO_HZ_COMMON */
 
+/* CPU only has SCHED_IDLE tasks enqueued */
+static int sched_idle_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
+			rq->nr_running);
+}
+
 static unsigned long cpu_runnable_load(struct rq *rq)
 {
 	return cfs_rq_runnable_load_avg(&rq->cfs);
@@ -5698,7 +5707,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
-	int shallowest_idle_cpu = -1;
+	int shallowest_idle_cpu = -1, si_cpu = -1;
 	int i;
 
 	/* Check if we have any choice: */
@@ -5729,7 +5738,12 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
 			}
-		} else if (shallowest_idle_cpu == -1) {
+		} else if (shallowest_idle_cpu == -1 && si_cpu == -1) {
+			if (sched_idle_cpu(i)) {
+				si_cpu = i;
+				continue;
+			}
+
 			load = cpu_runnable_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
@@ -5738,7 +5752,11 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 		}
 	}
 
-	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
+	if (shallowest_idle_cpu != -1)
+		return shallowest_idle_cpu;
+	if (si_cpu != -1)
+		return si_cpu;
+	return least_loaded_cpu;
 }
 
 static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
@@ -5891,7 +5909,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
  */
 static int select_idle_smt(struct task_struct *p, int target)
 {
-	int cpu;
+	int cpu, si_cpu = -1;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
@@ -5901,9 +5919,11 @@ static int select_idle_smt(struct task_struct *p, int target)
 			continue;
 		if (available_idle_cpu(cpu))
 			return cpu;
+		if (si_cpu == -1 && sched_idle_cpu(cpu))
+			si_cpu = cpu;
 	}
 
-	return -1;
+	return si_cpu;
 }
 
 #else /* CONFIG_SCHED_SMT */
@@ -5931,7 +5951,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
-	int cpu, nr = INT_MAX;
+	int cpu, nr = INT_MAX, si_cpu = -1;
 	int this = smp_processor_id();
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
@@ -5960,11 +5980,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
 		if (!--nr)
-			return -1;
+			return si_cpu;
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 			continue;
 		if (available_idle_cpu(cpu))
 			break;
+		if (si_cpu == -1 && sched_idle_cpu(cpu))
+			si_cpu = cpu;
 	}
 
 	time = cpu_clock(this) - time;
@@ -5983,13 +6005,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	struct sched_domain *sd;
 	int i, recent_used_cpu;
 
-	if (available_idle_cpu(target))
+	if (available_idle_cpu(target) || sched_idle_cpu(target))
 		return target;
 
 	/*
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
-	if (prev != target && cpus_share_cache(prev, target) && available_idle_cpu(prev))
+	if (prev != target && cpus_share_cache(prev, target) &&
+	    (available_idle_cpu(prev) || sched_idle_cpu(prev)))
 		return prev;
 
 	/* Check a recently used CPU as a potential idle candidate: */
@@ -5997,7 +6020,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    available_idle_cpu(recent_used_cpu) &&
+	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
 		/*
 		 * Replace recent_used_cpu with prev as it is a potential
-- 
2.21.0.rc0.269.g1a574e7a288b



* Re: [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found
  2019-06-26  5:06 ` [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found Viresh Kumar
@ 2019-06-29  1:16   ` Subhra Mazumdar
  2019-07-01  8:03     ` Viresh Kumar
  2019-07-25 16:16   ` [tip:sched/core] sched/fair: Fall back " tip-bot for Viresh Kumar
  1 sibling, 1 reply; 15+ messages in thread
From: Subhra Mazumdar @ 2019-06-29  1:16 UTC (permalink / raw)
  To: Viresh Kumar, Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Vincent Guittot, tkjos, Daniel Lezcano,
	quentin.perret, chris.redpath, steven.sistare, songliubraving


On 6/25/19 10:06 PM, Viresh Kumar wrote:
> We try to find an idle CPU to run the next task, but in case we don't
> find an idle CPU it is better to pick a CPU which will run the task the
> soonest, for performance reason.
>
> A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> should be a good target based on this criteria as any normal fair task
> will most likely preempt the currently running SCHED_IDLE task
> immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> shall give better results as it should be able to run the task sooner
> than an idle CPU (which requires to be woken up from an idle state).
>
> This patch updates both fast and slow paths with this optimization.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>   kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------
>   1 file changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1277adc3e7ed..2e0527fd468c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5376,6 +5376,15 @@ static struct {
>   
>   #endif /* CONFIG_NO_HZ_COMMON */
>   
> +/* CPU only has SCHED_IDLE tasks enqueued */
> +static int sched_idle_cpu(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
> +			rq->nr_running);
> +}
> +
Shouldn't this check if rq->curr is also sched-idle? And why not drop the
rq->nr_running non-zero check?


* Re: [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found
  2019-06-29  1:16   ` Subhra Mazumdar
@ 2019-07-01  8:03     ` Viresh Kumar
  2019-07-01 22:08       ` Subhra Mazumdar
  0 siblings, 1 reply; 15+ messages in thread
From: Viresh Kumar @ 2019-07-01  8:03 UTC (permalink / raw)
  To: Subhra Mazumdar
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Vincent Guittot,
	tkjos, Daniel Lezcano, quentin.perret, chris.redpath,
	steven.sistare, songliubraving

On 28-06-19, 18:16, Subhra Mazumdar wrote:
> 
> On 6/25/19 10:06 PM, Viresh Kumar wrote:
> > We try to find an idle CPU to run the next task, but in case we don't
> > find an idle CPU it is better to pick a CPU which will run the task the
> > soonest, for performance reason.
> > 
> > A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> > should be a good target based on this criteria as any normal fair task
> > will most likely preempt the currently running SCHED_IDLE task
> > immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> > shall give better results as it should be able to run the task sooner
> > than an idle CPU (which requires to be woken up from an idle state).
> > 
> > This patch updates both fast and slow paths with this optimization.
> > 
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >   kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------
> >   1 file changed, 33 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1277adc3e7ed..2e0527fd468c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5376,6 +5376,15 @@ static struct {
> >   #endif /* CONFIG_NO_HZ_COMMON */
> > +/* CPU only has SCHED_IDLE tasks enqueued */
> > +static int sched_idle_cpu(int cpu)
> > +{
> > +	struct rq *rq = cpu_rq(cpu);
> > +
> > +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
> > +			rq->nr_running);
> > +}
> > +
> Shouldn't this check if rq->curr is also sched idle?

Why wouldn't the current set of checks be enough to guarantee that ?

> And why not drop the rq->nr_running non zero check?

Because the CPU isn't sched-idle if nr_running and idle_h_nr_running are both
0, i.e. it is an idle CPU in that case. So I thought it is important to have
this check as well.
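
To spell out what the check distinguishes (just restating it, not part of
the patch):

	/*
	 * rq->nr_running == 0                               -> fully idle CPU
	 * rq->nr_running == rq->cfs.idle_h_nr_running (!=0) -> only SCHED_IDLE
	 *                                                      tasks are queued
	 * anything else                                      -> at least one
	 *                                                      non-SCHED_IDLE
	 *                                                      task is queued
	 */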

-- 
viresh


* Re: [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
  2019-06-26  5:06 [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Viresh Kumar
  2019-06-26  5:06 ` [PATCH V3 1/2] sched: Start tracking SCHED_IDLE tasks count in cfs_rq Viresh Kumar
  2019-06-26  5:06 ` [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found Viresh Kumar
@ 2019-07-01 13:43 ` Peter Zijlstra
  2019-07-03  9:13   ` Viresh Kumar
  2019-12-09  3:50 ` Wanpeng Li
  3 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2019-07-01 13:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

On Wed, Jun 26, 2019 at 10:36:28AM +0530, Viresh Kumar wrote:
> Hi,
> 
> We try to find an idle CPU to run the next task, but in case we don't
> find an idle CPU it is better to pick a CPU which will run the task the
> soonest, for performance reason.
> 
> A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> should be a good target based on this criteria as any normal fair task
> will most likely preempt the currently running SCHED_IDLE task
> immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> shall give better results as it should be able to run the task sooner
> than an idle CPU (which requires to be woken up from an idle state).
> 
> This patchset updates both fast and slow paths with this optimization.

So this basically does the trivial SCHED_IDLE<-* wakeup preemption test;
one could consider doing the full wakeup preemption test instead.

Now; the obvious argument against doing this is cost; esp. the cgroup
case is very expensive I suppose. But it might be a fun experiment to
try.

That said; I'm tempted to apply these patches..


* Re: [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found
  2019-07-01  8:03     ` Viresh Kumar
@ 2019-07-01 22:08       ` Subhra Mazumdar
  2019-07-02  8:35         ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Subhra Mazumdar @ 2019-07-01 22:08 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Vincent Guittot,
	tkjos, Daniel Lezcano, quentin.perret, chris.redpath,
	steven.sistare, songliubraving


On 7/1/19 1:03 AM, Viresh Kumar wrote:
> On 28-06-19, 18:16, Subhra Mazumdar wrote:
>> On 6/25/19 10:06 PM, Viresh Kumar wrote:
>>> We try to find an idle CPU to run the next task, but in case we don't
>>> find an idle CPU it is better to pick a CPU which will run the task the
>>> soonest, for performance reason.
>>>
>>> A CPU which isn't idle but has only SCHED_IDLE activity queued on it
>>> should be a good target based on this criteria as any normal fair task
>>> will most likely preempt the currently running SCHED_IDLE task
>>> immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
>>> shall give better results as it should be able to run the task sooner
>>> than an idle CPU (which requires to be woken up from an idle state).
>>>
>>> This patch updates both fast and slow paths with this optimization.
>>>
>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>> ---
>>>    kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------
>>>    1 file changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 1277adc3e7ed..2e0527fd468c 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5376,6 +5376,15 @@ static struct {
>>>    #endif /* CONFIG_NO_HZ_COMMON */
>>> +/* CPU only has SCHED_IDLE tasks enqueued */
>>> +static int sched_idle_cpu(int cpu)
>>> +{
>>> +	struct rq *rq = cpu_rq(cpu);
>>> +
>>> +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
>>> +			rq->nr_running);
>>> +}
>>> +
>> Shouldn't this check if rq->curr is also sched idle?
> Why wouldn't the current set of checks be enough to guarantee that ?
I thought nr_running does not include the on-cpu thread.
>
>> And why not drop the rq->nr_running non zero check?
> Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,
> i.e. it is an IDLE cpu in that case. And so I thought it is important to have
> this check as well.
>
idle_cpu() not only checks that nr_running is 0 but also that rq->curr == rq->idle.


* Re: [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found
  2019-07-01 22:08       ` Subhra Mazumdar
@ 2019-07-02  8:35         ` Peter Zijlstra
  2019-07-02 16:32           ` Subhra Mazumdar
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2019-07-02  8:35 UTC (permalink / raw)
  To: Subhra Mazumdar
  Cc: Viresh Kumar, Ingo Molnar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	songliubraving

On Mon, Jul 01, 2019 at 03:08:41PM -0700, Subhra Mazumdar wrote:
> On 7/1/19 1:03 AM, Viresh Kumar wrote:
> > On 28-06-19, 18:16, Subhra Mazumdar wrote:
> > > On 6/25/19 10:06 PM, Viresh Kumar wrote:

> > > > @@ -5376,6 +5376,15 @@ static struct {
> > > >    #endif /* CONFIG_NO_HZ_COMMON */
> > > > +/* CPU only has SCHED_IDLE tasks enqueued */
> > > > +static int sched_idle_cpu(int cpu)
> > > > +{
> > > > +	struct rq *rq = cpu_rq(cpu);
> > > > +
> > > > +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
> > > > +			rq->nr_running);
> > > > +}
> > > > +
> > > Shouldn't this check if rq->curr is also sched idle?
> > Why wouldn't the current set of checks be enough to guarantee that ?
> I thought nr_running does not include the on-cpu thread.

It very much does.

> > > And why not drop the rq->nr_running non zero check?
> > Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,
> > i.e. it is an IDLE cpu in that case. And so I thought it is important to have
> > this check as well.
> > 
> idle_cpu() not only checks nr_running is 0 but also rq->curr == rq->idle

idle_cpu() will try very hard to declare a CPU !idle. But I don't see
how that is relevant. sched_idle_cpu() will only return true if there
are only SCHED_IDLE tasks on the CPU. Viresh's test is simple and
straightforward.
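
For reference, idle_cpu() (kernel/sched/core.c) is roughly the following
(paraphrased sketch, not quoted verbatim from the tree):

int idle_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	/* something other than the idle task is currently on the CPU */
	if (rq->curr != rq->idle)
		return 0;

	/* anything runnable, SCHED_IDLE or not, makes it !idle */
	if (rq->nr_running)
		return 0;

#ifdef CONFIG_SMP
	/* pending remote wakeups also make it !idle */
	if (!llist_empty(&rq->wake_list))
		return 0;
#endif

	return 1;
}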



* Re: [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found
  2019-07-02  8:35         ` Peter Zijlstra
@ 2019-07-02 16:32           ` Subhra Mazumdar
  0 siblings, 0 replies; 15+ messages in thread
From: Subhra Mazumdar @ 2019-07-02 16:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Viresh Kumar, Ingo Molnar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	songliubraving


On 7/2/19 1:35 AM, Peter Zijlstra wrote:
> On Mon, Jul 01, 2019 at 03:08:41PM -0700, Subhra Mazumdar wrote:
>> On 7/1/19 1:03 AM, Viresh Kumar wrote:
>>> On 28-06-19, 18:16, Subhra Mazumdar wrote:
>>>> On 6/25/19 10:06 PM, Viresh Kumar wrote:
>>>>> @@ -5376,6 +5376,15 @@ static struct {
>>>>>     #endif /* CONFIG_NO_HZ_COMMON */
>>>>> +/* CPU only has SCHED_IDLE tasks enqueued */
>>>>> +static int sched_idle_cpu(int cpu)
>>>>> +{
>>>>> +	struct rq *rq = cpu_rq(cpu);
>>>>> +
>>>>> +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
>>>>> +			rq->nr_running);
>>>>> +}
>>>>> +
>>>> Shouldn't this check if rq->curr is also sched idle?
>>> Why wouldn't the current set of checks be enough to guarantee that ?
>> I thought nr_running does not include the on-cpu thread.
> It very much does.
>
>>>> And why not drop the rq->nr_running non zero check?
>>> Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,
>>> i.e. it is an IDLE cpu in that case. And so I thought it is important to have
>>> this check as well.
>>>
>> idle_cpu() not only checks nr_running is 0 but also rq->curr == rq->idle
> idle_cpu() will try very hard to declare a CPU !idle. But I don't see
> how that it relevant. sched_idle_cpu() will only return true if there
> are only SCHED_IDLE tasks on the CPU. Viresh's test is simple and
> straight forward.

OK makes sense.

Thanks,
Subhra



* Re: [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
  2019-07-01 13:43 ` [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Peter Zijlstra
@ 2019-07-03  9:13   ` Viresh Kumar
  0 siblings, 0 replies; 15+ messages in thread
From: Viresh Kumar @ 2019-07-03  9:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Vincent Guittot, tkjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

On 01-07-19, 15:43, Peter Zijlstra wrote:
> On Wed, Jun 26, 2019 at 10:36:28AM +0530, Viresh Kumar wrote:
> > Hi,
> > 
> > We try to find an idle CPU to run the next task, but in case we don't
> > find an idle CPU it is better to pick a CPU which will run the task the
> > soonest, for performance reason.
> > 
> > A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> > should be a good target based on this criteria as any normal fair task
> > will most likely preempt the currently running SCHED_IDLE task
> > immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> > shall give better results as it should be able to run the task sooner
> > than an idle CPU (which requires to be woken up from an idle state).
> > 
> > This patchset updates both fast and slow paths with this optimization.
> 
> So this basically does the trivial SCHED_IDLE<-* wakeup preemption test;

Right.

> one could consider doing the full wakeup preemption test instead.

I am not sure what you meant by "full wakeup preemption test" :(

> Now; the obvious argument against doing this is cost; esp. the cgroup
> case is very expensive I suppose. But it might be a fun experiment to
> try.

> That said; I'm tempted to apply these patches..

Please do, who is stopping you :)

-- 
viresh


* [tip:sched/core] sched/fair: Start tracking SCHED_IDLE tasks count in cfs_rq
  2019-06-26  5:06 ` [PATCH V3 1/2] sched: Start tracking SCHED_IDLE tasks count in cfs_rq Viresh Kumar
@ 2019-07-25 16:15   ` tip-bot for Viresh Kumar
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Viresh Kumar @ 2019-07-25 16:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, peterz, tglx, daniel.lezcano, torvalds, mingo,
	vincent.guittot, hpa, viresh.kumar

Commit-ID:  43e9f7f231e40e4534fc3a735da152911a085c16
Gitweb:     https://git.kernel.org/tip/43e9f7f231e40e4534fc3a735da152911a085c16
Author:     Viresh Kumar <viresh.kumar@linaro.org>
AuthorDate: Wed, 26 Jun 2019 10:36:29 +0530
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 25 Jul 2019 15:51:53 +0200

sched/fair: Start tracking SCHED_IDLE tasks count in cfs_rq

Track how many tasks with the SCHED_IDLE policy are present in each
cfs_rq. This will be used by later commits.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: chris.redpath@arm.com
Cc: quentin.perret@linaro.org
Cc: songliubraving@fb.com
Cc: steven.sistare@oracle.com
Cc: subhra.mazumdar@oracle.com
Cc: tkjos@google.com
Link: https://lkml.kernel.org/r/0d3cdc427fc68808ad5bccc40e86ed0bf9da8bb4.1561523542.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c  | 14 ++++++++++++--
 kernel/sched/sched.h |  3 ++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9be36ffb5689..9ed5ab53872f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4555,7 +4555,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
-	long task_delta, dequeue = 1;
+	long task_delta, idle_task_delta, dequeue = 1;
 	bool empty;
 
 	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
@@ -4566,6 +4566,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	rcu_read_unlock();
 
 	task_delta = cfs_rq->h_nr_running;
+	idle_task_delta = cfs_rq->idle_h_nr_running;
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 		/* throttled entity or throttle-on-deactivate */
@@ -4575,6 +4576,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		if (dequeue)
 			dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
 		qcfs_rq->h_nr_running -= task_delta;
+		qcfs_rq->idle_h_nr_running -= idle_task_delta;
 
 		if (qcfs_rq->load.weight)
 			dequeue = 0;
@@ -4614,7 +4616,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
 	int enqueue = 1;
-	long task_delta;
+	long task_delta, idle_task_delta;
 
 	se = cfs_rq->tg->se[cpu_of(rq)];
 
@@ -4634,6 +4636,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		return;
 
 	task_delta = cfs_rq->h_nr_running;
+	idle_task_delta = cfs_rq->idle_h_nr_running;
 	for_each_sched_entity(se) {
 		if (se->on_rq)
 			enqueue = 0;
@@ -4642,6 +4645,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		if (enqueue)
 			enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
 		cfs_rq->h_nr_running += task_delta;
+		cfs_rq->idle_h_nr_running += idle_task_delta;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
@@ -5255,6 +5259,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	int idle_h_nr_running = task_has_idle_policy(p);
 
 	/*
 	 * The code below (indirectly) updates schedutil which looks at
@@ -5287,6 +5292,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 		cfs_rq->h_nr_running++;
+		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
 		flags = ENQUEUE_WAKEUP;
 	}
@@ -5294,6 +5300,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running++;
+		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
@@ -5355,6 +5362,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
 	int task_sleep = flags & DEQUEUE_SLEEP;
+	int idle_h_nr_running = task_has_idle_policy(p);
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
@@ -5369,6 +5377,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 		cfs_rq->h_nr_running--;
+		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
@@ -5388,6 +5397,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running--;
+		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 802b1f3405f2..aaca0e743776 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -483,7 +483,8 @@ struct cfs_rq {
 	struct load_weight	load;
 	unsigned long		runnable_weight;
 	unsigned int		nr_running;
-	unsigned int		h_nr_running;
+	unsigned int		h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
+	unsigned int		idle_h_nr_running; /* SCHED_IDLE */
 
 	u64			exec_clock;
 	u64			min_vruntime;


* [tip:sched/core] sched/fair: Fall back to sched-idle CPU if idle CPU isn't found
  2019-06-26  5:06 ` [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found Viresh Kumar
  2019-06-29  1:16   ` Subhra Mazumdar
@ 2019-07-25 16:16   ` tip-bot for Viresh Kumar
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Viresh Kumar @ 2019-07-25 16:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, daniel.lezcano, linux-kernel, viresh.kumar, hpa, peterz,
	vincent.guittot, torvalds, mingo

Commit-ID:  3c29e651e16dd3b3179cfb2d055ee9538e37515c
Gitweb:     https://git.kernel.org/tip/3c29e651e16dd3b3179cfb2d055ee9538e37515c
Author:     Viresh Kumar <viresh.kumar@linaro.org>
AuthorDate: Wed, 26 Jun 2019 10:36:30 +0530
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 25 Jul 2019 15:51:54 +0200

sched/fair: Fall back to sched-idle CPU if idle CPU isn't found

We try to find an idle CPU to run the next task, but if we don't find
one it is better, for performance reasons, to pick a CPU which will run
the task the soonest.

A CPU which isn't idle but has only SCHED_IDLE activity queued on it
should be a good target based on this criterion, as any normal fair task
will most likely preempt the currently running SCHED_IDLE task
immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
should give better results, as it can run the task sooner than an idle
CPU (which first needs to be woken up from an idle state).

This patch updates both fast and slow paths with this optimization.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: chris.redpath@arm.com
Cc: quentin.perret@linaro.org
Cc: songliubraving@fb.com
Cc: steven.sistare@oracle.com
Cc: subhra.mazumdar@oracle.com
Cc: tkjos@google.com
Link: https://lkml.kernel.org/r/eeafa25fdeb6f6edd5b2da716bc8f0ba7708cbcf.1561523542.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9ed5ab53872f..52564e050062 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5431,6 +5431,15 @@ static struct {
 
 #endif /* CONFIG_NO_HZ_COMMON */
 
+/* CPU only has SCHED_IDLE tasks enqueued */
+static int sched_idle_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
+			rq->nr_running);
+}
+
 static unsigned long cpu_runnable_load(struct rq *rq)
 {
 	return cfs_rq_runnable_load_avg(&rq->cfs);
@@ -5753,7 +5762,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
-	int shallowest_idle_cpu = -1;
+	int shallowest_idle_cpu = -1, si_cpu = -1;
 	int i;
 
 	/* Check if we have any choice: */
@@ -5784,7 +5793,12 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
 			}
-		} else if (shallowest_idle_cpu == -1) {
+		} else if (shallowest_idle_cpu == -1 && si_cpu == -1) {
+			if (sched_idle_cpu(i)) {
+				si_cpu = i;
+				continue;
+			}
+
 			load = cpu_runnable_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
@@ -5793,7 +5807,11 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 		}
 	}
 
-	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
+	if (shallowest_idle_cpu != -1)
+		return shallowest_idle_cpu;
+	if (si_cpu != -1)
+		return si_cpu;
+	return least_loaded_cpu;
 }
 
 static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
@@ -5946,7 +5964,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
  */
 static int select_idle_smt(struct task_struct *p, int target)
 {
-	int cpu;
+	int cpu, si_cpu = -1;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
@@ -5956,9 +5974,11 @@ static int select_idle_smt(struct task_struct *p, int target)
 			continue;
 		if (available_idle_cpu(cpu))
 			return cpu;
+		if (si_cpu == -1 && sched_idle_cpu(cpu))
+			si_cpu = cpu;
 	}
 
-	return -1;
+	return si_cpu;
 }
 
 #else /* CONFIG_SCHED_SMT */
@@ -5986,8 +6006,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
-	int cpu, nr = INT_MAX;
 	int this = smp_processor_id();
+	int cpu, nr = INT_MAX, si_cpu = -1;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
@@ -6015,11 +6035,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
 		if (!--nr)
-			return -1;
+			return si_cpu;
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 			continue;
 		if (available_idle_cpu(cpu))
 			break;
+		if (si_cpu == -1 && sched_idle_cpu(cpu))
+			si_cpu = cpu;
 	}
 
 	time = cpu_clock(this) - time;
@@ -6038,13 +6060,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	struct sched_domain *sd;
 	int i, recent_used_cpu;
 
-	if (available_idle_cpu(target))
+	if (available_idle_cpu(target) || sched_idle_cpu(target))
 		return target;
 
 	/*
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
-	if (prev != target && cpus_share_cache(prev, target) && available_idle_cpu(prev))
+	if (prev != target && cpus_share_cache(prev, target) &&
+	    (available_idle_cpu(prev) || sched_idle_cpu(prev)))
 		return prev;
 
 	/* Check a recently used CPU as a potential idle candidate: */
@@ -6052,7 +6075,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    available_idle_cpu(recent_used_cpu) &&
+	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
 		/*
 		 * Replace recent_used_cpu with prev as it is a potential


* Re: [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
  2019-06-26  5:06 [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Viresh Kumar
                   ` (2 preceding siblings ...)
  2019-07-01 13:43 ` [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Peter Zijlstra
@ 2019-12-09  3:50 ` Wanpeng Li
  2019-12-10  6:33   ` Viresh Kumar
  3 siblings, 1 reply; 15+ messages in thread
From: Wanpeng Li @ 2019-12-09  3:50 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Vincent Guittot, Todd Kjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

On Wed, 26 Jun 2019 at 13:07, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> Hi,
>
> We try to find an idle CPU to run the next task, but in case we don't
> find an idle CPU it is better to pick a CPU which will run the task the
> soonest, for performance reason.
>
> A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> should be a good target based on this criteria as any normal fair task
> will most likely preempt the currently running SCHED_IDLE task
> immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> shall give better results as it should be able to run the task sooner
> than an idle CPU (which requires to be woken up from an idle state).
>
> This patchset updates both fast and slow paths with this optimization.
>
> Testing is done with the help of rt-app currently and here are the
> details:
>
> - Tested on Octacore Hikey platform (all CPUs change frequency
>   together).
>
> - rt-app json [1] creates few tasks and we monitor the scheduling
>   latency for them by looking at "wu_lat" field (usec).
>
> - The histograms are created using
>   https://github.com/adkein/textogram: textogram -a 0 -z 1000 -n 10
>
> - the stats are accumulated using: https://github.com/nferraz/st

Hi Viresh,

Thanks for the great work! Could you give the whole command line for us to test with?

    Wanpeng


* Re: [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
  2019-12-09  3:50 ` Wanpeng Li
@ 2019-12-10  6:33   ` Viresh Kumar
  2019-12-10 11:15     ` Wanpeng Li
  0 siblings, 1 reply; 15+ messages in thread
From: Viresh Kumar @ 2019-12-10  6:33 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Vincent Guittot, Todd Kjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

On 09-12-19, 11:50, Wanpeng Li wrote:
> On Wed, 26 Jun 2019 at 13:07, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > Hi,
> >
> > We try to find an idle CPU to run the next task, but in case we don't
> > find an idle CPU it is better to pick a CPU which will run the task the
> > soonest, for performance reason.
> >
> > A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> > should be a good target based on this criteria as any normal fair task
> > will most likely preempt the currently running SCHED_IDLE task
> > immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> > shall give better results as it should be able to run the task sooner
> > than an idle CPU (which requires to be woken up from an idle state).
> >
> > This patchset updates both fast and slow paths with this optimization.
> >
> > Testing is done with the help of rt-app currently and here are the
> > details:
> >
> > - Tested on Octacore Hikey platform (all CPUs change frequency
> >   together).
> >
> > - rt-app json [1] creates few tasks and we monitor the scheduling
> >   latency for them by looking at "wu_lat" field (usec).
> >
> > - The histograms are created using
> >   https://github.com/adkein/textogram: textogram -a 0 -z 1000 -n 10
> >
> > - the stats are accumulated using: https://github.com/nferraz/st
> 
> Hi Viresh,
> 
> Thanks for the great work! Could you give the whole commad-line for us testing?

The rt-app json [1] can be run using:

$ rt-app sched-idle.json

This will create a couple of files named rt-app-cfs_thread-*.log and
rt-app-idle_thread-*.log. I looked mostly at the cfs files here as that's what
we were interested in. We only care about the last column of these files,
"wu_lat". Remove everything apart from that column (and remove the first row as
well, which has the field names) from all cfs files, or write a sed/awk command
to do it for you.

After that you can generate the numbers (mean/max/min/etc.) using:

$ st rt-app-cfs_thread-*.log

-- 
viresh

[1] https://pastebin.com/TMHGGBxD


* Re: [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
  2019-12-10  6:33   ` Viresh Kumar
@ 2019-12-10 11:15     ` Wanpeng Li
  0 siblings, 0 replies; 15+ messages in thread
From: Wanpeng Li @ 2019-12-10 11:15 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Vincent Guittot, Todd Kjos,
	Daniel Lezcano, quentin.perret, chris.redpath, steven.sistare,
	subhra.mazumdar, songliubraving

On Tue, 10 Dec 2019 at 14:33, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 09-12-19, 11:50, Wanpeng Li wrote:
> > On Wed, 26 Jun 2019 at 13:07, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > We try to find an idle CPU to run the next task, but in case we don't
> > > find an idle CPU it is better to pick a CPU which will run the task the
> > > soonest, for performance reason.
> > >
> > > A CPU which isn't idle but has only SCHED_IDLE activity queued on it
> > > should be a good target based on this criteria as any normal fair task
> > > will most likely preempt the currently running SCHED_IDLE task
> > > immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
> > > shall give better results as it should be able to run the task sooner
> > > than an idle CPU (which requires to be woken up from an idle state).
> > >
> > > This patchset updates both fast and slow paths with this optimization.
> > >
> > > Testing is done with the help of rt-app currently and here are the
> > > details:
> > >
> > > - Tested on Octacore Hikey platform (all CPUs change frequency
> > >   together).
> > >
> > > - rt-app json [1] creates few tasks and we monitor the scheduling
> > >   latency for them by looking at "wu_lat" field (usec).
> > >
> > > - The histograms are created using
> > >   https://github.com/adkein/textogram: textogram -a 0 -z 1000 -n 10
> > >
> > > - the stats are accumulated using: https://github.com/nferraz/st
> >
> > Hi Viresh,
> >
> > Thanks for the great work! Could you give the whole commad-line for us testing?
>
> The rt-app json [1] can be run using:
>
> $ rt-app sched-idle.json
>
> This will create couple of files named rt-app-cfs_thread-*.log and
> rt-app-idle_thread-*.log. I looked mostly at the cfs files here as that's what
> we were looking for. We will be interested only in the last column of these
> files, which says "wu_lat". Remove everything apart from that column (and remove
> the first row as well, which has field names) from all cfs files (or write a
> sed/awk command to do it for you.
>
> After that I you can generate the numbers (mean/max/min/etc) using:
>
> $ st rt-app-cfs_thread-*.log

Thanks for pointing this out.

    Wanpeng


Thread overview: 15+ messages
2019-06-26  5:06 [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Viresh Kumar
2019-06-26  5:06 ` [PATCH V3 1/2] sched: Start tracking SCHED_IDLE tasks count in cfs_rq Viresh Kumar
2019-07-25 16:15   ` [tip:sched/core] sched/fair: " tip-bot for Viresh Kumar
2019-06-26  5:06 ` [PATCH V3 2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found Viresh Kumar
2019-06-29  1:16   ` Subhra Mazumdar
2019-07-01  8:03     ` Viresh Kumar
2019-07-01 22:08       ` Subhra Mazumdar
2019-07-02  8:35         ` Peter Zijlstra
2019-07-02 16:32           ` Subhra Mazumdar
2019-07-25 16:16   ` [tip:sched/core] sched/fair: Fall back " tip-bot for Viresh Kumar
2019-07-01 13:43 ` [PATCH V3 0/2] sched/fair: Fallback to sched-idle CPU in absence of idle CPUs Peter Zijlstra
2019-07-03  9:13   ` Viresh Kumar
2019-12-09  3:50 ` Wanpeng Li
2019-12-10  6:33   ` Viresh Kumar
2019-12-10 11:15     ` Wanpeng Li
