linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] sched/core: Optimize load balance of core scheduling
@ 2022-06-28  7:57 Cruz Zhao
  2022-06-28  7:57 ` [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie Cruz Zhao
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Cruz Zhao @ 2022-06-28  7:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot
  Cc: linux-kernel

If the numbers of tasks with the same cookie are equal on a pair of SMT
siblings, they can be paired up each time we pick the next task, and
force idle can be avoided.

In order to achieve this goal, we have to count how many tasks with each
cookie are in the runqueue of each CPU. While implementing this counting
we found a bug: a task won't be enqueued into the core tree when we
update the cookie of an uncookied task, so we fix that bug first.
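
For reference, a minimal userspace sketch of how tasks end up sharing
a cookie in the scenarios below, using the core scheduling prctl()
interface (PR_SCHED_CORE, available since v5.14 with CONFIG_SCHED_CORE);
error handling is reduced to a bare return and the target PID is a
placeholder:

#include <sys/prctl.h>
#include <sys/types.h>
#include <linux/prctl.h>

/* Give the calling thread a fresh cookie, then copy it to @target. */
static int share_cookie_with(pid_t target)
{
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0,
		  PR_SCHED_CORE_SCOPE_THREAD, 0))
		return -1;

	return prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, target,
		     PR_SCHED_CORE_SCOPE_THREAD, 0);
}

In the scenario of patch 1, task a would call this once for task b, and
task c would simply create its own cookie with PR_SCHED_CORE_CREATE.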

Cruz Zhao (3):
  sched/core: Fix the bug that task won't enqueue into core tree when
    update cookie
  sched/core: Introduce nr_running percpu for each cookie
  sched/core: Make tasks with the same cookie pairs on SMT siblings

 kernel/sched/core.c       |  7 +++++
 kernel/sched/core_sched.c | 18 ++++++------
 kernel/sched/fair.c       |  4 +--
 kernel/sched/sched.h      | 74 +++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 86 insertions(+), 17 deletions(-)

-- 
1.8.3.1



* [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  2022-06-28  7:57 [PATCH 0/3] sched/core: Optimize load balance of core scheduling Cruz Zhao
@ 2022-06-28  7:57 ` Cruz Zhao
  2022-07-03 14:19   ` cruzzhao
                     ` (2 more replies)
  2022-06-28  7:57 ` [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie Cruz Zhao
  2022-06-28  7:57 ` [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings Cruz Zhao
  2 siblings, 3 replies; 14+ messages in thread
From: Cruz Zhao @ 2022-06-28  7:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot
  Cc: linux-kernel

In sched_core_update_cookie(), a task is enqueued into the core tree
only if it was enqueued there before; that is, if an uncookied task
is cookied, it will not be enqueued into the core tree until it is
enqueued again, which results in unnecessary force idle.

Here follows the scenario:
  CPU x and CPU y are a pair of SMT siblings.
  1. Start task a running on CPU x without sleeping, and task b and
     task c running on CPU y without sleeping.
  2. We create a cookie and share it with task a and task b, and then
     we create another cookie and share it with task c.
  3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.

We will find that core_forceidle_sum of task a takes 30% of the
sampling period, which shouldn't happen as tasks a and b have the
same cookie.

Then we migrate task a to CPU x' and tasks b and c to CPU y', where
CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
time core_forceidle_sum of tasks a and b are almost zero.

To solve this problem, enqueue the task into the core tree if it is
on the runqueue.

Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
---
 kernel/sched/core_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 38a2cec..ba2466c 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -75,7 +75,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	old_cookie = p->core_cookie;
 	p->core_cookie = cookie;
 
-	if (enqueued)
+	if (task_on_rq_queued(p))
 		sched_core_enqueue(rq, p);
 
 	/*
-- 
1.8.3.1
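
For context, the two predicates involved differ roughly as follows (a
sketch of the existing helpers around kernel/sched/sched.h, not part of
this patch): sched_core_enqueued() only says whether the task is already
linked into the core tree, while task_on_rq_queued() says whether the
task is runnable at all, which is the state that matters when a cookie
is first assigned to a running task.

static inline bool sched_core_enqueued(struct task_struct *p)
{
	/* True only if p is already in rq->core_tree. */
	return !RB_EMPTY_NODE(&p->core_node);
}

static inline int task_on_rq_queued(struct task_struct *p)
{
	/* True for any queued runnable task, cookied or not. */
	return p->on_rq == TASK_ON_RQ_QUEUED;
}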



* [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie
  2022-06-28  7:57 [PATCH 0/3] sched/core: Optimize load balance of core scheduling Cruz Zhao
  2022-06-28  7:57 ` [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie Cruz Zhao
@ 2022-06-28  7:57 ` Cruz Zhao
  2022-07-04  8:56   ` Peter Zijlstra
  2022-07-04  9:45   ` Peter Zijlstra
  2022-06-28  7:57 ` [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings Cruz Zhao
  2 siblings, 2 replies; 14+ messages in thread
From: Cruz Zhao @ 2022-06-28  7:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot
  Cc: linux-kernel

Introduce a percpu counter to struct sched_core_cookie, indicating how
many tasks with this cookie are in the runqueue of each CPU.

Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
---
 kernel/sched/core.c       |  7 +++++++
 kernel/sched/core_sched.c | 16 ++++++++--------
 kernel/sched/sched.h      |  9 +++++++++
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 263d764..9f71042 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -235,21 +235,28 @@ static inline int rb_sched_core_cmp(const void *key, const struct rb_node *node)
 
 void sched_core_enqueue(struct rq *rq, struct task_struct *p)
 {
+	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
+
 	rq->core->core_task_seq++;
 
 	if (!p->core_cookie)
 		return;
 
 	rb_add(&p->core_node, &rq->core_tree, rb_sched_core_less);
+
+	*per_cpu_ptr(ck->nr_running, rq->cpu) += 1;
 }
 
 void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
 {
+	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
+
 	rq->core->core_task_seq++;
 
 	if (sched_core_enqueued(p)) {
 		rb_erase(&p->core_node, &rq->core_tree);
 		RB_CLEAR_NODE(&p->core_node);
+		*per_cpu_ptr(ck->nr_running, rq->cpu) -= 1;
 	}
 
 	/*
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index ba2466c..65ab9fcb 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -1,20 +1,19 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
-/*
- * A simple wrapper around refcount. An allocated sched_core_cookie's
- * address is used to compute the cookie of the task.
- */
-struct sched_core_cookie {
-	refcount_t refcnt;
-};
-
 static unsigned long sched_core_alloc_cookie(void)
 {
 	struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);
+	int cpu;
+
 	if (!ck)
 		return 0;
 
 	refcount_set(&ck->refcnt, 1);
+
+	ck->nr_running = alloc_percpu(unsigned int);
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ck->nr_running, cpu) = 0;
+
 	sched_core_get();
 
 	return (unsigned long)ck;
@@ -25,6 +24,7 @@ static void sched_core_put_cookie(unsigned long cookie)
 	struct sched_core_cookie *ptr = (void *)cookie;
 
 	if (ptr && refcount_dec_and_test(&ptr->refcnt)) {
+		free_percpu(ptr->nr_running);
 		kfree(ptr);
 		sched_core_put();
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5b14b6b..d852c67 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1186,6 +1186,15 @@ static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b, bool fi);
 
 /*
+ * A simple wrapper around refcount. An allocated sched_core_cookie's
+ * address is used to compute the cookie of the task.
+ */
+struct sched_core_cookie {
+	refcount_t refcnt;
+	unsigned int __percpu *nr_running;
+};
+
+/*
  * Helpers to check if the CPU's core cookie matches with the task's cookie
  * when core scheduling is enabled.
  * A special case is that the task's cookie always matches with CPU's core
-- 
1.8.3.1
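
A hedged sketch of how such a counter is meant to be read, e.g. summing
one cookie's runnable tasks across an SMT core (the helper name is
hypothetical; patch 3 walks the same mask inline):

static unsigned int cookie_nr_running_on_core(struct sched_core_cookie *ck,
					      int cpu)
{
	unsigned int sum = 0;
	int t;

	/* Sum this cookie's runnable tasks over cpu's SMT siblings. */
	for_each_cpu(t, cpu_smt_mask(cpu))
		sum += *per_cpu_ptr(ck->nr_running, t);

	return sum;
}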



* [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings
  2022-06-28  7:57 [PATCH 0/3] sched/core: Optimize load balance of core scheduling Cruz Zhao
  2022-06-28  7:57 ` [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie Cruz Zhao
  2022-06-28  7:57 ` [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie Cruz Zhao
@ 2022-06-28  7:57 ` Cruz Zhao
  2022-07-04  9:43   ` Peter Zijlstra
  2 siblings, 1 reply; 14+ messages in thread
From: Cruz Zhao @ 2022-06-28  7:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot
  Cc: linux-kernel

If the numbers of tasks in the runqueues of SMT siblings are equal, we
call the core balanced, otherwise unbalanced. If the core is balanced,
every time we pick the next task we can pick a pair of tasks with the
same cookie for the SMT siblings, and force idle will be avoided.

 - Migrate the task if the source core and destination core can balance
     If ck->nr_running of src_cpu is the highest within the source core,
     and ck->nr_running of dst_cpu is the lowest within the destination
     core, migrate the task.

 - Select a cookie-matched idle CPU, or an idle CPU with the lowest
   ck->nr_running within its core
     In the fast path of task wakeup, if ck->nr_running of the CPU is
     the lowest within its core, we can select this CPU to wake up on.

 - Find the cookie-matched idlest CPU, or the cookie-matched CPU with
   the lowest ck->nr_running within its core
     In the slow path of task wakeup, if ck->nr_running of the CPU is
     the lowest within its core, we can select this CPU to wake up on.

Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
---
 kernel/sched/fair.c  |  4 ++--
 kernel/sched/sched.h | 65 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 78795a9..c18a716 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6096,7 +6096,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 
-		if (!sched_core_cookie_match(rq, p))
+		if (!sched_core_cookie_match(NULL, rq, p))
 			continue;
 
 		if (sched_idle_cpu(i))
@@ -7681,7 +7681,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	 * Don't migrate task if the task's cookie does not match
 	 * with the destination CPU's core cookie.
 	 */
-	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
+	if (!(sched_core_cookie_match(env->src_rq, env->dst_rq, p)))
 		return 1;
 
 	if (sysctl_sched_migration_cost == 0)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d852c67..ee0e558 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1195,6 +1195,56 @@ struct sched_core_cookie {
 };
 
 /*
+ * When tasks with the same cookie can be paired on SMT siblings, force idle can
+ * largely be avoided, so on wakeup and during load balancing we try to make and
+ * keep pairs with the same cookie on SMT siblings.
+ */
+static inline bool
+sched_core_make_pair_balance(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
+{
+	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
+	unsigned int src_cpu, dst_cpu, t;
+	unsigned int src_nr_running, dst_nr_running;
+
+	if (!ck)
+		return true;
+
+	/*
+	 * During load balancing, don't migrate the task unless ck->nr_running on
+	 * src_cpu is strictly higher than on all of its SMT siblings.
+	 */
+	if (src_rq) {
+		if (!sched_core_enabled(src_rq))
+			return true;
+		src_cpu = cpu_of(src_rq);
+		src_nr_running = *per_cpu_ptr(ck->nr_running, src_cpu);
+		for_each_cpu(t, cpu_smt_mask(src_cpu)) {
+			if (t == src_cpu)
+				continue;
+			if (*per_cpu_ptr(ck->nr_running, t) >= src_nr_running)
+				return false;
+		}
+
+	}
+
+	/*
+	 * If task p can pair with a cookied task sharing p->core_cookie on the
+	 * dst core, we can wake task p up on dst_rq, or migrate it there.
+	 */
+	dst_cpu = cpu_of(dst_rq);
+	dst_nr_running = *per_cpu_ptr(ck->nr_running, dst_cpu);
+	for_each_cpu(t, cpu_smt_mask(dst_cpu)) {
+		if (t == dst_cpu)
+			continue;
+		if (*per_cpu_ptr(ck->nr_running, t) <= dst_nr_running)
+			return false;
+	}
+
+	return true;
+}
+
+
+/*
  * Helpers to check if the CPU's core cookie matches with the task's cookie
  * when core scheduling is enabled.
  * A special case is that the task's cookie always matches with CPU's core
@@ -1206,19 +1256,21 @@ static inline bool sched_cpu_cookie_match(struct rq *rq, struct task_struct *p)
 	if (!sched_core_enabled(rq))
 		return true;
 
-	return rq->core->core_cookie == p->core_cookie;
+	return rq->core->core_cookie == p->core_cookie ||
+		sched_core_make_pair_balance(NULL, rq, p);
 }
 
-static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
+static inline bool
+sched_core_cookie_match(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
 {
 	bool idle_core = true;
 	int cpu;
 
 	/* Ignore cookie match if core scheduler is not enabled on the CPU. */
-	if (!sched_core_enabled(rq))
+	if (!sched_core_enabled(dst_rq))
 		return true;
 
-	for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
+	for_each_cpu(cpu, cpu_smt_mask(cpu_of(dst_rq))) {
 		if (!available_idle_cpu(cpu)) {
 			idle_core = false;
 			break;
@@ -1229,7 +1281,8 @@ static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
 	 * A CPU in an idle core is always the best choice for tasks with
 	 * cookies.
 	 */
-	return idle_core || rq->core->core_cookie == p->core_cookie;
+	return idle_core || dst_rq->core->core_cookie == p->core_cookie ||
+		sched_core_make_pair_balance(src_rq, dst_rq, p);
 }
 
 static inline bool sched_group_cookie_match(struct rq *rq,
@@ -1243,7 +1296,7 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 		return true;
 
 	for_each_cpu_and(cpu, sched_group_span(group), p->cpus_ptr) {
-		if (sched_core_cookie_match(rq, p))
+		if (sched_core_cookie_match(NULL, rq, p))
 			return true;
 	}
 	return false;
-- 
1.8.3.1
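
To illustrate the check above with concrete numbers: take src core
{0,1} and dst core {2,3} with ck->nr_running = {2,1,0,1}. CPU 0 holds
the strict maximum of its core and CPU 2 the strict minimum of its
core, so the migration is allowed; any tie makes it fail. A small
userspace model of the same logic (the array stands in for the percpu
counter, and the core masks are hardcoded assumptions):

#include <stdbool.h>
#include <stdio.h>

/* nr[cpu] models *per_cpu_ptr(ck->nr_running, cpu). */
static bool make_pair_balance(const unsigned int *nr,
			      const int *src_core, int src_cpu,
			      const int *dst_core, int dst_cpu, int width)
{
	for (int i = 0; i < width; i++) {
		if (src_cpu >= 0 && src_core[i] != src_cpu &&
		    nr[src_core[i]] >= nr[src_cpu])
			return false;	/* src_cpu must hold the strict max */
		if (dst_core[i] != dst_cpu && nr[dst_core[i]] <= nr[dst_cpu])
			return false;	/* dst_cpu must hold the strict min */
	}
	return true;
}

int main(void)
{
	unsigned int nr[4] = { 2, 1, 0, 1 };
	int src[2] = { 0, 1 }, dst[2] = { 2, 3 };

	printf("%d\n", make_pair_balance(nr, src, 0, dst, 2, 2)); /* 1: allowed */
	nr[3] = 0;
	printf("%d\n", make_pair_balance(nr, src, 0, dst, 2, 2)); /* 0: tie on dst */
	return 0;
}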



* Re: [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  2022-06-28  7:57 ` [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie Cruz Zhao
@ 2022-07-03 14:19   ` cruzzhao
  2022-07-04  8:53   ` Peter Zijlstra
  2022-07-21  8:44   ` [tip: sched/core] " tip-bot2 for Cruz Zhao
  2 siblings, 0 replies; 14+ messages in thread
From: cruzzhao @ 2022-07-03 14:19 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot
  Cc: linux-kernel

Ping... This bug really affects performance. Is this bugfix correct?

On 2022/6/28 3:57 PM, Cruz Zhao wrote:
> In sched_core_update_cookie(), a task is enqueued into the core tree
> only if it was enqueued there before; that is, if an uncookied task
> is cookied, it will not be enqueued into the core tree until it is
> enqueued again, which results in unnecessary force idle.
> 
> Here follows the scenario:
>   CPU x and CPU y are a pair of SMT siblings.
>   1. Start task a running on CPU x without sleeping, and task b and
>      task c running on CPU y without sleeping.
>   2. We create a cookie and share it with task a and task b, and then
>      we create another cookie and share it with task c.
>   3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.
> 
> We will find that core_forceidle_sum of task a takes 30% of the
> sampling period, which shouldn't happen as tasks a and b have the
> same cookie.
> 
> Then we migrate task a to CPU x' and tasks b and c to CPU y', where
> CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
> time core_forceidle_sum of tasks a and b are almost zero.
> 
> To solve this problem, enqueue the task into the core tree if it is
> on the runqueue.
> 
> Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> ---
>  kernel/sched/core_sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
> index 38a2cec..ba2466c 100644
> --- a/kernel/sched/core_sched.c
> +++ b/kernel/sched/core_sched.c
> @@ -75,7 +75,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
>  	old_cookie = p->core_cookie;
>  	p->core_cookie = cookie;
>  
> -	if (enqueued)
> +	if (task_on_rq_queued(p))
>  		sched_core_enqueue(rq, p);
>  
>  	/*


* Re: [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  2022-06-28  7:57 ` [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie Cruz Zhao
  2022-07-03 14:19   ` cruzzhao
@ 2022-07-04  8:53   ` Peter Zijlstra
  2022-07-14 11:37     ` Peter Zijlstra
  2022-07-15  5:08     ` cruzzhao
  2022-07-21  8:44   ` [tip: sched/core] " tip-bot2 for Cruz Zhao
  2 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-07-04  8:53 UTC (permalink / raw)
  To: Cruz Zhao
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel

On Tue, Jun 28, 2022 at 03:57:23PM +0800, Cruz Zhao wrote:
> In sched_core_update_cookie(), a task is enqueued into the core tree
> only if it was enqueued there before; that is, if an uncookied task
> is cookied, it will not be enqueued into the core tree until it is
> enqueued again, which results in unnecessary force idle.
> 
> Here follows the scenario:
>   CPU x and CPU y are a pair of SMT siblings.
>   1. Start task a running on CPU x without sleeping, and task b and
>      task c running on CPU y without sleeping.
>   2. We create a cookie and share it with task a and task b, and then
>      we create another cookie and share it with task c.
>   3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.
> 
> We will find that core_forceidle_sum of task a takes 30% of the
> sampling period, which shouldn't happen as tasks a and b have the
> same cookie.
> 
> Then we migrate task a to CPU x' and tasks b and c to CPU y', where
> CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
> time core_forceidle_sum of tasks a and b are almost zero.
> 
> To solve this problem, enqueue the task into the core tree if it is
> on the runqueue.
> 
> Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> ---
>  kernel/sched/core_sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
> index 38a2cec..ba2466c 100644
> --- a/kernel/sched/core_sched.c
> +++ b/kernel/sched/core_sched.c
> @@ -75,7 +75,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
>  	old_cookie = p->core_cookie;
>  	p->core_cookie = cookie;
>  
> -	if (enqueued)
> +	if (task_on_rq_queued(p))
>  		sched_core_enqueue(rq, p);
>  
>  	/*

Yeah; I suppose that's true. However, if we want to consider the
asymmetric case, we should be complete and also consider the case where
we clear the cookie.

And if you remove the second use of @enqueued, having that variable is
rather redundant, which then leaves me with something like this.

---
Subject: sched/core: Fix the bug that task won't enqueue into core tree when update cookie
From: Cruz Zhao <CruzZhao@linux.alibaba.com>
Date: Tue, 28 Jun 2022 15:57:23 +0800

From: Cruz Zhao <CruzZhao@linux.alibaba.com>

In sched_core_update_cookie(), a task is enqueued into the core tree
only if it was enqueued there before; that is, if an uncookied task
is cookied, it will not be enqueued into the core tree until it is
enqueued again, which results in unnecessary force idle.

Here follows the scenario:
  CPU x and CPU y are a pair of SMT siblings.
  1. Start task a running on CPU x without sleeping, and task b and
     task c running on CPU y without sleeping.
  2. We create a cookie and share it with task a and task b, and then
     we create another cookie and share it with task c.
  3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.

We will find that core_forceidle_sum of task a takes 30% of the
sampling period, which shouldn't happen as tasks a and b have the
same cookie.

Then we migrate task a to CPU x' and tasks b and c to CPU y', where
CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
time core_forceidle_sum of tasks a and b are almost zero.

To solve this problem, enqueue the task into the core tree if it is
on the runqueue.

Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com
---
 kernel/sched/core_sched.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -56,7 +56,6 @@ static unsigned long sched_core_update_c
 	unsigned long old_cookie;
 	struct rq_flags rf;
 	struct rq *rq;
-	bool enqueued;
 
 	rq = task_rq_lock(p, &rf);
 
@@ -68,14 +67,16 @@ static unsigned long sched_core_update_c
 	 */
 	SCHED_WARN_ON((p->core_cookie || cookie) && !sched_core_enabled(rq));
 
-	enqueued = sched_core_enqueued(p);
-	if (enqueued)
+	if (sched_core_enqueued(p))
 		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
 
 	old_cookie = p->core_cookie;
 	p->core_cookie = cookie;
 
-	if (enqueued)
+	/*
+	 * Consider the cases: !prev_cookie and !cookie.
+	 */
+	if (cookie && task_on_rq_queued(p))
 		sched_core_enqueue(rq, p);
 
 	/*


* Re: [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie
  2022-06-28  7:57 ` [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie Cruz Zhao
@ 2022-07-04  8:56   ` Peter Zijlstra
  2022-07-04  9:45   ` Peter Zijlstra
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-07-04  8:56 UTC (permalink / raw)
  To: Cruz Zhao
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel

On Tue, Jun 28, 2022 at 03:57:24PM +0800, Cruz Zhao wrote:
> Introduce a percpu counter to struct sched_core_cookie, indicating how
> many tasks with this cookie are in the runqueue of each CPU.

*why* ?!?

A changelog should always motivate why a change is done. The patch itself
mostly shows what it does; it doesn't explain why.


* Re: [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings
  2022-06-28  7:57 ` [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings Cruz Zhao
@ 2022-07-04  9:43   ` Peter Zijlstra
  2022-07-06  8:03     ` cruzzhao
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-07-04  9:43 UTC (permalink / raw)
  To: Cruz Zhao
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel

On Tue, Jun 28, 2022 at 03:57:25PM +0800, Cruz Zhao wrote:
> If the numbers of tasks in the runqueues of SMT siblings are equal, we
> call the core balanced, otherwise unbalanced. If the core is balanced,
> every time we pick the next task we can pick a pair of tasks with the
> same cookie for the SMT siblings, and force idle will be avoided.
> 
>  - Migrate the task if the source core and destination core can balance
>      If ck->nr_running of src_cpu is the highest within the source core,
>      and ck->nr_running of dst_cpu is the lowest within the destination
>      core, migrate the task.
> 
>  - Select a cookie-matched idle CPU, or an idle CPU with the lowest
>    ck->nr_running within its core
>      In the fast path of task wakeup, if ck->nr_running of the CPU is
>      the lowest within its core, we can select this CPU to wake up on.
> 
>  - Find the cookie-matched idlest CPU, or the cookie-matched CPU with
>    the lowest ck->nr_running within its core
>      In the slow path of task wakeup, if ck->nr_running of the CPU is
>      the lowest within its core, we can select this CPU to wake up on.
> 
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> ---
>  kernel/sched/fair.c  |  4 ++--
>  kernel/sched/sched.h | 65 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 61 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 78795a9..c18a716 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6096,7 +6096,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>  		struct rq *rq = cpu_rq(i);
>  
> -		if (!sched_core_cookie_match(rq, p))
> +		if (!sched_core_cookie_match(NULL, rq, p))
>  			continue;
>  
>  		if (sched_idle_cpu(i))
> @@ -7681,7 +7681,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
>  	 * Don't migrate task if the task's cookie does not match
>  	 * with the destination CPU's core cookie.
>  	 */
> -	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
> +	if (!(sched_core_cookie_match(env->src_rq, env->dst_rq, p)))
>  		return 1;

superfluous () added.

>  
>  	if (sysctl_sched_migration_cost == 0)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index d852c67..ee0e558 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1195,6 +1195,56 @@ struct sched_core_cookie {
>  };
>  
>  /*
> + * When tasks with the same cookie can be paired on SMT siblings, force idle can
> + * largely be avoided, so on wakeup and during load balancing we try to make and
> + * keep pairs with the same cookie on SMT siblings.
> + */
> +static inline bool
> +sched_core_make_pair_balance(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
> +{
> +	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
> +	unsigned int src_cpu, dst_cpu, t;
> +	unsigned int src_nr_running, dst_nr_running;
> +
> +	if (!ck)
> +		return true;
> +
> +	/*
> +	 * During load balancing, don't migrate the task unless ck->nr_running on
> +	 * src_cpu is strictly higher than on all of its SMT siblings.
> +	 */
> +	if (src_rq) {
> +		if (!sched_core_enabled(src_rq))
> +			return true;
> +		src_cpu = cpu_of(src_rq);
> +		src_nr_running = *per_cpu_ptr(ck->nr_running, src_cpu);
> +		for_each_cpu(t, cpu_smt_mask(src_cpu)) {
> +			if (t == src_cpu)
> +				continue;
> +			if (*per_cpu_ptr(ck->nr_running, t) >= src_nr_running)
> +				return false;
> +		}
> +
> +	}
> +
> +	/*
> +	 * If task p can pair with a cookied task sharing p->core_cookie on the
> +	 * dst core, we can wake task p up on dst_rq, or migrate it there.
> +	 */
> +	dst_cpu = cpu_of(dst_rq);
> +	dst_nr_running = *per_cpu_ptr(ck->nr_running, dst_cpu);
> +	for_each_cpu(t, cpu_smt_mask(dst_cpu)) {
> +		if (t == dst_cpu)
> +			continue;
> +		if (*per_cpu_ptr(ck->nr_running, t) <= dst_nr_running)
> +			return false;
> +	}
> +
> +	return true;
> +}

I don't see how this doesn't destroy regular load balancing.
Specifically the case where there are very few tasks of each cookie.


* Re: [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie
  2022-06-28  7:57 ` [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie Cruz Zhao
  2022-07-04  8:56   ` Peter Zijlstra
@ 2022-07-04  9:45   ` Peter Zijlstra
  2022-07-06  7:45     ` cruzzhao
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-07-04  9:45 UTC (permalink / raw)
  To: Cruz Zhao
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel

On Tue, Jun 28, 2022 at 03:57:24PM +0800, Cruz Zhao wrote:

>  static unsigned long sched_core_alloc_cookie(void)
>  {
>  	struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);
> +	int cpu;
> +
>  	if (!ck)
>  		return 0;
>  
>  	refcount_set(&ck->refcnt, 1);
> +
> +	ck->nr_running = alloc_percpu(unsigned int);

	if (!ck->nr_running)
		// do something

> +	for_each_possible_cpu(cpu)
> +		*per_cpu_ptr(ck->nr_running, cpu) = 0;

So I really, as in *really* dislike how this blows up the size of
cookies. Esp. with 100s of CPUs not actually being rare these days.
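
A minimal sketch of what the missing error path could look like (one
possible completion of the stub above, not necessarily what Peter has
in mind); note that alloc_percpu() returns zeroed memory, so the
per-CPU init loop can be dropped entirely:

static unsigned long sched_core_alloc_cookie(void)
{
	struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);

	if (!ck)
		return 0;

	refcount_set(&ck->refcnt, 1);

	/* alloc_percpu() zeroes the counters; no init loop needed. */
	ck->nr_running = alloc_percpu(unsigned int);
	if (!ck->nr_running) {
		kfree(ck);
		return 0;
	}

	sched_core_get();

	return (unsigned long)ck;
}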


* Re: [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie
  2022-07-04  9:45   ` Peter Zijlstra
@ 2022-07-06  7:45     ` cruzzhao
  0 siblings, 0 replies; 14+ messages in thread
From: cruzzhao @ 2022-07-06  7:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel



On 2022/7/4 5:45 PM, Peter Zijlstra wrote:
> On Tue, Jun 28, 2022 at 03:57:24PM +0800, Cruz Zhao wrote:
> 
>>  static unsigned long sched_core_alloc_cookie(void)
>>  {
>>  	struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);
>> +	int cpu;
>> +
>>  	if (!ck)
>>  		return 0;
>>  
>>  	refcount_set(&ck->refcnt, 1);
>> +
>> +	ck->nr_running = alloc_percpu(unsigned int);
> 
> 	if (!ck->nr_running)
> 		// do something
> 
>> +	for_each_possible_cpu(cpu)
>> +		*per_cpu_ptr(ck->nr_running, cpu) = 0;
> 
> So I really, as in *really* dislike how this blows up the size of
> cookies. Esp. with 100s of CPUs not actually being rare these days.

My idea is to get the distribution of cookied tasks across the
runqueues through ck->nr_running, to make it easier to optimize load
balancing. Sorry for not stating this in the changelog.

This does blow up the size of cookies in scenarios with a large number
of CPUs; I'll try to find a way around this problem.

Many thanks for reviewing.
Best wishes,
Cruz Zhao


* Re: [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings
  2022-07-04  9:43   ` Peter Zijlstra
@ 2022-07-06  8:03     ` cruzzhao
  0 siblings, 0 replies; 14+ messages in thread
From: cruzzhao @ 2022-07-06  8:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel



On 2022/7/4 5:43 PM, Peter Zijlstra wrote:
> On Tue, Jun 28, 2022 at 03:57:25PM +0800, Cruz Zhao wrote:
>> If the numbers of tasks in the runqueues of SMT siblings are equal, we
>> call the core balanced, otherwise unbalanced. If the core is balanced,
>> every time we pick the next task we can pick a pair of tasks with the
>> same cookie for the SMT siblings, and force idle will be avoided.
>>
>>  - Migrate the task if the source core and destination core can balance
>>      If ck->nr_running of src_cpu is the highest within the source core,
>>      and ck->nr_running of dst_cpu is the lowest within the destination
>>      core, migrate the task.
>>
>>  - Select a cookie-matched idle CPU, or an idle CPU with the lowest
>>    ck->nr_running within its core
>>      In the fast path of task wakeup, if ck->nr_running of the CPU is
>>      the lowest within its core, we can select this CPU to wake up on.
>>
>>  - Find the cookie-matched idlest CPU, or the cookie-matched CPU with
>>    the lowest ck->nr_running within its core
>>      In the slow path of task wakeup, if ck->nr_running of the CPU is
>>      the lowest within its core, we can select this CPU to wake up on.
>>
>> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
>> ---
>>  kernel/sched/fair.c  |  4 ++--
>>  kernel/sched/sched.h | 65 +++++++++++++++++++++++++++++++++++++++++++++++-----
>>  2 files changed, 61 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 78795a9..c18a716 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6096,7 +6096,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>>  	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>>  		struct rq *rq = cpu_rq(i);
>>  
>> -		if (!sched_core_cookie_match(rq, p))
>> +		if (!sched_core_cookie_match(NULL, rq, p))
>>  			continue;
>>  
>>  		if (sched_idle_cpu(i))
>> @@ -7681,7 +7681,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
>>  	 * Don't migrate task if the task's cookie does not match
>>  	 * with the destination CPU's core cookie.
>>  	 */
>> -	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
>> +	if (!(sched_core_cookie_match(env->src_rq, env->dst_rq, p)))
>>  		return 1;
> 
> superfluous () added.
> 
>>  
>>  	if (sysctl_sched_migration_cost == 0)
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index d852c67..ee0e558 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -1195,6 +1195,56 @@ struct sched_core_cookie {
>>  };
>>  
>>  /*
>> + * When tasks with the same cookie can be paired on SMT siblings, force idle can
>> + * largely be avoided, so on wakeup and during load balancing we try to make and
>> + * keep pairs with the same cookie on SMT siblings.
>> + */
>> +static inline bool
>> +sched_core_make_pair_balance(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
>> +{
>> +	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
>> +	unsigned int src_cpu, dst_cpu, t;
>> +	unsigned int src_nr_running, dst_nr_running;
>> +
>> +	if (!ck)
>> +		return true;
>> +
>> +	/*
>> +	 * During load balancing, don't migrate the task unless ck->nr_running on
>> +	 * src_cpu is strictly higher than on all of its SMT siblings.
>> +	 */
>> +	if (src_rq) {
>> +		if (!sched_core_enabled(src_rq))
>> +			return true;
>> +		src_cpu = cpu_of(src_rq);
>> +		src_nr_running = *per_cpu_ptr(ck->nr_running, src_cpu);
>> +		for_each_cpu(t, cpu_smt_mask(src_cpu)) {
>> +			if (t == src_cpu)
>> +				continue;
>> +			if (*per_cpu_ptr(ck->nr_running, t) >= src_nr_running)
>> +				return false;
>> +		}
>> +
>> +	}
>> +
>> +	/*
>> +	 * If task p can pair with a cookied task sharing p->core_cookie on the
>> +	 * dst core, we can wake task p up on dst_rq, or migrate it there.
>> +	 */
>> +	dst_cpu = cpu_of(dst_rq);
>> +	dst_nr_running = *per_cpu_ptr(ck->nr_running, dst_cpu);
>> +	for_each_cpu(t, cpu_smt_mask(dst_cpu)) {
>> +		if (t == dst_cpu)
>> +			continue;
>> +		if (*per_cpu_ptr(ck->nr_running, t) <= dst_nr_running)
>> +			return false;
>> +	}
>> +
>> +	return true;
>> +}
> 
> I don't see how this doesn't destroy regular load balancing.
> Specifically the case where there are very few tasks of each cookie.

If sched_core_make_pair_balance() returns true, we choose cpu_of(dst_rq)
as a backup, and when there is no idle or cookie-matched CPU, we wake up
on or migrate to the backup. If there is more than one backup CPU, we
choose the most unbalanced one. Is this solution more reasonable?

Many thanks for reviewing.
Best wishes,
Cruz Zhao
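
A rough pseudocode reading of that proposal (all helper names besides
sched_core_make_pair_balance() are hypothetical):

/*
 *	for_each_candidate_cpu(cpu) {
 *		if (available_idle_cpu(cpu) || cookie_matches(cpu, p))
 *			return cpu;				// first choice
 *		if (sched_core_make_pair_balance(NULL, cpu_rq(cpu), p))
 *			backup = more_unbalanced(backup, cpu);	// remember
 *	}
 *	return backup;	// no idle or cookie-matched CPU: fall back
 */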


* Re: [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  2022-07-04  8:53   ` Peter Zijlstra
@ 2022-07-14 11:37     ` Peter Zijlstra
  2022-07-15  5:08     ` cruzzhao
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2022-07-14 11:37 UTC (permalink / raw)
  To: Cruz Zhao
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel

On Mon, Jul 04, 2022 at 10:53:20AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 28, 2022 at 03:57:23PM +0800, Cruz Zhao wrote:
> > In sched_core_update_cookie(), a task is enqueued into the core tree
> > only if it was enqueued there before; that is, if an uncookied task
> > is cookied, it will not be enqueued into the core tree until it is
> > enqueued again, which results in unnecessary force idle.
> > 
> > Here follows the scenario:
> >   CPU x and CPU y are a pair of SMT siblings.
> >   1. Start task a running on CPU x without sleeping, and task b and
> >      task c running on CPU y without sleeping.
> >   2. We create a cookie and share it with task a and task b, and then
> >      we create another cookie and share it with task c.
> >   3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.
> > 
> > We will find that core_forceidle_sum of task a takes 30% of the
> > sampling period, which shouldn't happen as tasks a and b have the
> > same cookie.
> > 
> > Then we migrate task a to CPU x' and tasks b and c to CPU y', where
> > CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
> > time core_forceidle_sum of tasks a and b are almost zero.
> > 
> > To solve this problem, enqueue the task into the core tree if it is
> > on the runqueue.
> > 
> > Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
> > Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> > ---
> >  kernel/sched/core_sched.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
> > index 38a2cec..ba2466c 100644
> > --- a/kernel/sched/core_sched.c
> > +++ b/kernel/sched/core_sched.c
> > @@ -75,7 +75,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
> >  	old_cookie = p->core_cookie;
> >  	p->core_cookie = cookie;
> >  
> > -	if (enqueued)
> > +	if (task_on_rq_queued(p))
> >  		sched_core_enqueue(rq, p);
> >  
> >  	/*
> 
> Yeah; I suppose that's true. However, if we want to consider the
> asymmetric case, we should be complete and also consider the case where
> we clear the cookie.
> 
> And if you remove the second use of @enqueued, having that variable is
> rather redundant, which then leaves me with something like this.

Can you please confirm the below works for you so I can queue it?

> ---
> Subject: sched/core: Fix the bug that task won't enqueue into core tree when update cookie
> From: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Date: Tue, 28 Jun 2022 15:57:23 +0800
> 
> From: Cruz Zhao <CruzZhao@linux.alibaba.com>
> 
> In sched_core_update_cookie(), a task is enqueued into the core tree
> only if it was enqueued there before; that is, if an uncookied task
> is cookied, it will not be enqueued into the core tree until it is
> enqueued again, which results in unnecessary force idle.
> 
> Here follows the scenario:
>   CPU x and CPU y are a pair of SMT siblings.
>   1. Start task a running on CPU x without sleeping, and task b and
>      task c running on CPU y without sleeping.
>   2. We create a cookie and share it with task a and task b, and then
>      we create another cookie and share it with task c.
>   3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.
> 
> We will find that core_forceidle_sum of task a takes 30% of the
> sampling period, which shouldn't happen as tasks a and b have the
> same cookie.
> 
> Then we migrate task a to CPU x' and tasks b and c to CPU y', where
> CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
> time core_forceidle_sum of tasks a and b are almost zero.
> 
> To solve this problem, enqueue the task into the core tree if it is
> on the runqueue.
> 
> Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com
> ---
>  kernel/sched/core_sched.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> --- a/kernel/sched/core_sched.c
> +++ b/kernel/sched/core_sched.c
> @@ -56,7 +56,6 @@ static unsigned long sched_core_update_c
>  	unsigned long old_cookie;
>  	struct rq_flags rf;
>  	struct rq *rq;
> -	bool enqueued;
>  
>  	rq = task_rq_lock(p, &rf);
>  
> @@ -68,14 +67,16 @@ static unsigned long sched_core_update_c
>  	 */
>  	SCHED_WARN_ON((p->core_cookie || cookie) && !sched_core_enabled(rq));
>  
> -	enqueued = sched_core_enqueued(p);
> -	if (enqueued)
> +	if (sched_core_enqueued(p))
>  		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
>  
>  	old_cookie = p->core_cookie;
>  	p->core_cookie = cookie;
>  
> -	if (enqueued)
> +	/*
> +	 * Consider the cases: !prev_cookie and !cookie.
> +	 */
> +	if (cookie && task_on_rq_queued(p))
>  		sched_core_enqueue(rq, p);
>  
>  	/*


* Re: [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  2022-07-04  8:53   ` Peter Zijlstra
  2022-07-14 11:37     ` Peter Zijlstra
@ 2022-07-15  5:08     ` cruzzhao
  1 sibling, 0 replies; 14+ messages in thread
From: cruzzhao @ 2022-07-15  5:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, linux-kernel



On 2022/7/4 4:53 PM, Peter Zijlstra wrote:

> Subject: sched/core: Fix the bug that task won't enqueue into core tree when update cookie
> From: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Date: Tue, 28 Jun 2022 15:57:23 +0800
> 
> From: Cruz Zhao <CruzZhao@linux.alibaba.com>
> 
> In sched_core_update_cookie(), a task is enqueued into the core tree
> only if it was enqueued there before; that is, if an uncookied task
> is cookied, it will not be enqueued into the core tree until it is
> enqueued again, which results in unnecessary force idle.
> 
> Here follows the scenario:
>   CPU x and CPU y are a pair of SMT siblings.
>   1. Start task a running on CPU x without sleeping, and task b and
>      task c running on CPU y without sleeping.
>   2. We create a cookie and share it with task a and task b, and then
>      we create another cookie and share it with task c.
>   3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.
> 
> We will find that core_forceidle_sum of task a takes 30% of the
> sampling period, which shouldn't happen as tasks a and b have the
> same cookie.
> 
> Then we migrate task a to CPU x' and tasks b and c to CPU y', where
> CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
> time core_forceidle_sum of tasks a and b are almost zero.
> 
> To solve this problem, enqueue the task into the core tree if it is
> on the runqueue.
> 
> Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com
> ---
>  kernel/sched/core_sched.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> --- a/kernel/sched/core_sched.c
> +++ b/kernel/sched/core_sched.c
> @@ -56,7 +56,6 @@ static unsigned long sched_core_update_c
>  	unsigned long old_cookie;
>  	struct rq_flags rf;
>  	struct rq *rq;
> -	bool enqueued;
>  
>  	rq = task_rq_lock(p, &rf);
>  
> @@ -68,14 +67,16 @@ static unsigned long sched_core_update_c
>  	 */
>  	SCHED_WARN_ON((p->core_cookie || cookie) && !sched_core_enabled(rq));
>  
> -	enqueued = sched_core_enqueued(p);
> -	if (enqueued)
> +	if (sched_core_enqueued(p))
>  		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
>  
>  	old_cookie = p->core_cookie;
>  	p->core_cookie = cookie;
>  
> -	if (enqueued)
> +	/*
> +	 * Consider the cases: !prev_cookie and !cookie.
> +	 */
> +	if (cookie && task_on_rq_queued(p))
>  		sched_core_enqueue(rq, p);
>  
>  	/*
LGTM.


* [tip: sched/core] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  2022-06-28  7:57 ` [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie Cruz Zhao
  2022-07-03 14:19   ` cruzzhao
  2022-07-04  8:53   ` Peter Zijlstra
@ 2022-07-21  8:44   ` tip-bot2 for Cruz Zhao
  2 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Cruz Zhao @ 2022-07-21  8:44 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Cruz Zhao, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     91caa5ae242465c3ab9fd473e50170faa7e944f4
Gitweb:        https://git.kernel.org/tip/91caa5ae242465c3ab9fd473e50170faa7e944f4
Author:        Cruz Zhao <CruzZhao@linux.alibaba.com>
AuthorDate:    Tue, 28 Jun 2022 15:57:23 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 21 Jul 2022 10:39:39 +02:00

sched/core: Fix the bug that task won't enqueue into core tree when update cookie

In sched_core_update_cookie(), a task is enqueued into the core tree
only if it was enqueued there before; that is, if an uncookied task
is cookied, it will not be enqueued into the core tree until it is
enqueued again, which results in unnecessary force idle.

Here follows the scenario:
  CPU x and CPU y are a pair of SMT siblings.
  1. Start task a running on CPU x without sleeping, and task b and
     task c running on CPU y without sleeping.
  2. We create a cookie and share it with task a and task b, and then
     we create another cookie and share it with task c.
  3. Sample core_forceidle_sum of tasks a and b from /proc/PID/sched.

We will find that core_forceidle_sum of task a takes 30% of the
sampling period, which shouldn't happen as tasks a and b have the
same cookie.

Then we migrate task a to CPU x' and tasks b and c to CPU y', where
CPU x' and CPU y' are a pair of SMT siblings, and sample again; this
time core_forceidle_sum of tasks a and b are almost zero.

To solve this problem, enqueue the task into the core tree if it is
on the runqueue.

Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com
---
 kernel/sched/core_sched.c |  9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 5103502..93878cb 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -56,7 +56,6 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	unsigned long old_cookie;
 	struct rq_flags rf;
 	struct rq *rq;
-	bool enqueued;
 
 	rq = task_rq_lock(p, &rf);
 
@@ -68,14 +67,16 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	 */
 	SCHED_WARN_ON((p->core_cookie || cookie) && !sched_core_enabled(rq));
 
-	enqueued = sched_core_enqueued(p);
-	if (enqueued)
+	if (sched_core_enqueued(p))
 		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
 
 	old_cookie = p->core_cookie;
 	p->core_cookie = cookie;
 
-	if (enqueued)
+	/*
+	 * Consider the cases: !prev_cookie and !cookie.
+	 */
+	if (cookie && task_on_rq_queued(p))
 		sched_core_enqueue(rq, p);
 
 	/*

