linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup
@ 2022-08-01  4:27 Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 01/10] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

Hi all,

This patch series contains optimizations and cleanups for task load tracking
when a task migrates between CPUs/cgroups or is switched_from/to_fair().

There are three cases of detach/attach_entity_load_avg() (excluding the fork
and exit cases) for a fair task:
1. task migrates CPU (on_rq migrate or wakeup migrate)
2. task migrates cgroup (detach and attach)
3. task is switched_from/to_fair() (detach, later attach)

patches 01-03 clean up the task cgroup change case by removing
cpu_cgrp_subsys->fork(), since we already do the same thing in
sched_cgroup_fork().

patch 05 optimizes the task CPU migration case by combining the detach into
dequeue.

patch 06 fixes another detach on unattached task case, where the task has
been woken up by try_to_wake_up() but is still waiting to actually be woken
up by sched_ttwu_pending().

patch 07 removes the unnecessary limitation that changing the cgroup of a
forked task which hasn't been woken up by wake_up_new_task() would fail.

patch 08 refactors detach/attach_entity_cfs_rq() using the update_load_avg()
DO_DETACH and DO_ATTACH flags.

patches 09-10 optimize post_init_entity_util_avg() for fair tasks and skip
setting util_avg and runnable_avg for !fair tasks.

Thanks!

Changes in v3:
 - One big change is that this series no longer freezes PELT sum/avg values
   to be used as initial values when re-entering fair, since those PELT
   values become much less relevant.
 - Reorder patches and collect tags from Vincent and Dietmar. Thanks!
 - Fix detach on an unattached task which has been woken up by try_to_wake_up()
   but is still waiting to actually be woken up by sched_ttwu_pending().
 - Delete TASK_NEW, which prevented a forked task from changing cgroup.
 - Don't init util_avg and runnable_avg for !fair tasks at fork time.

Changes in v2:
 - split task se depth maintenance into a separate patch 3, as suggested
   by Peter.
 - reorder patches 6-7 before patches 8-9, since we need update_load_avg()
   to do a conditional attach/detach to avoid corner cases like the twice
   attach problem.

Chengming Zhou (10):
  sched/fair: maintain task se depth in set_task_rq()
  sched/fair: remove redundant cpu_cgrp_subsys->fork()
  sched/fair: reset sched_avg last_update_time before set_task_rq()
  sched/fair: update comments in enqueue/dequeue_entity()
  sched/fair: combine detach into dequeue when migrating task
  sched/fair: fix another detach on unattached task corner case
  sched/fair: allow changing cgroup of new forked task
  sched/fair: refactor detach/attach_entity_cfs_rq using
    update_load_avg()
  sched/fair: defer task sched_avg attach to enqueue_entity()
  sched/fair: don't init util/runnable_avg for !fair task

 include/linux/sched.h |   5 +-
 kernel/sched/core.c   |  57 ++--------
 kernel/sched/fair.c   | 242 ++++++++++++++++++------------------------
 kernel/sched/sched.h  |   6 +-
 4 files changed, 119 insertions(+), 191 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 01/10] sched/fair: maintain task se depth in set_task_rq()
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 02/10] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

Previously we only maintained the task se depth in task_move_group_fair():
if a !fair task changed task group, its se depth would not be updated, so
commit eb7a59b2c888 ("sched/fair: Reset se-depth when task switched to FAIR")
fixed the problem by updating the se depth in switched_to_fair() too.

Then commit daa59407b558 ("sched/fair: Unify switched_{from,to}_fair()
and task_move_group_fair()") unified these two functions and moved the
se.depth setting to attach_task_cfs_rq(), which was moved further into
attach_entity_cfs_rq() by commit df217913e72e ("sched/fair: Factorize
attach/detach entity").

This patch moves the task se depth maintenance from attach_entity_cfs_rq()
to set_task_rq(), which is called whenever the CPU/cgroup changes, so the
depth will always be correct.

This patch is preparation for the next patch.
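
For illustration, the invariant set_task_rq() maintains is simply
"depth = parent depth + 1". Below is a minimal standalone sketch of how the
depth falls out of the parent chain (not kernel code; the cgroup names are
hypothetical):

  #include <stdio.h>

  /* Stripped-down stand-in keeping only the parent/depth fields. */
  struct entity {
          struct entity *parent;
          int depth;
  };

  static void set_depth(struct entity *se)
  {
          /* Same expression set_task_rq() now uses for the task se. */
          se->depth = se->parent ? se->parent->depth + 1 : 0;
  }

  int main(void)
  {
          struct entity tg_a    = { .parent = NULL };    /* group se of /A   */
          struct entity tg_ab   = { .parent = &tg_a };   /* group se of /A/B */
          struct entity task_se = { .parent = &tg_ab };  /* task in /A/B     */

          set_depth(&tg_a);
          set_depth(&tg_ab);
          set_depth(&task_se);

          printf("depths: %d %d %d\n", tg_a.depth, tg_ab.depth, task_se.depth);
          return 0;
  }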

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 8 --------
 kernel/sched/sched.h | 1 +
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2fc47257ae91..77cd2bad17a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11566,14 +11566,6 @@ static void attach_entity_cfs_rq(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	/*
-	 * Since the real-depth could have been changed (only FAIR
-	 * class maintain depth value), reset depth properly.
-	 */
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-#endif
-
 	/* Synchronize entity with its cfs_rq */
 	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
 	attach_entity_load_avg(cfs_rq, se);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index aad7f5ee9666..8cc3eb7b86cd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1940,6 +1940,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 	set_task_rq_fair(&p->se, p->se.cfs_rq, tg->cfs_rq[cpu]);
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+	p->se.depth = tg->se[cpu] ? tg->se[cpu]->depth + 1 : 0;
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 02/10] sched/fair: remove redundant cpu_cgrp_subsys->fork()
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 01/10] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 03/10] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

We use cpu_cgrp_subsys->fork() to set the task group for the new fair task
in cgroup_post_fork().

Since commit b1e8206582f9 ("sched: Fix yet more sched_fork() races") already
does set_task_rq() for the new fair task in sched_cgroup_fork(),
cpu_cgrp_subsys->fork() can be removed.

  cgroup_can_fork()	--> pin parent's sched_task_group
  sched_cgroup_fork()
    __set_task_cpu()
      set_task_rq()
  cgroup_post_fork()
    ss->fork() := cpu_cgroup_fork()
      sched_change_group(..., TASK_SET_GROUP)
        task_set_group_fair()
          set_task_rq()  --> can be removed

After this change, task_change_group_fair() only needs to care about task
cgroup migration, which makes the code much simpler.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c  | 27 ++++-----------------------
 kernel/sched/fair.c  | 23 +----------------------
 kernel/sched/sched.h |  5 +----
 3 files changed, 6 insertions(+), 49 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5555e49c4e12..614d7180c99e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -481,8 +481,7 @@ sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags) { }
  *				p->se.load, p->rt_priority,
  *				p->dl.dl_{runtime, deadline, period, flags, bw, density}
  *  - sched_setnuma():		p->numa_preferred_nid
- *  - sched_move_task()/
- *    cpu_cgroup_fork():	p->sched_task_group
+ *  - sched_move_task():	p->sched_task_group
  *  - uclamp_update_active()	p->uclamp*
  *
  * p->state <- TASK_*:
@@ -10125,7 +10124,7 @@ void sched_release_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }
 
-static void sched_change_group(struct task_struct *tsk, int type)
+static void sched_change_group(struct task_struct *tsk)
 {
 	struct task_group *tg;
 
@@ -10141,7 +10140,7 @@ static void sched_change_group(struct task_struct *tsk, int type)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_change_group)
-		tsk->sched_class->task_change_group(tsk, type);
+		tsk->sched_class->task_change_group(tsk);
 	else
 #endif
 		set_task_rq(tsk, task_cpu(tsk));
@@ -10172,7 +10171,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (running)
 		put_prev_task(rq, tsk);
 
-	sched_change_group(tsk, TASK_MOVE_GROUP);
+	sched_change_group(tsk);
 
 	if (queued)
 		enqueue_task(rq, tsk, queue_flags);
@@ -10250,23 +10249,6 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	sched_unregister_group(tg);
 }
 
-/*
- * This is called before wake_up_new_task(), therefore we really only
- * have to set its group bits, all the other stuff does not apply.
- */
-static void cpu_cgroup_fork(struct task_struct *task)
-{
-	struct rq_flags rf;
-	struct rq *rq;
-
-	rq = task_rq_lock(task, &rf);
-
-	update_rq_clock(rq);
-	sched_change_group(task, TASK_SET_GROUP);
-
-	task_rq_unlock(rq, task, &rf);
-}
-
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
@@ -11132,7 +11114,6 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
-	.fork		= cpu_cgroup_fork,
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77cd2bad17a8..89626b115660 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11661,15 +11661,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static void task_set_group_fair(struct task_struct *p)
-{
-	struct sched_entity *se = &p->se;
-
-	set_task_rq(p, task_cpu(p));
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-}
-
-static void task_move_group_fair(struct task_struct *p)
+static void task_change_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
@@ -11681,19 +11673,6 @@ static void task_move_group_fair(struct task_struct *p)
 	attach_task_cfs_rq(p);
 }
 
-static void task_change_group_fair(struct task_struct *p, int type)
-{
-	switch (type) {
-	case TASK_SET_GROUP:
-		task_set_group_fair(p);
-		break;
-
-	case TASK_MOVE_GROUP:
-		task_move_group_fair(p);
-		break;
-	}
-}
-
 void free_fair_sched_group(struct task_group *tg)
 {
 	int i;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8cc3eb7b86cd..19e0076e4245 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2203,11 +2203,8 @@ struct sched_class {
 
 	void (*update_curr)(struct rq *rq);
 
-#define TASK_SET_GROUP		0
-#define TASK_MOVE_GROUP		1
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	void (*task_change_group)(struct task_struct *p, int type);
+	void (*task_change_group)(struct task_struct *p);
 #endif
 };
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 03/10] sched/fair: reset sched_avg last_update_time before set_task_rq()
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 01/10] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 02/10] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 04/10] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

set_task_rq() -> set_task_rq_fair() will try to synchronize the blocked
task's sched_avg when it migrates, which is not needed for an already
detached task.

task_change_group_fair() detaches the task's sched_avg from the prev cfs_rq
first, so reset sched_avg last_update_time before set_task_rq() to avoid
that needless synchronization.
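
For reference, the reason a zero last_update_time makes set_task_rq_fair()
skip the synchronization is its early return for detached entities. A tiny
standalone model of just that skip condition (illustrative only; the real
function in kernel/sched/fair.c additionally ages the blocked sched_avg to
the next cfs_rq's clock):

  #include <stdio.h>
  #include <stdbool.h>

  struct fake_se {
          unsigned long long last_update_time;
  };

  /* Mirrors the early return: nothing left to synchronize once detached. */
  static bool needs_blocked_sync(const struct fake_se *se, bool have_prev_cfs_rq)
  {
          return se->last_update_time && have_prev_cfs_rq;
  }

  int main(void)
  {
          struct fake_se attached = { .last_update_time = 123456 };
          struct fake_se detached = { .last_update_time = 0 };  /* reset by this patch */

          printf("attached: sync=%d\n", needs_blocked_sync(&attached, true));
          printf("detached: sync=%d\n", needs_blocked_sync(&detached, true));
          return 0;
  }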

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89626b115660..948b4cd2a024 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11664,12 +11664,12 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 static void task_change_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
-	set_task_rq(p, task_cpu(p));
 
 #ifdef CONFIG_SMP
 	/* Tell se's cfs_rq has been changed -- migrated */
 	p->se.avg.last_update_time = 0;
 #endif
+	set_task_rq(p, task_cpu(p));
 	attach_task_cfs_rq(p);
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 04/10] sched/fair: update comments in enqueue/dequeue_entity()
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (2 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 03/10] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 05/10] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

When reading the sched_avg related code, I found that the comments in
enqueue/dequeue_entity() have not been kept up to date with the current code.

We don't add/subtract the entity's runnable_avg to/from cfs_rq->runnable_avg
during enqueue/dequeue_entity(); that is done only on attach/detach.

This patch updates the comments to reflect how the current code works.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 948b4cd2a024..956aed56ac1e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4434,7 +4434,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * When enqueuing a sched_entity, we must:
 	 *   - Update loads to have both entity and cfs_rq synced with now.
-	 *   - Add its load to cfs_rq->runnable_avg
+	 *   - For group_entity, update its runnable_weight to reflect the new
+	 *     h_nr_running of its group cfs_rq.
 	 *   - For group_entity, update its weight to reflect the new share of
 	 *     its group cfs_rq
 	 *   - Add its new weight to cfs_rq->load.weight
@@ -4519,7 +4520,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * When dequeuing a sched_entity, we must:
 	 *   - Update loads to have both entity and cfs_rq synced with now.
-	 *   - Subtract its load from the cfs_rq->runnable_avg.
+	 *   - For group_entity, update its runnable_weight to reflect the new
+	 *     h_nr_running of its group cfs_rq.
 	 *   - Subtract its previous weight from cfs_rq->load.weight.
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 05/10] sched/fair: combine detach into dequeue when migrating task
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (3 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 04/10] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 06/10] sched/fair: fix another detach on unattached task corner case Chengming Zhou
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

When we are migrating a task off the CPU, we can combine the detach and the
propagation into dequeue_entity(), saving the detach_entity_cfs_rq() call
in migrate_task_rq_fair().

This optimization is like the combined DO_ATTACH in enqueue_entity() when
migrating a task to the CPU: we don't have to traverse the CFS tree an extra
time to do the detach_entity_cfs_rq() -> propagate_entity_cfs_rq() walk,
which isn't called anymore with this patch's change.

detach_task()
  deactivate_task()
    dequeue_task_fair()
      for_each_sched_entity(se)
        dequeue_entity()
          update_load_avg() /* (1) */
            detach_entity_load_avg()

  set_task_cpu()
    migrate_task_rq_fair()
      detach_entity_cfs_rq() /* (2) */
        update_load_avg();
        detach_entity_load_avg();
        propagate_entity_cfs_rq();
          for_each_sched_entity()
            update_load_avg()

This patch saves the detach_entity_cfs_rq() call in (2) by doing the
detach_entity_load_avg() for a CPU-migrating task inside (1) (the task
being the first se in the loop).
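
For illustration, a minimal standalone sketch of how the dequeue side now
selects its update_load_avg() flags, using the flag values shown in this
patch (not kernel code):

  #include <stdio.h>
  #include <stdbool.h>

  #define UPDATE_TG       0x1
  #define DO_DETACH       0x8

  /* Mirrors the new logic at the top of dequeue_entity(). */
  static int dequeue_action(bool entity_is_task, bool task_on_rq_migrating)
  {
          int action = UPDATE_TG;

          if (entity_is_task && task_on_rq_migrating)
                  action |= DO_DETACH;

          return action;
  }

  int main(void)
  {
          printf("migrating task:     0x%x\n", dequeue_action(true, true));
          printf("non-migrating task: 0x%x\n", dequeue_action(true, false));
          printf("group entity:       0x%x\n", dequeue_action(false, false));
          return 0;
  }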

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 956aed56ac1e..ba8b937854b4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4003,6 +4003,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 #define UPDATE_TG	0x1
 #define SKIP_AGE_LOAD	0x2
 #define DO_ATTACH	0x4
+#define DO_DETACH	0x8
 
 /* Update task and its cfs_rq load average */
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
@@ -4031,7 +4032,13 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 		 */
 		attach_entity_load_avg(cfs_rq, se);
 		update_tg_load_avg(cfs_rq);
-
+	} else if (flags & DO_DETACH) {
+		/*
+		 * DO_DETACH means we're here from dequeue_entity()
+		 * and we are migrating task out of the CPU.
+		 */
+		detach_entity_load_avg(cfs_rq, se);
+		update_tg_load_avg(cfs_rq);
 	} else if (decayed) {
 		cfs_rq_util_change(cfs_rq, 0);
 
@@ -4292,6 +4299,7 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
+#define DO_DETACH	0x0
 
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
 {
@@ -4512,6 +4520,11 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	int action = UPDATE_TG;
+
+	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
+		action |= DO_DETACH;
+
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
@@ -4526,7 +4539,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
 	 */
-	update_load_avg(cfs_rq, se, UPDATE_TG);
+	update_load_avg(cfs_rq, se, action);
 	se_update_runnable(se);
 
 	update_stats_dequeue_fair(cfs_rq, se, flags);
@@ -7081,8 +7094,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	return new_cpu;
 }
 
-static void detach_entity_cfs_rq(struct sched_entity *se);
-
 /*
  * Called immediately before a task is migrated to a new CPU; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -7104,15 +7115,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
 	}
 
-	if (p->on_rq == TASK_ON_RQ_MIGRATING) {
-		/*
-		 * In case of TASK_ON_RQ_MIGRATING we in fact hold the 'old'
-		 * rq->lock and can modify state directly.
-		 */
-		lockdep_assert_rq_held(task_rq(p));
-		detach_entity_cfs_rq(se);
-
-	} else {
+	if (!task_on_rq_migrating(p)) {
 		remove_entity_load_avg(se);
 
 		/*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 06/10] sched/fair: fix another detach on unattached task corner case
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (4 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 05/10] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 07/10] sched/fair: allow changing cgroup of new forked task Chengming Zhou
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
fixed two load tracking problems for new tasks, including the detach on
unattached new task problem.

There is still another detach on unattached task problem left, for a task
which has been woken up by try_to_wake_up() but is waiting to actually be
woken up by sched_ttwu_pending().

try_to_wake_up(p)
  cpu = select_task_rq(p)
  if (task_cpu(p) != cpu)
    set_task_cpu(p, cpu)
      migrate_task_rq_fair()
        remove_entity_load_avg()       --> unattached
        se->avg.last_update_time = 0;
      __set_task_cpu()
  ttwu_queue(p, cpu)
    ttwu_queue_wakelist()
      __ttwu_queue_wakelist()

task_change_group_fair()
  detach_task_cfs_rq()
    detach_entity_cfs_rq()
      detach_entity_load_avg()   --> detach on unattached task
  set_task_rq()
  attach_task_cfs_rq()
    attach_entity_cfs_rq()
      attach_entity_load_avg()

The reason for this problem is similar: in detach_entity_cfs_rq() we should
check that se->avg.last_update_time != 0 before doing
detach_entity_load_avg().

This patch also moves the detach/attach_entity_cfs_rq() functions up next to
the other load tracking functions, to avoid another CONFIG_SMP block, which
also simplifies the code.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 133 ++++++++++++++++++++++----------------------
 1 file changed, 68 insertions(+), 65 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ba8b937854b4..a32da4e71ddf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -874,9 +874,6 @@ void init_entity_runnable_average(struct sched_entity *se)
 void post_init_entity_util_avg(struct task_struct *p)
 {
 }
-static void update_tg_load_avg(struct cfs_rq *cfs_rq)
-{
-}
 #endif /* CONFIG_SMP */
 
 /*
@@ -3176,6 +3173,7 @@ void reweight_task(struct task_struct *p, int prio)
 	load->inv_weight = sched_prio_to_wmult[prio];
 }
 
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
 static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -4022,7 +4020,6 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	decayed |= propagate_entity_load_avg(se);
 
 	if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
-
 		/*
 		 * DO_ATTACH means we're here from enqueue_entity().
 		 * !last_update_time means we've passed through
@@ -4085,6 +4082,71 @@ static void remove_entity_load_avg(struct sched_entity *se)
 	raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+/*
+ * Propagate the changes of the sched_entity across the tg tree to make it
+ * visible to the root
+ */
+static void propagate_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	if (cfs_rq_throttled(cfs_rq))
+		return;
+
+	if (!throttled_hierarchy(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
+
+	/* Start to propagate at parent */
+	se = se->parent;
+
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+
+		update_load_avg(cfs_rq, se, UPDATE_TG);
+
+		if (cfs_rq_throttled(cfs_rq))
+			break;
+
+		if (!throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
+	}
+}
+#else
+static void propagate_entity_cfs_rq(struct sched_entity *se) { }
+#endif
+
+static void detach_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	/*
+	 * In case the task sched_avg hasn't been attached:
+	 * - A forked task which hasn't been woken up by wake_up_new_task().
+	 * - A task which has been woken up by try_to_wake_up() but is
+	 *   waiting for actually being woken up by sched_ttwu_pending().
+	 */
+	if (!se->avg.last_update_time)
+		return;
+
+	/* Catch up with the cfs_rq and remove our load when we leave */
+	update_load_avg(cfs_rq, se, 0);
+	detach_entity_load_avg(cfs_rq, se);
+	update_tg_load_avg(cfs_rq);
+	propagate_entity_cfs_rq(se);
+}
+
+static void attach_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	/* Synchronize entity with its cfs_rq */
+	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
+	attach_entity_load_avg(cfs_rq, se);
+	update_tg_load_avg(cfs_rq);
+	propagate_entity_cfs_rq(se);
+}
+
 static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
 {
 	return cfs_rq->avg.runnable_avg;
@@ -4307,11 +4369,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 }
 
 static inline void remove_entity_load_avg(struct sched_entity *se) {}
-
-static inline void
-attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-static inline void
-detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void detach_entity_cfs_rq(struct sched_entity *se) {}
+static inline void attach_entity_cfs_rq(struct sched_entity *se) {}
 
 static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
 {
@@ -11522,62 +11581,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
 	return false;
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
- * Propagate the changes of the sched_entity across the tg tree to make it
- * visible to the root
- */
-static void propagate_entity_cfs_rq(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	if (cfs_rq_throttled(cfs_rq))
-		return;
-
-	if (!throttled_hierarchy(cfs_rq))
-		list_add_leaf_cfs_rq(cfs_rq);
-
-	/* Start to propagate at parent */
-	se = se->parent;
-
-	for_each_sched_entity(se) {
-		cfs_rq = cfs_rq_of(se);
-
-		update_load_avg(cfs_rq, se, UPDATE_TG);
-
-		if (cfs_rq_throttled(cfs_rq))
-			break;
-
-		if (!throttled_hierarchy(cfs_rq))
-			list_add_leaf_cfs_rq(cfs_rq);
-	}
-}
-#else
-static void propagate_entity_cfs_rq(struct sched_entity *se) { }
-#endif
-
-static void detach_entity_cfs_rq(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	/* Catch up with the cfs_rq and remove our load when we leave */
-	update_load_avg(cfs_rq, se, 0);
-	detach_entity_load_avg(cfs_rq, se);
-	update_tg_load_avg(cfs_rq);
-	propagate_entity_cfs_rq(se);
-}
-
-static void attach_entity_cfs_rq(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	/* Synchronize entity with its cfs_rq */
-	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
-	attach_entity_load_avg(cfs_rq, se);
-	update_tg_load_avg(cfs_rq);
-	propagate_entity_cfs_rq(se);
-}
-
 static void detach_task_cfs_rq(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 07/10] sched/fair: allow changing cgroup of new forked task
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (5 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 06/10] sched/fair: fix another detach on unattached task corner case Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg() Chengming Zhou
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
introduced a TASK_NEW state and an unnecessary limitation that makes
changing the cgroup of a newly forked task fail.

That was because, at that time, we couldn't handle task_change_group_fair()
for a newly forked fair task which hasn't been woken up by
wake_up_new_task(): it would cause a detach on an unattached task sched_avg
problem.

This patch deletes that unnecessary limitation by adding a check before
doing attach_entity_cfs_rq().

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/sched.h |  5 ++---
 kernel/sched/core.c   | 30 +++++++-----------------------
 kernel/sched/fair.c   |  7 ++++++-
 3 files changed, 15 insertions(+), 27 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 88b8817b827d..b504e55bbf7a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -95,10 +95,9 @@ struct task_group;
 #define TASK_WAKEKILL			0x0100
 #define TASK_WAKING			0x0200
 #define TASK_NOLOAD			0x0400
-#define TASK_NEW			0x0800
 /* RT specific auxilliary flag to mark RT lock waiters */
-#define TASK_RTLOCK_WAIT		0x1000
-#define TASK_STATE_MAX			0x2000
+#define TASK_RTLOCK_WAIT		0x0800
+#define TASK_STATE_MAX			0x1000
 
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 614d7180c99e..220bce5e73e0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4500,11 +4500,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 {
 	__sched_fork(clone_flags, p);
 	/*
-	 * We mark the process as NEW here. This guarantees that
+	 * We mark the process as running here. This guarantees that
 	 * nobody will actually run it, and a signal or other external
 	 * event cannot wake it up and insert it on the runqueue either.
 	 */
-	p->__state = TASK_NEW;
+	p->__state = TASK_RUNNING;
 
 	/*
 	 * Make sure we do not leak PI boosting priority to the child.
@@ -4622,7 +4622,6 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq *rq;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
-	WRITE_ONCE(p->__state, TASK_RUNNING);
 #ifdef CONFIG_SMP
 	/*
 	 * Fork balancing, do it here and not earlier because:
@@ -10249,36 +10248,19 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	sched_unregister_group(tg);
 }
 
+#ifdef CONFIG_RT_GROUP_SCHED
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
 	struct cgroup_subsys_state *css;
-	int ret = 0;
 
 	cgroup_taskset_for_each(task, css, tset) {
-#ifdef CONFIG_RT_GROUP_SCHED
 		if (!sched_rt_can_attach(css_tg(css), task))
 			return -EINVAL;
-#endif
-		/*
-		 * Serialize against wake_up_new_task() such that if it's
-		 * running, we're sure to observe its full state.
-		 */
-		raw_spin_lock_irq(&task->pi_lock);
-		/*
-		 * Avoid calling sched_move_task() before wake_up_new_task()
-		 * has happened. This would lead to problems with PELT, due to
-		 * move wanting to detach+attach while we're not attached yet.
-		 */
-		if (READ_ONCE(task->__state) == TASK_NEW)
-			ret = -EINVAL;
-		raw_spin_unlock_irq(&task->pi_lock);
-
-		if (ret)
-			break;
 	}
-	return ret;
+	return 0;
 }
+#endif
 
 static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 {
@@ -11114,7 +11096,9 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
+#ifdef CONFIG_RT_GROUP_SCHED
 	.can_attach	= cpu_cgroup_can_attach,
+#endif
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
 	.dfl_cftypes	= cpu_files,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a32da4e71ddf..ad20a939227d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11603,7 +11603,12 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-	attach_entity_cfs_rq(se);
+	/*
+	 * We couldn't detach or attach a forked task which
+	 * hasn't been woken up by wake_up_new_task().
+	 */
+	if (p->on_rq || se->sum_exec_runtime)
+		attach_entity_cfs_rq(se);
 
 	if (!vruntime_normalized(p))
 		se->vruntime += cfs_rq->min_vruntime;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg()
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (6 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 07/10] sched/fair: allow changing cgroup of new forked task Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  8:07   ` kernel test robot
  2022-08-01 23:22   ` kernel test robot
  2022-08-01  4:27 ` [PATCH v3 09/10] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 10/10] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
  9 siblings, 2 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

Since update_load_avg() now supports the DO_ATTACH and DO_DETACH flags to
attach or detach the entity sched_avg to/from the cfs_rq, we can use it to
refactor the detach/attach_entity_cfs_rq() functions.

Note that we can attach a task with last_update_time != 0 from
switched_to_fair(), since we want the sched_avg to keep decaying while the
task runs in a !fair class.

So this patch moves the last_update_time condition check to enqueue_entity()
for tasks which migrate CPU or change cgroup.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 68 ++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 41 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad20a939227d..b8cb826bd755 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4019,21 +4019,10 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	decayed  = update_cfs_rq_load_avg(now, cfs_rq);
 	decayed |= propagate_entity_load_avg(se);
 
-	if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
-		/*
-		 * DO_ATTACH means we're here from enqueue_entity().
-		 * !last_update_time means we've passed through
-		 * migrate_task_rq_fair() indicating we migrated.
-		 *
-		 * IOW we're enqueueing a task on a new CPU.
-		 */
+	if (flags & DO_ATTACH) {
 		attach_entity_load_avg(cfs_rq, se);
 		update_tg_load_avg(cfs_rq);
 	} else if (flags & DO_DETACH) {
-		/*
-		 * DO_DETACH means we're here from dequeue_entity()
-		 * and we are migrating task out of the CPU.
-		 */
 		detach_entity_load_avg(cfs_rq, se);
 		update_tg_load_avg(cfs_rq);
 	} else if (decayed) {
@@ -4082,44 +4071,31 @@ static void remove_entity_load_avg(struct sched_entity *se)
 	raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
 /*
  * Propagate the changes of the sched_entity across the tg tree to make it
  * visible to the root
  */
-static void propagate_entity_cfs_rq(struct sched_entity *se)
+static void propagate_entity_cfs_rq(struct sched_entity *se, int flags)
 {
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	if (cfs_rq_throttled(cfs_rq))
-		return;
-
-	if (!throttled_hierarchy(cfs_rq))
-		list_add_leaf_cfs_rq(cfs_rq);
-
-	/* Start to propagate at parent */
-	se = se->parent;
+	struct cfs_rq *cfs_rq;
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 
-		update_load_avg(cfs_rq, se, UPDATE_TG);
+		update_load_avg(cfs_rq, se, flags);
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 
 		if (!throttled_hierarchy(cfs_rq))
 			list_add_leaf_cfs_rq(cfs_rq);
+
+		flags = UPDATE_TG;
 	}
 }
-#else
-static void propagate_entity_cfs_rq(struct sched_entity *se) { }
-#endif
 
 static void detach_entity_cfs_rq(struct sched_entity *se)
 {
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
 	/*
 	 * In case the task sched_avg hasn't been attached:
 	 * - A forked task which hasn't been woken up by wake_up_new_task().
@@ -4130,21 +4106,18 @@ static void detach_entity_cfs_rq(struct sched_entity *se)
 		return;
 
 	/* Catch up with the cfs_rq and remove our load when we leave */
-	update_load_avg(cfs_rq, se, 0);
-	detach_entity_load_avg(cfs_rq, se);
-	update_tg_load_avg(cfs_rq);
-	propagate_entity_cfs_rq(se);
+	propagate_entity_cfs_rq(se, DO_DETACH | UPDATE_TG);
 }
 
 static void attach_entity_cfs_rq(struct sched_entity *se)
 {
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	int flags = DO_ATTACH | UPDATE_TG;
+
+	if (!sched_feat(ATTACH_AGE_LOAD))
+		flags |= SKIP_AGE_LOAD;
 
-	/* Synchronize entity with its cfs_rq */
-	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
-	attach_entity_load_avg(cfs_rq, se);
-	update_tg_load_avg(cfs_rq);
-	propagate_entity_cfs_rq(se);
+	/* Synchronize entity with its cfs_rq and attach our load */
+	propagate_entity_cfs_rq(se, flags);
 }
 
 static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
@@ -4479,6 +4452,15 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
 	bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
 	bool curr = cfs_rq->curr == se;
+	int action = UPDATE_TG;
+
+	/*
+	 * !last_update_time means we've passed through migrate_task_rq_fair()
+	 * or task_change_group_fair() indicating we migrated cfs_rq. IOW we're
+	 * enqueueing a task on a new CPU or moving task to a new cgroup.
+	 */
+	if (!se->avg.last_update_time)
+		action |= DO_ATTACH;
 
 	/*
 	 * If we're the current task, we must renormalise before calling
@@ -4507,7 +4489,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *     its group cfs_rq
 	 *   - Add its new weight to cfs_rq->load.weight
 	 */
-	update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
+	update_load_avg(cfs_rq, se, action);
 	se_update_runnable(se);
 	update_cfs_group(se);
 	account_entity_enqueue(cfs_rq, se);
@@ -4581,6 +4563,10 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
 	int action = UPDATE_TG;
 
+	/*
+	 * When we are migrating task out of the CPU, we should
+	 * detach entity sched_avg from the cfs_rq.
+	 */
 	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
 		action |= DO_DETACH;
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 09/10] sched/fair: defer task sched_avg attach to enqueue_entity()
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (7 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg() Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  2022-08-01  4:27 ` [PATCH v3 10/10] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

In wake_up_new_task(), we use post_init_entity_util_avg() to init
util_avg/runnable_avg based on the CPU's util_avg at that time, and then
attach the task sched_avg to the cfs_rq.

Since enqueue_entity() will always attach any unattached task entity, we
can defer this work to enqueue_entity().

post_init_entity_util_avg(p)
  attach_entity_cfs_rq()  --> (1)
activate_task(rq, p)
  enqueue_task() := enqueue_task_fair()
  enqueue_entity()
    update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
      if (!se->avg.last_update_time && (flags & DO_ATTACH))
        attach_entity_load_avg()  --> (2)

This patch defers the attach from (1) to (2).

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8cb826bd755..18e3dff606db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -799,8 +799,6 @@ void init_entity_runnable_average(struct sched_entity *se)
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
-static void attach_entity_cfs_rq(struct sched_entity *se);
-
 /*
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
@@ -863,8 +861,6 @@ void post_init_entity_util_avg(struct task_struct *p)
 		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
 		return;
 	}
-
-	attach_entity_cfs_rq(se);
 }
 
 #else /* !CONFIG_SMP */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 10/10] sched/fair: don't init util/runnable_avg for !fair task
  2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (8 preceding siblings ...)
  2022-08-01  4:27 ` [PATCH v3 09/10] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
@ 2022-08-01  4:27 ` Chengming Zhou
  9 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-01  4:27 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

post_init_entity_util_avg() inits the task util_avg according to the CPU
util_avg at the time of fork, which will just decay away by the time
switched_to_fair() runs some time later, so we'd better not set it at all
in the case of a !fair task.
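
For reference, a standalone walk-through of the (unchanged) fair-task
seeding that now only runs for fair tasks; the numbers below are made up,
the formula and the cap are the ones used in post_init_entity_util_avg():

  #include <stdio.h>

  int main(void)
  {
          long cpu_scale = 1024;       /* arch_scale_cpu_capacity(), made up */
          long cfs_util_avg = 300;     /* cfs_rq->avg.util_avg, made up      */
          long cfs_load_avg = 600;     /* cfs_rq->avg.load_avg, made up      */
          long se_weight = 1024;       /* se->load.weight, made up           */

          long cap = (cpu_scale - cfs_util_avg) / 2;
          long util_avg = 0;

          if (cap > 0) {
                  if (cfs_util_avg != 0) {
                          util_avg = cfs_util_avg * se_weight / (cfs_load_avg + 1);
                          if (util_avg > cap)
                                  util_avg = cap;
                  } else {
                          util_avg = cap;
                  }
          }

          /* runnable_avg is seeded with the same value as util_avg. */
          printf("util_avg = %ld, runnable_avg = %ld\n", util_avg, util_avg);
          return 0;
  }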

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 18e3dff606db..071c159605e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -833,20 +833,6 @@ void post_init_entity_util_avg(struct task_struct *p)
 	long cpu_scale = arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)));
 	long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;
 
-	if (cap > 0) {
-		if (cfs_rq->avg.util_avg != 0) {
-			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
-			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
-
-			if (sa->util_avg > cap)
-				sa->util_avg = cap;
-		} else {
-			sa->util_avg = cap;
-		}
-	}
-
-	sa->runnable_avg = sa->util_avg;
-
 	if (p->sched_class != &fair_sched_class) {
 		/*
 		 * For !fair tasks do:
@@ -861,6 +847,20 @@ void post_init_entity_util_avg(struct task_struct *p)
 		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
 		return;
 	}
+
+	if (cap > 0) {
+		if (cfs_rq->avg.util_avg != 0) {
+			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
+			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
+
+			if (sa->util_avg > cap)
+				sa->util_avg = cap;
+		} else {
+			sa->util_avg = cap;
+		}
+	}
+
+	sa->runnable_avg = sa->util_avg;
 }
 
 #else /* !CONFIG_SMP */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg()
  2022-08-01  4:27 ` [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg() Chengming Zhou
@ 2022-08-01  8:07   ` kernel test robot
  2022-08-05  2:11     ` Chengming Zhou
  2022-08-01 23:22   ` kernel test robot
  1 sibling, 1 reply; 14+ messages in thread
From: kernel test robot @ 2022-08-01  8:07 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: kbuild-all, linux-kernel, Chengming Zhou

Hi Chengming,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20220728]
[cannot apply to linus/master v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220801-122957
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 8da3d9b8590bc178752d4b72938745e9a6c4c416
config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20220801/202208011647.2KU7IF9Y-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/336247ff1d2b402a18689fd891d79e99d8b444fc
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220801-122957
        git checkout 336247ff1d2b402a18689fd891d79e99d8b444fc
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/fair.c:672:5: warning: no previous prototype for 'sched_update_scaling' [-Wmissing-prototypes]
     672 | int sched_update_scaling(void)
         |     ^~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c: In function 'enqueue_entity':
>> kernel/sched/fair.c:4462:16: error: 'struct sched_entity' has no member named 'avg'
    4462 |         if (!se->avg.last_update_time)
         |                ^~


vim +4462 kernel/sched/fair.c

  4419	
  4420	/*
  4421	 * MIGRATION
  4422	 *
  4423	 *	dequeue
  4424	 *	  update_curr()
  4425	 *	    update_min_vruntime()
  4426	 *	  vruntime -= min_vruntime
  4427	 *
  4428	 *	enqueue
  4429	 *	  update_curr()
  4430	 *	    update_min_vruntime()
  4431	 *	  vruntime += min_vruntime
  4432	 *
  4433	 * this way the vruntime transition between RQs is done when both
  4434	 * min_vruntime are up-to-date.
  4435	 *
  4436	 * WAKEUP (remote)
  4437	 *
  4438	 *	->migrate_task_rq_fair() (p->state == TASK_WAKING)
  4439	 *	  vruntime -= min_vruntime
  4440	 *
  4441	 *	enqueue
  4442	 *	  update_curr()
  4443	 *	    update_min_vruntime()
  4444	 *	  vruntime += min_vruntime
  4445	 *
  4446	 * this way we don't have the most up-to-date min_vruntime on the originating
  4447	 * CPU and an up-to-date min_vruntime on the destination CPU.
  4448	 */
  4449	
  4450	static void
  4451	enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  4452	{
  4453		bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
  4454		bool curr = cfs_rq->curr == se;
  4455		int action = UPDATE_TG;
  4456	
  4457		/*
  4458		 * !last_update_time means we've passed through migrate_task_rq_fair()
  4459		 * or task_change_group_fair() indicating we migrated cfs_rq. IOW we're
  4460		 * enqueueing a task on a new CPU or moving task to a new cgroup.
  4461		 */
> 4462		if (!se->avg.last_update_time)
  4463			action |= DO_ATTACH;
  4464	
  4465		/*
  4466		 * If we're the current task, we must renormalise before calling
  4467		 * update_curr().
  4468		 */
  4469		if (renorm && curr)
  4470			se->vruntime += cfs_rq->min_vruntime;
  4471	
  4472		update_curr(cfs_rq);
  4473	
  4474		/*
  4475		 * Otherwise, renormalise after, such that we're placed at the current
  4476		 * moment in time, instead of some random moment in the past. Being
  4477		 * placed in the past could significantly boost this task to the
  4478		 * fairness detriment of existing tasks.
  4479		 */
  4480		if (renorm && !curr)
  4481			se->vruntime += cfs_rq->min_vruntime;
  4482	
  4483		/*
  4484		 * When enqueuing a sched_entity, we must:
  4485		 *   - Update loads to have both entity and cfs_rq synced with now.
  4486		 *   - For group_entity, update its runnable_weight to reflect the new
  4487		 *     h_nr_running of its group cfs_rq.
  4488		 *   - For group_entity, update its weight to reflect the new share of
  4489		 *     its group cfs_rq
  4490		 *   - Add its new weight to cfs_rq->load.weight
  4491		 */
  4492		update_load_avg(cfs_rq, se, action);
  4493		se_update_runnable(se);
  4494		update_cfs_group(se);
  4495		account_entity_enqueue(cfs_rq, se);
  4496	
  4497		if (flags & ENQUEUE_WAKEUP)
  4498			place_entity(cfs_rq, se, 0);
  4499	
  4500		check_schedstat_required();
  4501		update_stats_enqueue_fair(cfs_rq, se, flags);
  4502		check_spread(cfs_rq, se);
  4503		if (!curr)
  4504			__enqueue_entity(cfs_rq, se);
  4505		se->on_rq = 1;
  4506	
  4507		if (cfs_rq->nr_running == 1) {
  4508			check_enqueue_throttle(cfs_rq);
  4509			if (!throttled_hierarchy(cfs_rq))
  4510				list_add_leaf_cfs_rq(cfs_rq);
  4511		}
  4512	}
  4513	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg()
  2022-08-01  4:27 ` [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg() Chengming Zhou
  2022-08-01  8:07   ` kernel test robot
@ 2022-08-01 23:22   ` kernel test robot
  1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2022-08-01 23:22 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: llvm, kbuild-all, linux-kernel, Chengming Zhou

Hi Chengming,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master next-20220728]
[cannot apply to v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220801-122957
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 8da3d9b8590bc178752d4b72938745e9a6c4c416
config: hexagon-randconfig-r012-20220731 (https://download.01.org/0day-ci/archive/20220802/202208020758.Ff3SOjvD-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 52cd00cabf479aa7eb6dbb063b7ba41ea57bce9e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/336247ff1d2b402a18689fd891d79e99d8b444fc
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220801-122957
        git checkout 336247ff1d2b402a18689fd891d79e99d8b444fc
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/sched/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/sched/fair.c:4462:11: error: no member named 'avg' in 'struct sched_entity'
           if (!se->avg.last_update_time)
                ~~  ^
   1 error generated.


vim +4462 kernel/sched/fair.c

  4419	
  4420	/*
  4421	 * MIGRATION
  4422	 *
  4423	 *	dequeue
  4424	 *	  update_curr()
  4425	 *	    update_min_vruntime()
  4426	 *	  vruntime -= min_vruntime
  4427	 *
  4428	 *	enqueue
  4429	 *	  update_curr()
  4430	 *	    update_min_vruntime()
  4431	 *	  vruntime += min_vruntime
  4432	 *
  4433	 * this way the vruntime transition between RQs is done when both
  4434	 * min_vruntime are up-to-date.
  4435	 *
  4436	 * WAKEUP (remote)
  4437	 *
  4438	 *	->migrate_task_rq_fair() (p->state == TASK_WAKING)
  4439	 *	  vruntime -= min_vruntime
  4440	 *
  4441	 *	enqueue
  4442	 *	  update_curr()
  4443	 *	    update_min_vruntime()
  4444	 *	  vruntime += min_vruntime
  4445	 *
  4446	 * this way we don't have the most up-to-date min_vruntime on the originating
  4447	 * CPU and an up-to-date min_vruntime on the destination CPU.
  4448	 */
  4449	
  4450	static void
  4451	enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
  4452	{
  4453		bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
  4454		bool curr = cfs_rq->curr == se;
  4455		int action = UPDATE_TG;
  4456	
  4457		/*
  4458		 * !last_update_time means we've passed through migrate_task_rq_fair()
  4459		 * or task_change_group_fair() indicating we migrated cfs_rq. IOW we're
  4460		 * enqueueing a task on a new CPU or moving task to a new cgroup.
  4461		 */
> 4462		if (!se->avg.last_update_time)
  4463			action |= DO_ATTACH;
  4464	
  4465		/*
  4466		 * If we're the current task, we must renormalise before calling
  4467		 * update_curr().
  4468		 */
  4469		if (renorm && curr)
  4470			se->vruntime += cfs_rq->min_vruntime;
  4471	
  4472		update_curr(cfs_rq);
  4473	
  4474		/*
  4475		 * Otherwise, renormalise after, such that we're placed at the current
  4476		 * moment in time, instead of some random moment in the past. Being
  4477		 * placed in the past could significantly boost this task to the
  4478		 * fairness detriment of existing tasks.
  4479		 */
  4480		if (renorm && !curr)
  4481			se->vruntime += cfs_rq->min_vruntime;
  4482	
  4483		/*
  4484		 * When enqueuing a sched_entity, we must:
  4485		 *   - Update loads to have both entity and cfs_rq synced with now.
  4486		 *   - For group_entity, update its runnable_weight to reflect the new
  4487		 *     h_nr_running of its group cfs_rq.
  4488		 *   - For group_entity, update its weight to reflect the new share of
  4489		 *     its group cfs_rq
  4490		 *   - Add its new weight to cfs_rq->load.weight
  4491		 */
  4492		update_load_avg(cfs_rq, se, action);
  4493		se_update_runnable(se);
  4494		update_cfs_group(se);
  4495		account_entity_enqueue(cfs_rq, se);
  4496	
  4497		if (flags & ENQUEUE_WAKEUP)
  4498			place_entity(cfs_rq, se, 0);
  4499	
  4500		check_schedstat_required();
  4501		update_stats_enqueue_fair(cfs_rq, se, flags);
  4502		check_spread(cfs_rq, se);
  4503		if (!curr)
  4504			__enqueue_entity(cfs_rq, se);
  4505		se->on_rq = 1;
  4506	
  4507		if (cfs_rq->nr_running == 1) {
  4508			check_enqueue_throttle(cfs_rq);
  4509			if (!throttled_hierarchy(cfs_rq))
  4510				list_add_leaf_cfs_rq(cfs_rq);
  4511		}
  4512	}
  4513	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg()
  2022-08-01  8:07   ` kernel test robot
@ 2022-08-05  2:11     ` Chengming Zhou
  0 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-05  2:11 UTC (permalink / raw)
  To: kernel test robot
  Cc: kbuild-all, linux-kernel, mingo, peterz, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, vschneid

On 2022/8/1 16:07, kernel test robot wrote:
> Hi Chengming,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on tip/sched/core]
> [also build test ERROR on next-20220728]
> [cannot apply to linus/master v5.19]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220801-122957
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 8da3d9b8590bc178752d4b72938745e9a6c4c416
> config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20220801/202208011647.2KU7IF9Y-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
> reproduce (this is a W=1 build):
>         # https://github.com/intel-lab-lkp/linux/commit/336247ff1d2b402a18689fd891d79e99d8b444fc
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220801-122957
>         git checkout 336247ff1d2b402a18689fd891d79e99d8b444fc
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>    kernel/sched/fair.c:672:5: warning: no previous prototype for 'sched_update_scaling' [-Wmissing-prototypes]
>      672 | int sched_update_scaling(void)
>          |     ^~~~~~~~~~~~~~~~~~~~
>    kernel/sched/fair.c: In function 'enqueue_entity':
>>> kernel/sched/fair.c:4462:16: error: 'struct sched_entity' has no member named 'avg'
>     4462 |         if (!se->avg.last_update_time)
>          |                ^~
> 

Thanks for the test report!

It seems this is because sched_entity has no "avg" member on
!CONFIG_SMP. I think we'd better drop this patch for now, since
it's just a code refactor, not a real improvement.
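
Just for discussion, a minimal sketch of how the check could be made
!CONFIG_SMP-safe if we did want to keep the refactor. Note that
se_needs_attach() below is a hypothetical helper, not something in
this series:

#ifdef CONFIG_SMP
/* !last_update_time means the entity is currently detached from its cfs_rq */
static inline bool se_needs_attach(struct sched_entity *se)
{
	return !se->avg.last_update_time;
}
#else
/* no per-entity PELT tracking on !CONFIG_SMP, so there is nothing to attach */
static inline bool se_needs_attach(struct sched_entity *se)
{
	return false;
}
#endif

and then in enqueue_entity():

	if (se_needs_attach(se))
		action |= DO_ATTACH;

(IIRC DO_ATTACH is already defined to 0x0 in the !CONFIG_SMP branch of
fair.c, so the flag itself should compile either way.)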

> 
> vim +4462 kernel/sched/fair.c
> 
>   4419	
>   4420	/*
>   4421	 * MIGRATION
>   4422	 *
>   4423	 *	dequeue
>   4424	 *	  update_curr()
>   4425	 *	    update_min_vruntime()
>   4426	 *	  vruntime -= min_vruntime
>   4427	 *
>   4428	 *	enqueue
>   4429	 *	  update_curr()
>   4430	 *	    update_min_vruntime()
>   4431	 *	  vruntime += min_vruntime
>   4432	 *
>   4433	 * this way the vruntime transition between RQs is done when both
>   4434	 * min_vruntime are up-to-date.
>   4435	 *
>   4436	 * WAKEUP (remote)
>   4437	 *
>   4438	 *	->migrate_task_rq_fair() (p->state == TASK_WAKING)
>   4439	 *	  vruntime -= min_vruntime
>   4440	 *
>   4441	 *	enqueue
>   4442	 *	  update_curr()
>   4443	 *	    update_min_vruntime()
>   4444	 *	  vruntime += min_vruntime
>   4445	 *
>   4446	 * this way we don't have the most up-to-date min_vruntime on the originating
>   4447	 * CPU and an up-to-date min_vruntime on the destination CPU.
>   4448	 */
>   4449	
>   4450	static void
>   4451	enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>   4452	{
>   4453		bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
>   4454		bool curr = cfs_rq->curr == se;
>   4455		int action = UPDATE_TG;
>   4456	
>   4457		/*
>   4458		 * !last_update_time means we've passed through migrate_task_rq_fair()
>   4459		 * or task_change_group_fair() indicating we migrated cfs_rq. IOW we're
>   4460		 * enqueueing a task on a new CPU or moving task to a new cgroup.
>   4461		 */
>> 4462		if (!se->avg.last_update_time)
>   4463			action |= DO_ATTACH;
>   4464	
>   4465		/*
>   4466		 * If we're the current task, we must renormalise before calling
>   4467		 * update_curr().
>   4468		 */
>   4469		if (renorm && curr)
>   4470			se->vruntime += cfs_rq->min_vruntime;
>   4471	
>   4472		update_curr(cfs_rq);
>   4473	
>   4474		/*
>   4475		 * Otherwise, renormalise after, such that we're placed at the current
>   4476		 * moment in time, instead of some random moment in the past. Being
>   4477		 * placed in the past could significantly boost this task to the
>   4478		 * fairness detriment of existing tasks.
>   4479		 */
>   4480		if (renorm && !curr)
>   4481			se->vruntime += cfs_rq->min_vruntime;
>   4482	
>   4483		/*
>   4484		 * When enqueuing a sched_entity, we must:
>   4485		 *   - Update loads to have both entity and cfs_rq synced with now.
>   4486		 *   - For group_entity, update its runnable_weight to reflect the new
>   4487		 *     h_nr_running of its group cfs_rq.
>   4488		 *   - For group_entity, update its weight to reflect the new share of
>   4489		 *     its group cfs_rq
>   4490		 *   - Add its new weight to cfs_rq->load.weight
>   4491		 */
>   4492		update_load_avg(cfs_rq, se, action);
>   4493		se_update_runnable(se);
>   4494		update_cfs_group(se);
>   4495		account_entity_enqueue(cfs_rq, se);
>   4496	
>   4497		if (flags & ENQUEUE_WAKEUP)
>   4498			place_entity(cfs_rq, se, 0);
>   4499	
>   4500		check_schedstat_required();
>   4501		update_stats_enqueue_fair(cfs_rq, se, flags);
>   4502		check_spread(cfs_rq, se);
>   4503		if (!curr)
>   4504			__enqueue_entity(cfs_rq, se);
>   4505		se->on_rq = 1;
>   4506	
>   4507		if (cfs_rq->nr_running == 1) {
>   4508			check_enqueue_throttle(cfs_rq);
>   4509			if (!throttled_hierarchy(cfs_rq))
>   4510				list_add_leaf_cfs_rq(cfs_rq);
>   4511		}
>   4512	}
>   4513	
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-08-05  2:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-01  4:27 [PATCH v3 00/10] sched/fair: task load tracking optimization and cleanup Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 01/10] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 02/10] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 03/10] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 04/10] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 05/10] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 06/10] sched/fair: fix another detach on unattached task corner case Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 07/10] sched/fair: allow changing cgroup of new forked task Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 08/10] sched/fair: refactor detach/attach_entity_cfs_rq using update_load_avg() Chengming Zhou
2022-08-01  8:07   ` kernel test robot
2022-08-05  2:11     ` Chengming Zhou
2022-08-01 23:22   ` kernel test robot
2022-08-01  4:27 ` [PATCH v3 09/10] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
2022-08-01  4:27 ` [PATCH v3 10/10] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
