linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup
@ 2022-08-08 12:57 Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

Hi all,

This patch series contains optimizations and cleanups for task load tracking
when a task migrates CPU/cgroup or is switched_from/to_fair(), based on
tip/sched/core.

There are three types of detach/attach_entity_load_avg (except fork and exit)
for a fair task (see the mapping sketched just below):
1. task migrates CPU (on_rq migrate or wakeup migrate)
2. task migrates cgroup (detach and attach)
3. task is switched_from/to_fair (detach, later attach)
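
A rough mapping of these cases to their entry points, as I read the code
touched by this series (an orientation sketch only, not an exhaustive list;
switched_from/to_fair() are the usual sched_class hooks):

  1. CPU migration:    detach via migrate_task_rq_fair() (moved into
                       dequeue_entity() by patch 5/9), attach via
                       enqueue_entity() with DO_ATTACH
  2. cgroup migration: detach + attach in task_change_group_fair()
  3. class switch:     detach in switched_from_fair(), attach in
                       switched_to_fair()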

Patches 1-3 clean up the task cgroup change case by removing
cpu_cgrp_subsys->fork(), since we already do the same thing in
sched_cgroup_fork().

Patch 5/9 optimizes the task CPU migration case by combining detach into dequeue.

Patch 6/9 fixes another detach on unattached task case, where the task has
been woken up by try_to_wake_up() but is still waiting to actually be woken
up by sched_ttwu_pending().

Patch 7/9 removes the unnecessary limitation that changing the cgroup of a
forked task which hasn't been woken up by wake_up_new_task() would fail.

Patches 8-9 optimize post_init_entity_util_avg() for fair tasks and skip
setting util_avg and runnable_avg for !fair tasks at fork time.

Thanks!

Changes in v4:
 - Drop the detach/attach_entity_cfs_rq() refactor patch from the last version.
 - Move the new forked task check to task_change_group_fair().

Changes in v3:
 - One big change is that this series no longer freezes PELT sum/avg values
   to be used as initial values when re-entering fair, since these PELT
   values become much less relevant.
 - Reorder patches and collect tags from Vincent and Dietmar. Thanks!
 - Fix detach on unattached task which has been woken up by try_to_wake_up()
   but is waiting to actually be woken up by sched_ttwu_pending().
 - Delete TASK_NEW, which limited forked tasks from changing cgroup.
 - Don't init util_avg and runnable_avg for !fair tasks at fork time.

Changes in v2:
 - Split task se depth maintenance into a separate patch 3, as suggested
   by Peter.
 - Reorder patches 6-7 before patches 8-9, since we need update_load_avg()
   to do conditional attach/detach to avoid corner cases like the
   twice-attach problem.

Chengming Zhou (9):
  sched/fair: maintain task se depth in set_task_rq()
  sched/fair: remove redundant cpu_cgrp_subsys->fork()
  sched/fair: reset sched_avg last_update_time before set_task_rq()
  sched/fair: update comments in enqueue/dequeue_entity()
  sched/fair: combine detach into dequeue when migrating task
  sched/fair: fix another detach on unattached task corner case
  sched/fair: allow changing cgroup of new forked task
  sched/fair: defer task sched_avg attach to enqueue_entity()
  sched/fair: don't init util/runnable_avg for !fair task

 include/linux/sched.h |   5 +-
 kernel/sched/core.c   |  57 ++--------
 kernel/sched/fair.c   | 234 ++++++++++++++++++++----------------------
 kernel/sched/sched.h  |   6 +-
 4 files changed, 124 insertions(+), 178 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/9] sched/fair: maintain task se depth in set_task_rq()
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

Previously we only maintained the task se depth in task_move_group_fair():
if a !fair task changed task group, its se depth would not be updated, so
commit eb7a59b2c888 ("sched/fair: Reset se-depth when task switched to FAIR")
fixed the problem by updating the se depth in switched_to_fair() too.

Then commit daa59407b558 ("sched/fair: Unify switched_{from,to}_fair()
and task_move_group_fair()") unified these two functions and moved the
se.depth setting into attach_task_cfs_rq(), which was further moved into
attach_entity_cfs_rq() by commit df217913e72e ("sched/fair: Factorize
attach/detach entity").

This patch moves task se depth maintenance from attach_entity_cfs_rq()
to set_task_rq(), which is called whenever the CPU/cgroup changes, so the
depth will always be correct.

This patch is preparation for the next patch.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 8 --------
 kernel/sched/sched.h | 1 +
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..a3b0f8b1029e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11562,14 +11562,6 @@ static void attach_entity_cfs_rq(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	/*
-	 * Since the real-depth could have been changed (only FAIR
-	 * class maintain depth value), reset depth properly.
-	 */
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-#endif
-
 	/* Synchronize entity with its cfs_rq */
 	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
 	attach_entity_load_avg(cfs_rq, se);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3ccd35c22f0f..4c4822141026 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1930,6 +1930,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 	set_task_rq_fair(&p->se, p->se.cfs_rq, tg->cfs_rq[cpu]);
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+	p->se.depth = tg->se[cpu] ? tg->se[cpu]->depth + 1 : 0;
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork()
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

We use cpu_cgrp_subsys->fork() to set the task group for the new fair task
in cgroup_post_fork().

Since commit b1e8206582f9 ("sched: Fix yet more sched_fork() races")
already does set_task_rq() for the new fair task in sched_cgroup_fork(),
cpu_cgrp_subsys->fork() can be removed.

  cgroup_can_fork()	--> pin parent's sched_task_group
  sched_cgroup_fork()
    __set_task_cpu()
      set_task_rq()
  cgroup_post_fork()
    ss->fork() := cpu_cgroup_fork()
      sched_change_group(..., TASK_SET_GROUP)
        task_set_group_fair()
          set_task_rq()  --> can be removed

After this patch's change, task_change_group_fair() only needs to care
about task cgroup migration, which makes the code much simpler.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c  | 27 ++++-----------------------
 kernel/sched/fair.c  | 23 +----------------------
 kernel/sched/sched.h |  5 +----
 3 files changed, 6 insertions(+), 49 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 64c08993221b..e74e79f783af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -481,8 +481,7 @@ sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags) { }
  *				p->se.load, p->rt_priority,
  *				p->dl.dl_{runtime, deadline, period, flags, bw, density}
  *  - sched_setnuma():		p->numa_preferred_nid
- *  - sched_move_task()/
- *    cpu_cgroup_fork():	p->sched_task_group
+ *  - sched_move_task():	p->sched_task_group
  *  - uclamp_update_active()	p->uclamp*
  *
  * p->state <- TASK_*:
@@ -10114,7 +10113,7 @@ void sched_release_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }
 
-static void sched_change_group(struct task_struct *tsk, int type)
+static void sched_change_group(struct task_struct *tsk)
 {
 	struct task_group *tg;
 
@@ -10130,7 +10129,7 @@ static void sched_change_group(struct task_struct *tsk, int type)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_change_group)
-		tsk->sched_class->task_change_group(tsk, type);
+		tsk->sched_class->task_change_group(tsk);
 	else
 #endif
 		set_task_rq(tsk, task_cpu(tsk));
@@ -10161,7 +10160,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (running)
 		put_prev_task(rq, tsk);
 
-	sched_change_group(tsk, TASK_MOVE_GROUP);
+	sched_change_group(tsk);
 
 	if (queued)
 		enqueue_task(rq, tsk, queue_flags);
@@ -10239,23 +10238,6 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	sched_unregister_group(tg);
 }
 
-/*
- * This is called before wake_up_new_task(), therefore we really only
- * have to set its group bits, all the other stuff does not apply.
- */
-static void cpu_cgroup_fork(struct task_struct *task)
-{
-	struct rq_flags rf;
-	struct rq *rq;
-
-	rq = task_rq_lock(task, &rf);
-
-	update_rq_clock(rq);
-	sched_change_group(task, TASK_SET_GROUP);
-
-	task_rq_unlock(rq, task, &rf);
-}
-
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
@@ -11121,7 +11103,6 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
-	.fork		= cpu_cgroup_fork,
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a3b0f8b1029e..2c0eb2a4e341 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11657,15 +11657,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static void task_set_group_fair(struct task_struct *p)
-{
-	struct sched_entity *se = &p->se;
-
-	set_task_rq(p, task_cpu(p));
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-}
-
-static void task_move_group_fair(struct task_struct *p)
+static void task_change_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
@@ -11677,19 +11669,6 @@ static void task_move_group_fair(struct task_struct *p)
 	attach_task_cfs_rq(p);
 }
 
-static void task_change_group_fair(struct task_struct *p, int type)
-{
-	switch (type) {
-	case TASK_SET_GROUP:
-		task_set_group_fair(p);
-		break;
-
-	case TASK_MOVE_GROUP:
-		task_move_group_fair(p);
-		break;
-	}
-}
-
 void free_fair_sched_group(struct task_group *tg)
 {
 	int i;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4c4822141026..74130a69d365 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2193,11 +2193,8 @@ struct sched_class {
 
 	void (*update_curr)(struct rq *rq);
 
-#define TASK_SET_GROUP		0
-#define TASK_MOVE_GROUP		1
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	void (*task_change_group)(struct task_struct *p, int type);
+	void (*task_change_group)(struct task_struct *p);
 #endif
 };
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq()
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 4/9] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

set_task_rq() -> set_task_rq_fair() will try to synchronize the blocked
task's sched_avg when migrating, which is not needed for an already
detached task.

task_change_group_fair() detaches the task's sched_avg from the prev cfs_rq
first, so reset sched_avg last_update_time before set_task_rq() to avoid that.
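
With this change on top of the previous patch, task_change_group_fair()
ends up roughly as below (reconstructed from the hunk that follows, shown
here only as a consolidated sketch):

  static void task_change_group_fair(struct task_struct *p)
  {
  	detach_task_cfs_rq(p);

  #ifdef CONFIG_SMP
  	/* Tell se's cfs_rq has been changed -- migrated */
  	p->se.avg.last_update_time = 0;
  #endif
  	set_task_rq(p, task_cpu(p));
  	attach_task_cfs_rq(p);
  }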

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2c0eb2a4e341..e4c0929a6e71 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11660,12 +11660,12 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 static void task_change_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
-	set_task_rq(p, task_cpu(p));
 
 #ifdef CONFIG_SMP
 	/* Tell se's cfs_rq has been changed -- migrated */
 	p->se.avg.last_update_time = 0;
 #endif
+	set_task_rq(p, task_cpu(p));
 	attach_task_cfs_rq(p);
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 4/9] sched/fair: update comments in enqueue/dequeue_entity()
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (2 preceding siblings ...)
  2022-08-08 12:57 ` [PATCH v4 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 5/9] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

When reading the sched_avg related code, I found that the comments in
enqueue/dequeue_entity() are out of date with respect to the current code.

We don't add/subtract the entity's runnable_avg to/from cfs_rq->runnable_avg
during enqueue/dequeue_entity(); that is done only on attach/detach.

This patch updates the comments to reflect how the current code works.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4c0929a6e71..52de8302b336 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4434,7 +4434,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * When enqueuing a sched_entity, we must:
 	 *   - Update loads to have both entity and cfs_rq synced with now.
-	 *   - Add its load to cfs_rq->runnable_avg
+	 *   - For group_entity, update its runnable_weight to reflect the new
+	 *     h_nr_running of its group cfs_rq.
 	 *   - For group_entity, update its weight to reflect the new share of
 	 *     its group cfs_rq
 	 *   - Add its new weight to cfs_rq->load.weight
@@ -4519,7 +4520,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * When dequeuing a sched_entity, we must:
 	 *   - Update loads to have both entity and cfs_rq synced with now.
-	 *   - Subtract its load from the cfs_rq->runnable_avg.
+	 *   - For group_entity, update its runnable_weight to reflect the new
+	 *     h_nr_running of its group cfs_rq.
 	 *   - Subtract its previous weight from cfs_rq->load.weight.
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 5/9] sched/fair: combine detach into dequeue when migrating task
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (3 preceding siblings ...)
  2022-08-08 12:57 ` [PATCH v4 4/9] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case Chengming Zhou
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

When we are migrating a task off the CPU, we can combine the detach and
propagation into dequeue_entity(), saving the detach_entity_cfs_rq() call
in migrate_task_rq_fair().

This optimization is like combining DO_ATTACH into enqueue_entity() when
migrating a task to the CPU. That way we don't have to traverse the CFS
tree an extra time to do detach_entity_cfs_rq() -> propagate_entity_cfs_rq(),
which is no longer called with this patch's change.

detach_task()
  deactivate_task()
    dequeue_task_fair()
      for_each_sched_entity(se)
        dequeue_entity()
          update_load_avg() /* (1) */
            detach_entity_load_avg()

  set_task_cpu()
    migrate_task_rq_fair()
      detach_entity_cfs_rq() /* (2) */
        update_load_avg();
        detach_entity_load_avg();
        propagate_entity_cfs_rq();
          for_each_sched_entity()
            update_load_avg()

This patch saves the detach_entity_cfs_rq() call in (2) by doing the
detach_entity_load_avg() for a CPU-migrating task inside (1) (the task
being the first se in the loop), as sketched below.
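
A minimal sketch of the combined path, reconstructed from the hunks below
(not a verbatim excerpt): dequeue_entity() requests the detach, and
update_load_avg() grows a DO_DETACH branch that mirrors DO_ATTACH by
calling detach_entity_load_avg() + update_tg_load_avg().

  /* In dequeue_entity(): fold the detach request into the dequeue. */
  int action = UPDATE_TG;

  if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
  	action |= DO_DETACH;

  /* ... runtime stats updates unchanged ... */

  update_load_avg(cfs_rq, se, action);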

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 52de8302b336..f52e7dc7f22d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4003,6 +4003,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 #define UPDATE_TG	0x1
 #define SKIP_AGE_LOAD	0x2
 #define DO_ATTACH	0x4
+#define DO_DETACH	0x8
 
 /* Update task and its cfs_rq load average */
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
@@ -4032,6 +4033,13 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 		attach_entity_load_avg(cfs_rq, se);
 		update_tg_load_avg(cfs_rq);
 
+	} else if (flags & DO_DETACH) {
+		/*
+		 * DO_DETACH means we're here from dequeue_entity()
+		 * and we are migrating task out of the CPU.
+		 */
+		detach_entity_load_avg(cfs_rq, se);
+		update_tg_load_avg(cfs_rq);
 	} else if (decayed) {
 		cfs_rq_util_change(cfs_rq, 0);
 
@@ -4292,6 +4300,7 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
+#define DO_DETACH	0x0
 
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
 {
@@ -4512,6 +4521,11 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	int action = UPDATE_TG;
+
+	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
+		action |= DO_DETACH;
+
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
@@ -4526,7 +4540,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
 	 */
-	update_load_avg(cfs_rq, se, UPDATE_TG);
+	update_load_avg(cfs_rq, se, action);
 	se_update_runnable(se);
 
 	update_stats_dequeue_fair(cfs_rq, se, flags);
@@ -7078,8 +7092,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	return new_cpu;
 }
 
-static void detach_entity_cfs_rq(struct sched_entity *se);
-
 /*
  * Called immediately before a task is migrated to a new CPU; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -7101,15 +7113,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
 	}
 
-	if (p->on_rq == TASK_ON_RQ_MIGRATING) {
-		/*
-		 * In case of TASK_ON_RQ_MIGRATING we in fact hold the 'old'
-		 * rq->lock and can modify state directly.
-		 */
-		lockdep_assert_rq_held(task_rq(p));
-		detach_entity_cfs_rq(se);
-
-	} else {
+	if (!task_on_rq_migrating(p)) {
 		remove_entity_load_avg(se);
 
 		/*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (4 preceding siblings ...)
  2022-08-08 12:57 ` [PATCH v4 5/9] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-17 15:01   ` Vincent Guittot
  2022-08-08 12:57 ` [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
fixed two load tracking problems for new tasks, including the detach on
unattached new task problem.

There is still another detach on unattached task problem left, for a task
which has been woken up by try_to_wake_up() but is waiting to actually be
woken up by sched_ttwu_pending().

try_to_wake_up(p)
  cpu = select_task_rq(p)
  if (task_cpu(p) != cpu)
    set_task_cpu(p, cpu)
      migrate_task_rq_fair()
        remove_entity_load_avg()       --> unattached
        se->avg.last_update_time = 0;
      __set_task_cpu()
  ttwu_queue(p, cpu)
    ttwu_queue_wakelist()
      __ttwu_queue_wakelist()

task_change_group_fair()
  detach_task_cfs_rq()
    detach_entity_cfs_rq()
      detach_entity_load_avg()   --> detach on unattached task
  set_task_rq()
  attach_task_cfs_rq()
    attach_entity_cfs_rq()
      attach_entity_load_avg()

The cause of this problem is similar: we should check in detach_entity_cfs_rq()
that se->avg.last_update_time != 0 before doing detach_entity_load_avg().

This patch also moves the detach/attach_entity_cfs_rq() functions up, to sit
together with the other load tracking functions and avoid another
#ifdef CONFIG_SMP.
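
Stripped of the code movement, the functional change boils down to the
early return below (reconstructed from the hunks that follow, shown here
only as a consolidated sketch):

  static void detach_entity_cfs_rq(struct sched_entity *se)
  {
  	struct cfs_rq *cfs_rq = cfs_rq_of(se);

  	/*
  	 * In case the task sched_avg hasn't been attached:
  	 * - A forked task which hasn't been woken up by wake_up_new_task().
  	 * - A task which has been woken up by try_to_wake_up() but is
  	 *   waiting for actually being woken up by sched_ttwu_pending().
  	 */
  	if (!se->avg.last_update_time)
  		return;

  	/* Catch up with the cfs_rq and remove our load when we leave */
  	update_load_avg(cfs_rq, se, 0);
  	detach_entity_load_avg(cfs_rq, se);
  	update_tg_load_avg(cfs_rq);
  	propagate_entity_cfs_rq(se);
  }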

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 132 +++++++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 64 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f52e7dc7f22d..4bc76d95a99d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -874,9 +874,6 @@ void init_entity_runnable_average(struct sched_entity *se)
 void post_init_entity_util_avg(struct task_struct *p)
 {
 }
-static void update_tg_load_avg(struct cfs_rq *cfs_rq)
-{
-}
 #endif /* CONFIG_SMP */
 
 /*
@@ -3176,6 +3173,7 @@ void reweight_task(struct task_struct *p, int prio)
 	load->inv_weight = sched_prio_to_wmult[prio];
 }
 
+static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
 static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -4086,6 +4084,71 @@ static void remove_entity_load_avg(struct sched_entity *se)
 	raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+/*
+ * Propagate the changes of the sched_entity across the tg tree to make it
+ * visible to the root
+ */
+static void propagate_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	if (cfs_rq_throttled(cfs_rq))
+		return;
+
+	if (!throttled_hierarchy(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
+
+	/* Start to propagate at parent */
+	se = se->parent;
+
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+
+		update_load_avg(cfs_rq, se, UPDATE_TG);
+
+		if (cfs_rq_throttled(cfs_rq))
+			break;
+
+		if (!throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
+	}
+}
+#else
+static void propagate_entity_cfs_rq(struct sched_entity *se) { }
+#endif
+
+static void detach_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	/*
+	 * In case the task sched_avg hasn't been attached:
+	 * - A forked task which hasn't been woken up by wake_up_new_task().
+	 * - A task which has been woken up by try_to_wake_up() but is
+	 *   waiting for actually being woken up by sched_ttwu_pending().
+	 */
+	if (!se->avg.last_update_time)
+		return;
+
+	/* Catch up with the cfs_rq and remove our load when we leave */
+	update_load_avg(cfs_rq, se, 0);
+	detach_entity_load_avg(cfs_rq, se);
+	update_tg_load_avg(cfs_rq);
+	propagate_entity_cfs_rq(se);
+}
+
+static void attach_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	/* Synchronize entity with its cfs_rq */
+	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
+	attach_entity_load_avg(cfs_rq, se);
+	update_tg_load_avg(cfs_rq);
+	propagate_entity_cfs_rq(se);
+}
+
 static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
 {
 	return cfs_rq->avg.runnable_avg;
@@ -4308,11 +4371,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 }
 
 static inline void remove_entity_load_avg(struct sched_entity *se) {}
-
-static inline void
-attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-static inline void
-detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void detach_entity_cfs_rq(struct sched_entity *se) {}
+static inline void attach_entity_cfs_rq(struct sched_entity *se) {}
 
 static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
 {
@@ -11519,62 +11579,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
 	return false;
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
- * Propagate the changes of the sched_entity across the tg tree to make it
- * visible to the root
- */
-static void propagate_entity_cfs_rq(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	if (cfs_rq_throttled(cfs_rq))
-		return;
-
-	if (!throttled_hierarchy(cfs_rq))
-		list_add_leaf_cfs_rq(cfs_rq);
-
-	/* Start to propagate at parent */
-	se = se->parent;
-
-	for_each_sched_entity(se) {
-		cfs_rq = cfs_rq_of(se);
-
-		update_load_avg(cfs_rq, se, UPDATE_TG);
-
-		if (cfs_rq_throttled(cfs_rq))
-			break;
-
-		if (!throttled_hierarchy(cfs_rq))
-			list_add_leaf_cfs_rq(cfs_rq);
-	}
-}
-#else
-static void propagate_entity_cfs_rq(struct sched_entity *se) { }
-#endif
-
-static void detach_entity_cfs_rq(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	/* Catch up with the cfs_rq and remove our load when we leave */
-	update_load_avg(cfs_rq, se, 0);
-	detach_entity_load_avg(cfs_rq, se);
-	update_tg_load_avg(cfs_rq);
-	propagate_entity_cfs_rq(se);
-}
-
-static void attach_entity_cfs_rq(struct sched_entity *se)
-{
-	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-	/* Synchronize entity with its cfs_rq */
-	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
-	attach_entity_load_avg(cfs_rq, se);
-	update_tg_load_avg(cfs_rq);
-	propagate_entity_cfs_rq(se);
-}
-
 static void detach_task_cfs_rq(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (5 preceding siblings ...)
  2022-08-08 12:57 ` [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 14:57   ` Chengming Zhou
                     ` (2 more replies)
  2022-08-08 12:57 ` [PATCH v4 8/9] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 9/9] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
  8 siblings, 3 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
introduced a TASK_NEW state and an unnecessary limitation that makes
changing the cgroup of a newly forked task fail.

That was because, at that time, we couldn't handle task_change_group_fair()
for a newly forked fair task which hasn't been woken up by wake_up_new_task(),
which would cause a detach on an unattached task sched_avg problem.

This patch deletes this unnecessary limitation by adding a check before
doing the detach or attach in task_change_group_fair().

With that, cpu_cgrp_subsys.can_attach() has nothing left to do for fair
tasks, so only define it under #ifdef CONFIG_RT_GROUP_SCHED.
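
A minimal sketch of the intended guard (note that the hunk below uses
se->sum_exec_runtime, which is corrected to p->se.sum_exec_runtime later
in this thread):

  static void task_change_group_fair(struct task_struct *p)
  {
  	/*
  	 * We couldn't detach or attach a forked task which
  	 * hasn't been woken up by wake_up_new_task().
  	 */
  	if (!p->on_rq && !p->se.sum_exec_runtime)
  		return;

  	detach_task_cfs_rq(p);
  	/* ... reset last_update_time, set_task_rq() and attach as before ... */
  }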

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/sched.h |  5 ++---
 kernel/sched/core.c   | 30 +++++++-----------------------
 kernel/sched/fair.c   |  7 +++++++
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 88b8817b827d..b504e55bbf7a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -95,10 +95,9 @@ struct task_group;
 #define TASK_WAKEKILL			0x0100
 #define TASK_WAKING			0x0200
 #define TASK_NOLOAD			0x0400
-#define TASK_NEW			0x0800
 /* RT specific auxilliary flag to mark RT lock waiters */
-#define TASK_RTLOCK_WAIT		0x1000
-#define TASK_STATE_MAX			0x2000
+#define TASK_RTLOCK_WAIT		0x0800
+#define TASK_STATE_MAX			0x1000
 
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e74e79f783af..d5faa1700bd7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4500,11 +4500,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 {
 	__sched_fork(clone_flags, p);
 	/*
-	 * We mark the process as NEW here. This guarantees that
+	 * We mark the process as running here. This guarantees that
 	 * nobody will actually run it, and a signal or other external
 	 * event cannot wake it up and insert it on the runqueue either.
 	 */
-	p->__state = TASK_NEW;
+	p->__state = TASK_RUNNING;
 
 	/*
 	 * Make sure we do not leak PI boosting priority to the child.
@@ -4622,7 +4622,6 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq *rq;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
-	WRITE_ONCE(p->__state, TASK_RUNNING);
 #ifdef CONFIG_SMP
 	/*
 	 * Fork balancing, do it here and not earlier because:
@@ -10238,36 +10237,19 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	sched_unregister_group(tg);
 }
 
+#ifdef CONFIG_RT_GROUP_SCHED
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
 	struct cgroup_subsys_state *css;
-	int ret = 0;
 
 	cgroup_taskset_for_each(task, css, tset) {
-#ifdef CONFIG_RT_GROUP_SCHED
 		if (!sched_rt_can_attach(css_tg(css), task))
 			return -EINVAL;
-#endif
-		/*
-		 * Serialize against wake_up_new_task() such that if it's
-		 * running, we're sure to observe its full state.
-		 */
-		raw_spin_lock_irq(&task->pi_lock);
-		/*
-		 * Avoid calling sched_move_task() before wake_up_new_task()
-		 * has happened. This would lead to problems with PELT, due to
-		 * move wanting to detach+attach while we're not attached yet.
-		 */
-		if (READ_ONCE(task->__state) == TASK_NEW)
-			ret = -EINVAL;
-		raw_spin_unlock_irq(&task->pi_lock);
-
-		if (ret)
-			break;
 	}
-	return ret;
+	return 0;
 }
+#endif
 
 static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 {
@@ -11103,7 +11085,9 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
+#ifdef CONFIG_RT_GROUP_SCHED
 	.can_attach	= cpu_cgroup_can_attach,
+#endif
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
 	.dfl_cftypes	= cpu_files,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4bc76d95a99d..90aba33a3780 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11669,6 +11669,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static void task_change_group_fair(struct task_struct *p)
 {
+	/*
+	 * We couldn't detach or attach a forked task which
+	 * hasn't been woken up by wake_up_new_task().
+	 */
+	if (!p->on_rq && !se->sum_exec_runtime)
+		return;
+
 	detach_task_cfs_rq(p);
 
 #ifdef CONFIG_SMP
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 8/9] sched/fair: defer task sched_avg attach to enqueue_entity()
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (6 preceding siblings ...)
  2022-08-08 12:57 ` [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  2022-08-08 12:57 ` [PATCH v4 9/9] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

In wake_up_new_task(), we use post_init_entity_util_avg() to init
util_avg/runnable_avg based on the CPU's util_avg at that time, and then
attach the task's sched_avg to the cfs_rq.

Since enqueue_entity() will always attach any unattached task entity,
we can defer this work to enqueue_entity().

post_init_entity_util_avg(p)
  attach_entity_cfs_rq()  --> (1)
activate_task(rq, p)
  enqueue_task() := enqueue_task_fair()
  enqueue_entity()
    update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
      if (!se->avg.last_update_time && (flags & DO_ATTACH))
        attach_entity_load_avg()  --> (2)

This patch defers the attach from (1) to (2), as sketched below.
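
For reference, a sketch of the DO_ATTACH path in update_load_avg() that now
performs the first attach; this is condensed from the call chain above and
the context visible in patch 5/9, not a verbatim excerpt:

  /* Called from enqueue_entity() with UPDATE_TG | DO_ATTACH. */
  if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
  	/* First attach of this se's sched_avg, previously done in (1). */
  	attach_entity_load_avg(cfs_rq, se);
  	update_tg_load_avg(cfs_rq);
  }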

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 90aba33a3780..2063e30b2a8f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -799,8 +799,6 @@ void init_entity_runnable_average(struct sched_entity *se)
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
-static void attach_entity_cfs_rq(struct sched_entity *se);
-
 /*
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
@@ -863,8 +861,6 @@ void post_init_entity_util_avg(struct task_struct *p)
 		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
 		return;
 	}
-
-	attach_entity_cfs_rq(se);
 }
 
 #else /* !CONFIG_SMP */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 9/9] sched/fair: don't init util/runnable_avg for !fair task
  2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (7 preceding siblings ...)
  2022-08-08 12:57 ` [PATCH v4 8/9] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
@ 2022-08-08 12:57 ` Chengming Zhou
  8 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 12:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel, Chengming Zhou

post_init_entity_util_avg() inits the task's util_avg according to the CPU's
util_avg at fork time, which will have decayed by the time switched_to_fair()
is called some time later, so we'd better not set util_avg and runnable_avg
at all for a !fair task.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2063e30b2a8f..082174cb0e47 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -833,20 +833,6 @@ void post_init_entity_util_avg(struct task_struct *p)
 	long cpu_scale = arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)));
 	long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;
 
-	if (cap > 0) {
-		if (cfs_rq->avg.util_avg != 0) {
-			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
-			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
-
-			if (sa->util_avg > cap)
-				sa->util_avg = cap;
-		} else {
-			sa->util_avg = cap;
-		}
-	}
-
-	sa->runnable_avg = sa->util_avg;
-
 	if (p->sched_class != &fair_sched_class) {
 		/*
 		 * For !fair tasks do:
@@ -861,6 +847,20 @@ void post_init_entity_util_avg(struct task_struct *p)
 		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
 		return;
 	}
+
+	if (cap > 0) {
+		if (cfs_rq->avg.util_avg != 0) {
+			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
+			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
+
+			if (sa->util_avg > cap)
+				sa->util_avg = cap;
+		} else {
+			sa->util_avg = cap;
+		}
+	}
+
+	sa->runnable_avg = sa->util_avg;
 }
 
 #else /* !CONFIG_SMP */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-08 12:57 ` [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
@ 2022-08-08 14:57   ` Chengming Zhou
  2022-08-08 16:42   ` kernel test robot
  2022-08-15 21:11   ` Tejun Heo
  2 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-08 14:57 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid
  Cc: linux-kernel

On 2022/8/8 20:57, Chengming Zhou wrote:
> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
> introduce a TASK_NEW state and an unnessary limitation that would fail
> when changing cgroup of new forked task.
> 
> Because at that time, we can't handle task_change_group_fair() for new
> forked fair task which hasn't been woken up by wake_up_new_task(),
> which will cause detach on an unattached task sched_avg problem.
> 
> This patch delete this unnessary limitation by adding check before do
> detach or attach in task_change_group_fair().
> 
> So cpu_cgrp_subsys.can_attach() has nothing to do for fair tasks,
> only define it in #ifdef CONFIG_RT_GROUP_SCHED.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  include/linux/sched.h |  5 ++---
>  kernel/sched/core.c   | 30 +++++++-----------------------
>  kernel/sched/fair.c   |  7 +++++++
>  3 files changed, 16 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 88b8817b827d..b504e55bbf7a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -95,10 +95,9 @@ struct task_group;
>  #define TASK_WAKEKILL			0x0100
>  #define TASK_WAKING			0x0200
>  #define TASK_NOLOAD			0x0400
> -#define TASK_NEW			0x0800
>  /* RT specific auxilliary flag to mark RT lock waiters */
> -#define TASK_RTLOCK_WAIT		0x1000
> -#define TASK_STATE_MAX			0x2000
> +#define TASK_RTLOCK_WAIT		0x0800
> +#define TASK_STATE_MAX			0x1000
>  
>  /* Convenience macros for the sake of set_current_state: */
>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e74e79f783af..d5faa1700bd7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4500,11 +4500,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
>  {
>  	__sched_fork(clone_flags, p);
>  	/*
> -	 * We mark the process as NEW here. This guarantees that
> +	 * We mark the process as running here. This guarantees that
>  	 * nobody will actually run it, and a signal or other external
>  	 * event cannot wake it up and insert it on the runqueue either.
>  	 */
> -	p->__state = TASK_NEW;
> +	p->__state = TASK_RUNNING;
>  
>  	/*
>  	 * Make sure we do not leak PI boosting priority to the child.
> @@ -4622,7 +4622,6 @@ void wake_up_new_task(struct task_struct *p)
>  	struct rq *rq;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> -	WRITE_ONCE(p->__state, TASK_RUNNING);
>  #ifdef CONFIG_SMP
>  	/*
>  	 * Fork balancing, do it here and not earlier because:
> @@ -10238,36 +10237,19 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
>  	sched_unregister_group(tg);
>  }
>  
> +#ifdef CONFIG_RT_GROUP_SCHED
>  static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
>  {
>  	struct task_struct *task;
>  	struct cgroup_subsys_state *css;
> -	int ret = 0;
>  
>  	cgroup_taskset_for_each(task, css, tset) {
> -#ifdef CONFIG_RT_GROUP_SCHED
>  		if (!sched_rt_can_attach(css_tg(css), task))
>  			return -EINVAL;
> -#endif
> -		/*
> -		 * Serialize against wake_up_new_task() such that if it's
> -		 * running, we're sure to observe its full state.
> -		 */
> -		raw_spin_lock_irq(&task->pi_lock);
> -		/*
> -		 * Avoid calling sched_move_task() before wake_up_new_task()
> -		 * has happened. This would lead to problems with PELT, due to
> -		 * move wanting to detach+attach while we're not attached yet.
> -		 */
> -		if (READ_ONCE(task->__state) == TASK_NEW)
> -			ret = -EINVAL;
> -		raw_spin_unlock_irq(&task->pi_lock);
> -
> -		if (ret)
> -			break;
>  	}
> -	return ret;
> +	return 0;
>  }
> +#endif
>  
>  static void cpu_cgroup_attach(struct cgroup_taskset *tset)
>  {
> @@ -11103,7 +11085,9 @@ struct cgroup_subsys cpu_cgrp_subsys = {
>  	.css_released	= cpu_cgroup_css_released,
>  	.css_free	= cpu_cgroup_css_free,
>  	.css_extra_stat_show = cpu_extra_stat_show,
> +#ifdef CONFIG_RT_GROUP_SCHED
>  	.can_attach	= cpu_cgroup_can_attach,
> +#endif
>  	.attach		= cpu_cgroup_attach,
>  	.legacy_cftypes	= cpu_legacy_files,
>  	.dfl_cftypes	= cpu_files,
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4bc76d95a99d..90aba33a3780 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11669,6 +11669,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  static void task_change_group_fair(struct task_struct *p)
>  {
> +	/*
> +	 * We couldn't detach or attach a forked task which
> +	 * hasn't been woken up by wake_up_new_task().
> +	 */
> +	if (!p->on_rq && !se->sum_exec_runtime)

should be: if (!p->on_rq && !p->se.sum_exec_runtime)
sorry for my carelessness...


> +		return;
> +
>  	detach_task_cfs_rq(p);
>  
>  #ifdef CONFIG_SMP

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-08 12:57 ` [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
  2022-08-08 14:57   ` Chengming Zhou
@ 2022-08-08 16:42   ` kernel test robot
  2022-08-15 21:11   ` Tejun Heo
  2 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2022-08-08 16:42 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, vschneid
  Cc: kbuild-all, linux-kernel, Chengming Zhou

Hi Chengming,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master next-20220808]
[cannot apply to v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220808-210012
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 8648f92a66a323ed01903d2cbb248cdbe2f312d9
config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20220809/202208090027.Lo1M3CoX-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/05baf61c579ea60e2b6447a012edcc5bf5f43835
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Chengming-Zhou/sched-fair-task-load-tracking-optimization-and-cleanup/20220808-210012
        git checkout 05baf61c579ea60e2b6447a012edcc5bf5f43835
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sched/fair.c:672:5: warning: no previous prototype for 'sched_update_scaling' [-Wmissing-prototypes]
     672 | int sched_update_scaling(void)
         |     ^~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c: In function 'task_change_group_fair':
>> kernel/sched/fair.c:11676:27: error: 'se' undeclared (first use in this function); did you mean 'sem'?
   11676 |         if (!p->on_rq && !se->sum_exec_runtime)
         |                           ^~
         |                           sem
   kernel/sched/fair.c:11676:27: note: each undeclared identifier is reported only once for each function it appears in


vim +11676 kernel/sched/fair.c

 11668	
 11669	#ifdef CONFIG_FAIR_GROUP_SCHED
 11670	static void task_change_group_fair(struct task_struct *p)
 11671	{
 11672		/*
 11673		 * We couldn't detach or attach a forked task which
 11674		 * hasn't been woken up by wake_up_new_task().
 11675		 */
 11676		if (!p->on_rq && !se->sum_exec_runtime)
 11677			return;
 11678	
 11679		detach_task_cfs_rq(p);
 11680	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-08 12:57 ` [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
  2022-08-08 14:57   ` Chengming Zhou
  2022-08-08 16:42   ` kernel test robot
@ 2022-08-15 21:11   ` Tejun Heo
  2022-08-16 13:14     ` Chengming Zhou
  2 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2022-08-15 21:11 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, vschneid, linux-kernel

On Mon, Aug 08, 2022 at 08:57:43PM +0800, Chengming Zhou wrote:
> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
> introduce a TASK_NEW state and an unnessary limitation that would fail
> when changing cgroup of new forked task.
> 
> Because at that time, we can't handle task_change_group_fair() for new
> forked fair task which hasn't been woken up by wake_up_new_task(),
> which will cause detach on an unattached task sched_avg problem.
> 
> This patch delete this unnessary limitation by adding check before do
> detach or attach in task_change_group_fair().
> 
> So cpu_cgrp_subsys.can_attach() has nothing to do for fair tasks,
> only define it in #ifdef CONFIG_RT_GROUP_SCHED.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

I don't know cfs enough to review this but it'd be really great to remove
this restriction.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-15 21:11   ` Tejun Heo
@ 2022-08-16 13:14     ` Chengming Zhou
  0 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-16 13:14 UTC (permalink / raw)
  To: Tejun Heo, peterz, vincent.guittot, dietmar.eggemann
  Cc: mingo, juri.lelli, rostedt, bsegall, vschneid, linux-kernel

On 2022/8/16 05:11, Tejun Heo wrote:
> On Mon, Aug 08, 2022 at 08:57:43PM +0800, Chengming Zhou wrote:
>> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
>> introduce a TASK_NEW state and an unnessary limitation that would fail
>> when changing cgroup of new forked task.
>>
>> Because at that time, we can't handle task_change_group_fair() for new
>> forked fair task which hasn't been woken up by wake_up_new_task(),
>> which will cause detach on an unattached task sched_avg problem.
>>
>> This patch delete this unnessary limitation by adding check before do
>> detach or attach in task_change_group_fair().
>>
>> So cpu_cgrp_subsys.can_attach() has nothing to do for fair tasks,
>> only define it in #ifdef CONFIG_RT_GROUP_SCHED.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> I don't know cfs enough to review this but it'd be really great to remove
> this restriction.

Thanks for your reply!

Friendly ping :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case
  2022-08-08 12:57 ` [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case Chengming Zhou
@ 2022-08-17 15:01   ` Vincent Guittot
  2022-08-17 15:04     ` Chengming Zhou
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Guittot @ 2022-08-17 15:01 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	vschneid, linux-kernel

On Mon, 8 Aug 2022 at 14:58, Chengming Zhou <zhouchengming@bytedance.com> wrote:
>
> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
> fixed two load tracking problems for new task, including detach on
> unattached new task problem.
>
> There still left another detach on unattached task problem for the task
> which has been woken up by try_to_wake_up() and waiting for actually
> being woken up by sched_ttwu_pending().
>
> try_to_wake_up(p)
>   cpu = select_task_rq(p)
>   if (task_cpu(p) != cpu)
>     set_task_cpu(p, cpu)
>       migrate_task_rq_fair()
>         remove_entity_load_avg()       --> unattached
>         se->avg.last_update_time = 0;
>       __set_task_cpu()
>   ttwu_queue(p, cpu)
>     ttwu_queue_wakelist()
>       __ttwu_queue_wakelist()
>
> task_change_group_fair()
>   detach_task_cfs_rq()
>     detach_entity_cfs_rq()
>       detach_entity_load_avg()   --> detach on unattached task
>   set_task_rq()
>   attach_task_cfs_rq()
>     attach_entity_cfs_rq()
>       attach_entity_load_avg()
>
> The reason of this problem is similar, we should check in detach_entity_cfs_rq()
> that se->avg.last_update_time != 0, before do detach_entity_load_avg().
>
> This patch move detach/attach_entity_cfs_rq() functions up to be together
> with other load tracking functions to avoid to use another #ifdef CONFIG_SMP.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  kernel/sched/fair.c | 132 +++++++++++++++++++++++---------------------
>  1 file changed, 68 insertions(+), 64 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f52e7dc7f22d..4bc76d95a99d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -874,9 +874,6 @@ void init_entity_runnable_average(struct sched_entity *se)
>  void post_init_entity_util_avg(struct task_struct *p)
>  {
>  }
> -static void update_tg_load_avg(struct cfs_rq *cfs_rq)
> -{
> -}
>  #endif /* CONFIG_SMP */
>
>  /*
> @@ -3176,6 +3173,7 @@ void reweight_task(struct task_struct *p, int prio)
>         load->inv_weight = sched_prio_to_wmult[prio];
>  }
>
> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
>  static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> @@ -4086,6 +4084,71 @@ static void remove_entity_load_avg(struct sched_entity *se)
>         raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
>  }
>
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> +/*
> + * Propagate the changes of the sched_entity across the tg tree to make it
> + * visible to the root
> + */
> +static void propagate_entity_cfs_rq(struct sched_entity *se)
> +{
> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +
> +       if (cfs_rq_throttled(cfs_rq))
> +               return;
> +
> +       if (!throttled_hierarchy(cfs_rq))
> +               list_add_leaf_cfs_rq(cfs_rq);
> +
> +       /* Start to propagate at parent */
> +       se = se->parent;
> +
> +       for_each_sched_entity(se) {
> +               cfs_rq = cfs_rq_of(se);
> +
> +               update_load_avg(cfs_rq, se, UPDATE_TG);
> +
> +               if (cfs_rq_throttled(cfs_rq))
> +                       break;
> +
> +               if (!throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
> +       }
> +}
> +#else
> +static void propagate_entity_cfs_rq(struct sched_entity *se) { }
> +#endif
> +
> +static void detach_entity_cfs_rq(struct sched_entity *se)
> +{
> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +
> +       /*
> +        * In case the task sched_avg hasn't been attached:
> +        * - A forked task which hasn't been woken up by wake_up_new_task().
> +        * - A task which has been woken up by try_to_wake_up() but is
> +        *   waiting for actually being woken up by sched_ttwu_pending().
> +        */
> +       if (!se->avg.last_update_time)
> +               return;

The 2 lines above and the associated comment are the only relevant
part of the patch, aren't they?
Is everything else just code moving from one place to another one
without change?

> +
> +       /* Catch up with the cfs_rq and remove our load when we leave */
> +       update_load_avg(cfs_rq, se, 0);
> +       detach_entity_load_avg(cfs_rq, se);
> +       update_tg_load_avg(cfs_rq);
> +       propagate_entity_cfs_rq(se);
> +}
> +
> +static void attach_entity_cfs_rq(struct sched_entity *se)
> +{
> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +
> +       /* Synchronize entity with its cfs_rq */
> +       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
> +       attach_entity_load_avg(cfs_rq, se);
> +       update_tg_load_avg(cfs_rq);
> +       propagate_entity_cfs_rq(se);
> +}
> +
>  static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
>  {
>         return cfs_rq->avg.runnable_avg;
> @@ -4308,11 +4371,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  }
>
>  static inline void remove_entity_load_avg(struct sched_entity *se) {}
> -
> -static inline void
> -attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
> -static inline void
> -detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
> +static inline void detach_entity_cfs_rq(struct sched_entity *se) {}
> +static inline void attach_entity_cfs_rq(struct sched_entity *se) {}
>
>  static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
>  {
> @@ -11519,62 +11579,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
>         return false;
>  }
>
> -#ifdef CONFIG_FAIR_GROUP_SCHED
> -/*
> - * Propagate the changes of the sched_entity across the tg tree to make it
> - * visible to the root
> - */
> -static void propagate_entity_cfs_rq(struct sched_entity *se)
> -{
> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> -
> -       if (cfs_rq_throttled(cfs_rq))
> -               return;
> -
> -       if (!throttled_hierarchy(cfs_rq))
> -               list_add_leaf_cfs_rq(cfs_rq);
> -
> -       /* Start to propagate at parent */
> -       se = se->parent;
> -
> -       for_each_sched_entity(se) {
> -               cfs_rq = cfs_rq_of(se);
> -
> -               update_load_avg(cfs_rq, se, UPDATE_TG);
> -
> -               if (cfs_rq_throttled(cfs_rq))
> -                       break;
> -
> -               if (!throttled_hierarchy(cfs_rq))
> -                       list_add_leaf_cfs_rq(cfs_rq);
> -       }
> -}
> -#else
> -static void propagate_entity_cfs_rq(struct sched_entity *se) { }
> -#endif
> -
> -static void detach_entity_cfs_rq(struct sched_entity *se)
> -{
> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> -
> -       /* Catch up with the cfs_rq and remove our load when we leave */
> -       update_load_avg(cfs_rq, se, 0);
> -       detach_entity_load_avg(cfs_rq, se);
> -       update_tg_load_avg(cfs_rq);
> -       propagate_entity_cfs_rq(se);
> -}
> -
> -static void attach_entity_cfs_rq(struct sched_entity *se)
> -{
> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> -
> -       /* Synchronize entity with its cfs_rq */
> -       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
> -       attach_entity_load_avg(cfs_rq, se);
> -       update_tg_load_avg(cfs_rq);
> -       propagate_entity_cfs_rq(se);
> -}
> -
>  static void detach_task_cfs_rq(struct task_struct *p)
>  {
>         struct sched_entity *se = &p->se;
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case
  2022-08-17 15:01   ` Vincent Guittot
@ 2022-08-17 15:04     ` Chengming Zhou
  2022-08-17 15:08       ` Vincent Guittot
  0 siblings, 1 reply; 18+ messages in thread
From: Chengming Zhou @ 2022-08-17 15:04 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	vschneid, linux-kernel

On 2022/8/17 23:01, Vincent Guittot wrote:
> On Mon, 8 Aug 2022 at 14:58, Chengming Zhou <zhouchengming@bytedance.com> wrote:
>>
>> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
>> fixed two load tracking problems for new tasks, including the detach on
>> unattached new task problem.
>>
>> There is still another detach on unattached task problem left, for a task
>> which has been woken up by try_to_wake_up() but is waiting to actually be
>> woken up by sched_ttwu_pending().
>>
>> try_to_wake_up(p)
>>   cpu = select_task_rq(p)
>>   if (task_cpu(p) != cpu)
>>     set_task_cpu(p, cpu)
>>       migrate_task_rq_fair()
>>         remove_entity_load_avg()       --> unattached
>>         se->avg.last_update_time = 0;
>>       __set_task_cpu()
>>   ttwu_queue(p, cpu)
>>     ttwu_queue_wakelist()
>>       __ttwu_queue_wakelist()
>>
>> task_change_group_fair()
>>   detach_task_cfs_rq()
>>     detach_entity_cfs_rq()
>>       detach_entity_load_avg()   --> detach on unattached task
>>   set_task_rq()
>>   attach_task_cfs_rq()
>>     attach_entity_cfs_rq()
>>       attach_entity_load_avg()
>>
>> The cause of this problem is similar: detach_entity_cfs_rq() should check
>> that se->avg.last_update_time != 0 before calling detach_entity_load_avg().
>>
>> This patch moves the detach/attach_entity_cfs_rq() functions up to sit
>> together with the other load tracking functions, to avoid using another
>> #ifdef CONFIG_SMP.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  kernel/sched/fair.c | 132 +++++++++++++++++++++++---------------------
>>  1 file changed, 68 insertions(+), 64 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index f52e7dc7f22d..4bc76d95a99d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -874,9 +874,6 @@ void init_entity_runnable_average(struct sched_entity *se)
>>  void post_init_entity_util_avg(struct task_struct *p)
>>  {
>>  }
>> -static void update_tg_load_avg(struct cfs_rq *cfs_rq)
>> -{
>> -}
>>  #endif /* CONFIG_SMP */
>>
>>  /*
>> @@ -3176,6 +3173,7 @@ void reweight_task(struct task_struct *p, int prio)
>>         load->inv_weight = sched_prio_to_wmult[prio];
>>  }
>>
>> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
>>  static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
>>
>>  #ifdef CONFIG_FAIR_GROUP_SCHED
>> @@ -4086,6 +4084,71 @@ static void remove_entity_load_avg(struct sched_entity *se)
>>         raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
>>  }
>>
>> +#ifdef CONFIG_FAIR_GROUP_SCHED
>> +/*
>> + * Propagate the changes of the sched_entity across the tg tree to make it
>> + * visible to the root
>> + */
>> +static void propagate_entity_cfs_rq(struct sched_entity *se)
>> +{
>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +
>> +       if (cfs_rq_throttled(cfs_rq))
>> +               return;
>> +
>> +       if (!throttled_hierarchy(cfs_rq))
>> +               list_add_leaf_cfs_rq(cfs_rq);
>> +
>> +       /* Start to propagate at parent */
>> +       se = se->parent;
>> +
>> +       for_each_sched_entity(se) {
>> +               cfs_rq = cfs_rq_of(se);
>> +
>> +               update_load_avg(cfs_rq, se, UPDATE_TG);
>> +
>> +               if (cfs_rq_throttled(cfs_rq))
>> +                       break;
>> +
>> +               if (!throttled_hierarchy(cfs_rq))
>> +                       list_add_leaf_cfs_rq(cfs_rq);
>> +       }
>> +}
>> +#else
>> +static void propagate_entity_cfs_rq(struct sched_entity *se) { }
>> +#endif
>> +
>> +static void detach_entity_cfs_rq(struct sched_entity *se)
>> +{
>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +
>> +       /*
>> +        * In case the task sched_avg hasn't been attached:
>> +        * - A forked task which hasn't been woken up by wake_up_new_task().
>> +        * - A task which has been woken up by try_to_wake_up() but is
>> +        *   waiting for actually being woken up by sched_ttwu_pending().
>> +        */
>> +       if (!se->avg.last_update_time)
>> +               return;
> 
> The 2 lines above and the associated comment are the only relevant
> part of the patch, aren't they ?
> Is everything else just code moving from one place to another one
> without change ?

Yes, everything else is just code movement.
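
Stripped of the movement, the only functional change is the early return at
the top of detach_entity_cfs_rq(). Purely as an illustration (not an exact
hunk of a respin), it amounts to:

 static void detach_entity_cfs_rq(struct sched_entity *se)
 {
         struct cfs_rq *cfs_rq = cfs_rq_of(se);

+        /*
+         * In case the task sched_avg hasn't been attached:
+         * - A forked task which hasn't been woken up by wake_up_new_task().
+         * - A task which has been woken up by try_to_wake_up() but is
+         *   waiting for actually being woken up by sched_ttwu_pending().
+         */
+        if (!se->avg.last_update_time)
+                return;
+
         /* Catch up with the cfs_rq and remove our load when we leave */
         update_load_avg(cfs_rq, se, 0);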

Thanks!

> 
>> +
>> +       /* Catch up with the cfs_rq and remove our load when we leave */
>> +       update_load_avg(cfs_rq, se, 0);
>> +       detach_entity_load_avg(cfs_rq, se);
>> +       update_tg_load_avg(cfs_rq);
>> +       propagate_entity_cfs_rq(se);
>> +}
>> +
>> +static void attach_entity_cfs_rq(struct sched_entity *se)
>> +{
>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +
>> +       /* Synchronize entity with its cfs_rq */
>> +       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
>> +       attach_entity_load_avg(cfs_rq, se);
>> +       update_tg_load_avg(cfs_rq);
>> +       propagate_entity_cfs_rq(se);
>> +}
>> +
>>  static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
>>  {
>>         return cfs_rq->avg.runnable_avg;
>> @@ -4308,11 +4371,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>  }
>>
>>  static inline void remove_entity_load_avg(struct sched_entity *se) {}
>> -
>> -static inline void
>> -attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
>> -static inline void
>> -detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
>> +static inline void detach_entity_cfs_rq(struct sched_entity *se) {}
>> +static inline void attach_entity_cfs_rq(struct sched_entity *se) {}
>>
>>  static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
>>  {
>> @@ -11519,62 +11579,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
>>         return false;
>>  }
>>
>> -#ifdef CONFIG_FAIR_GROUP_SCHED
>> -/*
>> - * Propagate the changes of the sched_entity across the tg tree to make it
>> - * visible to the root
>> - */
>> -static void propagate_entity_cfs_rq(struct sched_entity *se)
>> -{
>> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> -
>> -       if (cfs_rq_throttled(cfs_rq))
>> -               return;
>> -
>> -       if (!throttled_hierarchy(cfs_rq))
>> -               list_add_leaf_cfs_rq(cfs_rq);
>> -
>> -       /* Start to propagate at parent */
>> -       se = se->parent;
>> -
>> -       for_each_sched_entity(se) {
>> -               cfs_rq = cfs_rq_of(se);
>> -
>> -               update_load_avg(cfs_rq, se, UPDATE_TG);
>> -
>> -               if (cfs_rq_throttled(cfs_rq))
>> -                       break;
>> -
>> -               if (!throttled_hierarchy(cfs_rq))
>> -                       list_add_leaf_cfs_rq(cfs_rq);
>> -       }
>> -}
>> -#else
>> -static void propagate_entity_cfs_rq(struct sched_entity *se) { }
>> -#endif
>> -
>> -static void detach_entity_cfs_rq(struct sched_entity *se)
>> -{
>> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> -
>> -       /* Catch up with the cfs_rq and remove our load when we leave */
>> -       update_load_avg(cfs_rq, se, 0);
>> -       detach_entity_load_avg(cfs_rq, se);
>> -       update_tg_load_avg(cfs_rq);
>> -       propagate_entity_cfs_rq(se);
>> -}
>> -
>> -static void attach_entity_cfs_rq(struct sched_entity *se)
>> -{
>> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> -
>> -       /* Synchronize entity with its cfs_rq */
>> -       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
>> -       attach_entity_load_avg(cfs_rq, se);
>> -       update_tg_load_avg(cfs_rq);
>> -       propagate_entity_cfs_rq(se);
>> -}
>> -
>>  static void detach_task_cfs_rq(struct task_struct *p)
>>  {
>>         struct sched_entity *se = &p->se;
>> --
>> 2.36.1
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case
  2022-08-17 15:04     ` Chengming Zhou
@ 2022-08-17 15:08       ` Vincent Guittot
  2022-08-17 15:10         ` Chengming Zhou
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Guittot @ 2022-08-17 15:08 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	vschneid, linux-kernel

On Wed, 17 Aug 2022 at 17:04, Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> On 2022/8/17 23:01, Vincent Guittot wrote:
> > On Mon, 8 Aug 2022 at 14:58, Chengming Zhou <zhouchengming@bytedance.com> wrote:
> >>
> >> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
> >> fixed two load tracking problems for new tasks, including the detach on
> >> unattached new task problem.
> >>
> >> There is still another detach on unattached task problem left, for a task
> >> which has been woken up by try_to_wake_up() but is waiting to actually be
> >> woken up by sched_ttwu_pending().
> >>
> >> try_to_wake_up(p)
> >>   cpu = select_task_rq(p)
> >>   if (task_cpu(p) != cpu)
> >>     set_task_cpu(p, cpu)
> >>       migrate_task_rq_fair()
> >>         remove_entity_load_avg()       --> unattached
> >>         se->avg.last_update_time = 0;
> >>       __set_task_cpu()
> >>   ttwu_queue(p, cpu)
> >>     ttwu_queue_wakelist()
> >>       __ttwu_queue_wakelist()
> >>
> >> task_change_group_fair()
> >>   detach_task_cfs_rq()
> >>     detach_entity_cfs_rq()
> >>       detach_entity_load_avg()   --> detach on unattached task
> >>   set_task_rq()
> >>   attach_task_cfs_rq()
> >>     attach_entity_cfs_rq()
> >>       attach_entity_load_avg()
> >>
> >> The cause of this problem is similar: detach_entity_cfs_rq() should check
> >> that se->avg.last_update_time != 0 before calling detach_entity_load_avg().
> >>
> >> This patch moves the detach/attach_entity_cfs_rq() functions up to sit
> >> together with the other load tracking functions, to avoid using another
> >> #ifdef CONFIG_SMP.
> >>
> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >> ---
> >>  kernel/sched/fair.c | 132 +++++++++++++++++++++++---------------------
> >>  1 file changed, 68 insertions(+), 64 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index f52e7dc7f22d..4bc76d95a99d 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -874,9 +874,6 @@ void init_entity_runnable_average(struct sched_entity *se)
> >>  void post_init_entity_util_avg(struct task_struct *p)
> >>  {
> >>  }
> >> -static void update_tg_load_avg(struct cfs_rq *cfs_rq)
> >> -{
> >> -}
> >>  #endif /* CONFIG_SMP */
> >>
> >>  /*
> >> @@ -3176,6 +3173,7 @@ void reweight_task(struct task_struct *p, int prio)
> >>         load->inv_weight = sched_prio_to_wmult[prio];
> >>  }
> >>
> >> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
> >>  static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
> >>
> >>  #ifdef CONFIG_FAIR_GROUP_SCHED
> >> @@ -4086,6 +4084,71 @@ static void remove_entity_load_avg(struct sched_entity *se)
> >>         raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
> >>  }
> >>
> >> +#ifdef CONFIG_FAIR_GROUP_SCHED
> >> +/*
> >> + * Propagate the changes of the sched_entity across the tg tree to make it
> >> + * visible to the root
> >> + */
> >> +static void propagate_entity_cfs_rq(struct sched_entity *se)
> >> +{
> >> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> +
> >> +       if (cfs_rq_throttled(cfs_rq))
> >> +               return;
> >> +
> >> +       if (!throttled_hierarchy(cfs_rq))
> >> +               list_add_leaf_cfs_rq(cfs_rq);
> >> +
> >> +       /* Start to propagate at parent */
> >> +       se = se->parent;
> >> +
> >> +       for_each_sched_entity(se) {
> >> +               cfs_rq = cfs_rq_of(se);
> >> +
> >> +               update_load_avg(cfs_rq, se, UPDATE_TG);
> >> +
> >> +               if (cfs_rq_throttled(cfs_rq))
> >> +                       break;
> >> +
> >> +               if (!throttled_hierarchy(cfs_rq))
> >> +                       list_add_leaf_cfs_rq(cfs_rq);
> >> +       }
> >> +}
> >> +#else
> >> +static void propagate_entity_cfs_rq(struct sched_entity *se) { }
> >> +#endif
> >> +
> >> +static void detach_entity_cfs_rq(struct sched_entity *se)
> >> +{
> >> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> +
> >> +       /*
> >> +        * In case the task sched_avg hasn't been attached:
> >> +        * - A forked task which hasn't been woken up by wake_up_new_task().
> >> +        * - A task which has been woken up by try_to_wake_up() but is
> >> +        *   waiting for actually being woken up by sched_ttwu_pending().
> >> +        */
> >> +       if (!se->avg.last_update_time)
> >> +               return;
> >
> > The 2 lines above and the associated comment are the only relevant
> > part of the patch, aren't they ?
> > Is everything else just code moving from one place to another one
> > without change ?
>
> Yes, everything else is just code movement.

Could you remove such code movement ? It doesn't add any value to the
patch, does it ? But it makes the patch quite difficult to review, and
I wasted a lot of time looking for what really changed in the code.

Thanks

>
> Thanks!
>
> >
> >> +
> >> +       /* Catch up with the cfs_rq and remove our load when we leave */
> >> +       update_load_avg(cfs_rq, se, 0);
> >> +       detach_entity_load_avg(cfs_rq, se);
> >> +       update_tg_load_avg(cfs_rq);
> >> +       propagate_entity_cfs_rq(se);
> >> +}
> >> +
> >> +static void attach_entity_cfs_rq(struct sched_entity *se)
> >> +{
> >> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> +
> >> +       /* Synchronize entity with its cfs_rq */
> >> +       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
> >> +       attach_entity_load_avg(cfs_rq, se);
> >> +       update_tg_load_avg(cfs_rq);
> >> +       propagate_entity_cfs_rq(se);
> >> +}
> >> +
> >>  static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
> >>  {
> >>         return cfs_rq->avg.runnable_avg;
> >> @@ -4308,11 +4371,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >>  }
> >>
> >>  static inline void remove_entity_load_avg(struct sched_entity *se) {}
> >> -
> >> -static inline void
> >> -attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
> >> -static inline void
> >> -detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
> >> +static inline void detach_entity_cfs_rq(struct sched_entity *se) {}
> >> +static inline void attach_entity_cfs_rq(struct sched_entity *se) {}
> >>
> >>  static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
> >>  {
> >> @@ -11519,62 +11579,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
> >>         return false;
> >>  }
> >>
> >> -#ifdef CONFIG_FAIR_GROUP_SCHED
> >> -/*
> >> - * Propagate the changes of the sched_entity across the tg tree to make it
> >> - * visible to the root
> >> - */
> >> -static void propagate_entity_cfs_rq(struct sched_entity *se)
> >> -{
> >> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> -
> >> -       if (cfs_rq_throttled(cfs_rq))
> >> -               return;
> >> -
> >> -       if (!throttled_hierarchy(cfs_rq))
> >> -               list_add_leaf_cfs_rq(cfs_rq);
> >> -
> >> -       /* Start to propagate at parent */
> >> -       se = se->parent;
> >> -
> >> -       for_each_sched_entity(se) {
> >> -               cfs_rq = cfs_rq_of(se);
> >> -
> >> -               update_load_avg(cfs_rq, se, UPDATE_TG);
> >> -
> >> -               if (cfs_rq_throttled(cfs_rq))
> >> -                       break;
> >> -
> >> -               if (!throttled_hierarchy(cfs_rq))
> >> -                       list_add_leaf_cfs_rq(cfs_rq);
> >> -       }
> >> -}
> >> -#else
> >> -static void propagate_entity_cfs_rq(struct sched_entity *se) { }
> >> -#endif
> >> -
> >> -static void detach_entity_cfs_rq(struct sched_entity *se)
> >> -{
> >> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> -
> >> -       /* Catch up with the cfs_rq and remove our load when we leave */
> >> -       update_load_avg(cfs_rq, se, 0);
> >> -       detach_entity_load_avg(cfs_rq, se);
> >> -       update_tg_load_avg(cfs_rq);
> >> -       propagate_entity_cfs_rq(se);
> >> -}
> >> -
> >> -static void attach_entity_cfs_rq(struct sched_entity *se)
> >> -{
> >> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> -
> >> -       /* Synchronize entity with its cfs_rq */
> >> -       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
> >> -       attach_entity_load_avg(cfs_rq, se);
> >> -       update_tg_load_avg(cfs_rq);
> >> -       propagate_entity_cfs_rq(se);
> >> -}
> >> -
> >>  static void detach_task_cfs_rq(struct task_struct *p)
> >>  {
> >>         struct sched_entity *se = &p->se;
> >> --
> >> 2.36.1
> >>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case
  2022-08-17 15:08       ` Vincent Guittot
@ 2022-08-17 15:10         ` Chengming Zhou
  0 siblings, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-08-17 15:10 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	vschneid, linux-kernel

On 2022/8/17 23:08, Vincent Guittot wrote:
> On Wed, 17 Aug 2022 at 17:04, Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> On 2022/8/17 23:01, Vincent Guittot wrote:
>>> On Mon, 8 Aug 2022 at 14:58, Chengming Zhou <zhouchengming@bytedance.com> wrote:
>>>>
>>>> commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
>>>> fixed two load tracking problems for new tasks, including the detach on
>>>> unattached new task problem.
>>>>
>>>> There is still another detach on unattached task problem left, for a task
>>>> which has been woken up by try_to_wake_up() but is waiting to actually be
>>>> woken up by sched_ttwu_pending().
>>>>
>>>> try_to_wake_up(p)
>>>>   cpu = select_task_rq(p)
>>>>   if (task_cpu(p) != cpu)
>>>>     set_task_cpu(p, cpu)
>>>>       migrate_task_rq_fair()
>>>>         remove_entity_load_avg()       --> unattached
>>>>         se->avg.last_update_time = 0;
>>>>       __set_task_cpu()
>>>>   ttwu_queue(p, cpu)
>>>>     ttwu_queue_wakelist()
>>>>       __ttwu_queue_wakelist()
>>>>
>>>> task_change_group_fair()
>>>>   detach_task_cfs_rq()
>>>>     detach_entity_cfs_rq()
>>>>       detach_entity_load_avg()   --> detach on unattached task
>>>>   set_task_rq()
>>>>   attach_task_cfs_rq()
>>>>     attach_entity_cfs_rq()
>>>>       attach_entity_load_avg()
>>>>
>>>> The cause of this problem is similar: detach_entity_cfs_rq() should check
>>>> that se->avg.last_update_time != 0 before calling detach_entity_load_avg().
>>>>
>>>> This patch moves the detach/attach_entity_cfs_rq() functions up to sit
>>>> together with the other load tracking functions, to avoid using another
>>>> #ifdef CONFIG_SMP.
>>>>
>>>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>>>> ---
>>>>  kernel/sched/fair.c | 132 +++++++++++++++++++++++---------------------
>>>>  1 file changed, 68 insertions(+), 64 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index f52e7dc7f22d..4bc76d95a99d 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -874,9 +874,6 @@ void init_entity_runnable_average(struct sched_entity *se)
>>>>  void post_init_entity_util_avg(struct task_struct *p)
>>>>  {
>>>>  }
>>>> -static void update_tg_load_avg(struct cfs_rq *cfs_rq)
>>>> -{
>>>> -}
>>>>  #endif /* CONFIG_SMP */
>>>>
>>>>  /*
>>>> @@ -3176,6 +3173,7 @@ void reweight_task(struct task_struct *p, int prio)
>>>>         load->inv_weight = sched_prio_to_wmult[prio];
>>>>  }
>>>>
>>>> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
>>>>  static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
>>>>
>>>>  #ifdef CONFIG_FAIR_GROUP_SCHED
>>>> @@ -4086,6 +4084,71 @@ static void remove_entity_load_avg(struct sched_entity *se)
>>>>         raw_spin_unlock_irqrestore(&cfs_rq->removed.lock, flags);
>>>>  }
>>>>
>>>> +#ifdef CONFIG_FAIR_GROUP_SCHED
>>>> +/*
>>>> + * Propagate the changes of the sched_entity across the tg tree to make it
>>>> + * visible to the root
>>>> + */
>>>> +static void propagate_entity_cfs_rq(struct sched_entity *se)
>>>> +{
>>>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> +
>>>> +       if (cfs_rq_throttled(cfs_rq))
>>>> +               return;
>>>> +
>>>> +       if (!throttled_hierarchy(cfs_rq))
>>>> +               list_add_leaf_cfs_rq(cfs_rq);
>>>> +
>>>> +       /* Start to propagate at parent */
>>>> +       se = se->parent;
>>>> +
>>>> +       for_each_sched_entity(se) {
>>>> +               cfs_rq = cfs_rq_of(se);
>>>> +
>>>> +               update_load_avg(cfs_rq, se, UPDATE_TG);
>>>> +
>>>> +               if (cfs_rq_throttled(cfs_rq))
>>>> +                       break;
>>>> +
>>>> +               if (!throttled_hierarchy(cfs_rq))
>>>> +                       list_add_leaf_cfs_rq(cfs_rq);
>>>> +       }
>>>> +}
>>>> +#else
>>>> +static void propagate_entity_cfs_rq(struct sched_entity *se) { }
>>>> +#endif
>>>> +
>>>> +static void detach_entity_cfs_rq(struct sched_entity *se)
>>>> +{
>>>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> +
>>>> +       /*
>>>> +        * In case the task sched_avg hasn't been attached:
>>>> +        * - A forked task which hasn't been woken up by wake_up_new_task().
>>>> +        * - A task which has been woken up by try_to_wake_up() but is
>>>> +        *   waiting for actually being woken up by sched_ttwu_pending().
>>>> +        */
>>>> +       if (!se->avg.last_update_time)
>>>> +               return;
>>>
>>> The 2 lines above and the associated comment are the only relevant
>>> part of the patch, aren't they ?
>>> Is everything else just code moving from one place to another one
>>> without change ?
>>
>> Yes, everything else is just code movement.
> 
> Could you remove such code movement ? It doesn't add any value to the
> patch, does it ? But it makes the patch quite difficult to review, and
> I wasted a lot of time looking for what really changed in the code.
> 

Sorry about that. No problem, I will remove it.
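
For the respin, detach_entity_cfs_rq() can stay where it is and only gain
the guard. Since the avg fields are SMP-only (that was the reason for the
movement, to avoid another #ifdef CONFIG_SMP), the check would then need
its own #ifdef; a rough sketch, to be adjusted against tip/sched/core:

static void detach_entity_cfs_rq(struct sched_entity *se)
{
        struct cfs_rq *cfs_rq = cfs_rq_of(se);

#ifdef CONFIG_SMP
        /*
         * Nothing to detach if the task sched_avg hasn't been attached yet
         * (newly forked task, or a remote wakeup still pending in
         * sched_ttwu_pending()).
         */
        if (!se->avg.last_update_time)
                return;
#endif

        /* Catch up with the cfs_rq and remove our load when we leave */
        update_load_avg(cfs_rq, se, 0);
        detach_entity_load_avg(cfs_rq, se);
        update_tg_load_avg(cfs_rq);
        propagate_entity_cfs_rq(se);
}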

Thanks!


> Thanks
> 
>>
>> Thanks!
>>
>>>
>>>> +
>>>> +       /* Catch up with the cfs_rq and remove our load when we leave */
>>>> +       update_load_avg(cfs_rq, se, 0);
>>>> +       detach_entity_load_avg(cfs_rq, se);
>>>> +       update_tg_load_avg(cfs_rq);
>>>> +       propagate_entity_cfs_rq(se);
>>>> +}
>>>> +
>>>> +static void attach_entity_cfs_rq(struct sched_entity *se)
>>>> +{
>>>> +       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> +
>>>> +       /* Synchronize entity with its cfs_rq */
>>>> +       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
>>>> +       attach_entity_load_avg(cfs_rq, se);
>>>> +       update_tg_load_avg(cfs_rq);
>>>> +       propagate_entity_cfs_rq(se);
>>>> +}
>>>> +
>>>>  static inline unsigned long cfs_rq_runnable_avg(struct cfs_rq *cfs_rq)
>>>>  {
>>>>         return cfs_rq->avg.runnable_avg;
>>>> @@ -4308,11 +4371,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>>>  }
>>>>
>>>>  static inline void remove_entity_load_avg(struct sched_entity *se) {}
>>>> -
>>>> -static inline void
>>>> -attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
>>>> -static inline void
>>>> -detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
>>>> +static inline void detach_entity_cfs_rq(struct sched_entity *se) {}
>>>> +static inline void attach_entity_cfs_rq(struct sched_entity *se) {}
>>>>
>>>>  static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
>>>>  {
>>>> @@ -11519,62 +11579,6 @@ static inline bool vruntime_normalized(struct task_struct *p)
>>>>         return false;
>>>>  }
>>>>
>>>> -#ifdef CONFIG_FAIR_GROUP_SCHED
>>>> -/*
>>>> - * Propagate the changes of the sched_entity across the tg tree to make it
>>>> - * visible to the root
>>>> - */
>>>> -static void propagate_entity_cfs_rq(struct sched_entity *se)
>>>> -{
>>>> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> -
>>>> -       if (cfs_rq_throttled(cfs_rq))
>>>> -               return;
>>>> -
>>>> -       if (!throttled_hierarchy(cfs_rq))
>>>> -               list_add_leaf_cfs_rq(cfs_rq);
>>>> -
>>>> -       /* Start to propagate at parent */
>>>> -       se = se->parent;
>>>> -
>>>> -       for_each_sched_entity(se) {
>>>> -               cfs_rq = cfs_rq_of(se);
>>>> -
>>>> -               update_load_avg(cfs_rq, se, UPDATE_TG);
>>>> -
>>>> -               if (cfs_rq_throttled(cfs_rq))
>>>> -                       break;
>>>> -
>>>> -               if (!throttled_hierarchy(cfs_rq))
>>>> -                       list_add_leaf_cfs_rq(cfs_rq);
>>>> -       }
>>>> -}
>>>> -#else
>>>> -static void propagate_entity_cfs_rq(struct sched_entity *se) { }
>>>> -#endif
>>>> -
>>>> -static void detach_entity_cfs_rq(struct sched_entity *se)
>>>> -{
>>>> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> -
>>>> -       /* Catch up with the cfs_rq and remove our load when we leave */
>>>> -       update_load_avg(cfs_rq, se, 0);
>>>> -       detach_entity_load_avg(cfs_rq, se);
>>>> -       update_tg_load_avg(cfs_rq);
>>>> -       propagate_entity_cfs_rq(se);
>>>> -}
>>>> -
>>>> -static void attach_entity_cfs_rq(struct sched_entity *se)
>>>> -{
>>>> -       struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> -
>>>> -       /* Synchronize entity with its cfs_rq */
>>>> -       update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
>>>> -       attach_entity_load_avg(cfs_rq, se);
>>>> -       update_tg_load_avg(cfs_rq);
>>>> -       propagate_entity_cfs_rq(se);
>>>> -}
>>>> -
>>>>  static void detach_task_cfs_rq(struct task_struct *p)
>>>>  {
>>>>         struct sched_entity *se = &p->se;
>>>> --
>>>> 2.36.1
>>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-08-17 15:10 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-08 12:57 [PATCH v4 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 4/9] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 5/9] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 6/9] sched/fair: fix another detach on unattached task corner case Chengming Zhou
2022-08-17 15:01   ` Vincent Guittot
2022-08-17 15:04     ` Chengming Zhou
2022-08-17 15:08       ` Vincent Guittot
2022-08-17 15:10         ` Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
2022-08-08 14:57   ` Chengming Zhou
2022-08-08 16:42   ` kernel test robot
2022-08-15 21:11   ` Tejun Heo
2022-08-16 13:14     ` Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 8/9] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
2022-08-08 12:57 ` [PATCH v4 9/9] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
