* [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup
@ 2022-08-18  3:43 Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

Hi all,

This patch series contains optimizations and cleanups for task load tracking
when a task migrates between CPUs or cgroups, or is switched_from/to_fair().
It is based on tip/sched/core.

There are three cases of detach/attach_entity_load_avg (apart from fork and
exit) for a fair task:
1. task migrates to another CPU (on_rq migration or wakeup migration)
2. task migrates to another cgroup (detach and attach)
3. task is switched_from/to_fair (detach, later attach)

Patches 1-3 clean up the cgroup change case by removing cpu_cgrp_subsys->fork(),
since we already do the same thing in sched_cgroup_fork().

Patch 5/9 optimizes the CPU migration case by combining the detach into dequeue.

Patch 6/9 fixes another detach on unattached task case, where the task has
been woken up by try_to_wake_up() but is still waiting to actually be woken
up by sched_ttwu_pending().

Patch 7/9 removes the unnecessary limitation that changing the cgroup of a
forked task fails before it has been woken up by wake_up_new_task().

Patches 8-9 optimize post_init_entity_util_avg() for fair tasks and skip
setting util_avg and runnable_avg for !fair tasks at fork time.

Thanks!

Changes in v5:
 - Don't do code movements in patch 6/9, which complicate code review,
   as suggested by Vincent. Thanks!
 - Fix a build error caused by a typo in patch 7/9.

Changes in v4:
 - Drop detach/attach_entity_cfs_rq() refactor patch in the last version.
 - Move new forked task check to task_change_group_fair().

Changes in v3:
 - One big change is that this series no longer freezes PELT sum/avg values
   to be used as initial values when re-entering fair, since these PELT
   values have become much less relevant.
 - Reorder patches and collect tags from Vincent and Dietmar. Thanks!
 - Fix detach on an unattached task which has been woken up by try_to_wake_up()
   but is still waiting to actually be woken up by sched_ttwu_pending().
 - Delete TASK_NEW, which prevented a forked task from changing cgroup.
 - Don't init util_avg and runnable_avg for !fair tasks at fork time.

Changes in v2:
 - Split task se depth maintenance into a separate patch 3, as suggested
   by Peter.
 - Reorder patches 6-7 before patches 8-9, since we need update_load_avg()
   to do conditional attach/detach to avoid corner cases like the
   double-attach problem.

Chengming Zhou (9):
  sched/fair: maintain task se depth in set_task_rq()
  sched/fair: remove redundant cpu_cgrp_subsys->fork()
  sched/fair: reset sched_avg last_update_time before set_task_rq()
  sched/fair: update comments in enqueue/dequeue_entity()
  sched/fair: combine detach into dequeue when migrating task
  sched/fair: fix another detach on unattached task corner case
  sched/fair: allow changing cgroup of new forked task
  sched/fair: defer task sched_avg attach to enqueue_entity()
  sched/fair: don't init util/runnable_avg for !fair task

 include/linux/sched.h |   5 +-
 kernel/sched/core.c   |  57 ++++-----------------
 kernel/sched/fair.c   | 113 +++++++++++++++++++-----------------------
 kernel/sched/sched.h  |   6 +--
 4 files changed, 67 insertions(+), 114 deletions(-)

-- 
2.37.2



* [PATCH v5 1/9] sched/fair: maintain task se depth in set_task_rq()
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

Previously we only maintained task se depth in task_move_group_fair();
if a !fair task changed task group, its se depth would not be updated,
so commit eb7a59b2c888 ("sched/fair: Reset se-depth when task switched to FAIR")
fixed the problem by updating se depth in switched_to_fair() too.

Then commit daa59407b558 ("sched/fair: Unify switched_{from,to}_fair()
and task_move_group_fair()") unified these two functions and moved the
se.depth setting to attach_task_cfs_rq(), which was further moved into
attach_entity_cfs_rq() by commit df217913e72e ("sched/fair: Factorize
attach/detach entity").

This patch moves task se depth maintenance from attach_entity_cfs_rq()
to set_task_rq(), which is called whenever the task's CPU or cgroup
changes, so its depth will always be correct.

This patch is preparation for the next patch.
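
Illustration only, not kernel code: a minimal userspace sketch of the
invariant this keeps, with struct entity and set_parent() as simplified
stand-ins for sched_entity and set_task_rq(). Whenever an entity is
re-parented, its depth is recomputed from the new parent, so it can
never go stale.

  #include <assert.h>
  #include <stddef.h>

  /* Simplified stand-in for the kernel's sched_entity. */
  struct entity {
          struct entity *parent;
          int depth;
  };

  /* Mirrors the new set_task_rq() logic: derive depth from the parent. */
  static void set_parent(struct entity *se, struct entity *parent)
  {
          se->parent = parent;
          se->depth = parent ? parent->depth + 1 : 0;
  }

  int main(void)
  {
          struct entity root = { .parent = NULL, .depth = 0 };
          struct entity group, task;

          set_parent(&group, &root);
          set_parent(&task, &group);
          assert(task.depth == 2);

          /* Re-parenting (a CPU/cgroup change) updates depth immediately. */
          set_parent(&task, &root);
          assert(task.depth == 1);
          return 0;
  }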

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 8 --------
 kernel/sched/sched.h | 1 +
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a71d6686149b..c5ee08b187ec 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11726,14 +11726,6 @@ static void attach_entity_cfs_rq(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	/*
-	 * Since the real-depth could have been changed (only FAIR
-	 * class maintain depth value), reset depth properly.
-	 */
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-#endif
-
 	/* Synchronize entity with its cfs_rq */
 	update_load_avg(cfs_rq, se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
 	attach_entity_load_avg(cfs_rq, se);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ddcfc7837595..628ffa974123 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1932,6 +1932,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 	set_task_rq_fair(&p->se, p->se.cfs_rq, tg->cfs_rq[cpu]);
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+	p->se.depth = tg->se[cpu] ? tg->se[cpu]->depth + 1 : 0;
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
-- 
2.37.2



* [PATCH v5 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork()
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

We use cpu_cgrp_subsys->fork() to set the task group for the new fair task
in cgroup_post_fork().

Since commit b1e8206582f9 ("sched: Fix yet more sched_fork() races")
already does set_task_rq() for the new fair task in sched_cgroup_fork(),
cpu_cgrp_subsys->fork() can be removed.

  cgroup_can_fork()	--> pin parent's sched_task_group
  sched_cgroup_fork()
    __set_task_cpu()
      set_task_rq()
  cgroup_post_fork()
    ss->fork() := cpu_cgroup_fork()
      sched_change_group(..., TASK_SET_GROUP)
        task_set_group_fair()
          set_task_rq()  --> can be removed

After this change, task_change_group_fair() only needs to care about
task cgroup migration, which makes the code much simpler.
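
Illustration only (a simplified userspace model, not the kernel code; the
names mirror the patch, everything else is made up): after this change the
hook is a single-purpose callback with no 'type' switch.

  #include <stdio.h>

  struct task;

  /* Simplified sched_class: task_change_group takes no 'type' argument. */
  struct sched_class {
          void (*task_change_group)(struct task *p);
  };

  struct task {
          const struct sched_class *sched_class;
  };

  /* The single remaining purpose: handle a cgroup move of a fair task. */
  static void task_change_group_fair(struct task *p)
  {
          (void)p;
          printf("detach, set_task_rq(), attach\n");
  }

  static const struct sched_class fair_sched_class = {
          .task_change_group = task_change_group_fair,
  };

  /* Mirrors the simplified sched_change_group(): no TASK_SET_GROUP case. */
  static void sched_change_group(struct task *p)
  {
          if (p->sched_class->task_change_group)
                  p->sched_class->task_change_group(p);
  }

  int main(void)
  {
          struct task t = { .sched_class = &fair_sched_class };

          sched_change_group(&t);
          return 0;
  }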

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c  | 27 ++++-----------------------
 kernel/sched/fair.c  | 23 +----------------------
 kernel/sched/sched.h |  5 +----
 3 files changed, 6 insertions(+), 49 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 863b5203e357..8e3f1c3f0b2c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -481,8 +481,7 @@ sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags) { }
  *				p->se.load, p->rt_priority,
  *				p->dl.dl_{runtime, deadline, period, flags, bw, density}
  *  - sched_setnuma():		p->numa_preferred_nid
- *  - sched_move_task()/
- *    cpu_cgroup_fork():	p->sched_task_group
+ *  - sched_move_task():	p->sched_task_group
  *  - uclamp_update_active()	p->uclamp*
  *
  * p->state <- TASK_*:
@@ -10166,7 +10165,7 @@ void sched_release_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }
 
-static void sched_change_group(struct task_struct *tsk, int type)
+static void sched_change_group(struct task_struct *tsk)
 {
 	struct task_group *tg;
 
@@ -10182,7 +10181,7 @@ static void sched_change_group(struct task_struct *tsk, int type)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_change_group)
-		tsk->sched_class->task_change_group(tsk, type);
+		tsk->sched_class->task_change_group(tsk);
 	else
 #endif
 		set_task_rq(tsk, task_cpu(tsk));
@@ -10213,7 +10212,7 @@ void sched_move_task(struct task_struct *tsk)
 	if (running)
 		put_prev_task(rq, tsk);
 
-	sched_change_group(tsk, TASK_MOVE_GROUP);
+	sched_change_group(tsk);
 
 	if (queued)
 		enqueue_task(rq, tsk, queue_flags);
@@ -10291,23 +10290,6 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	sched_unregister_group(tg);
 }
 
-/*
- * This is called before wake_up_new_task(), therefore we really only
- * have to set its group bits, all the other stuff does not apply.
- */
-static void cpu_cgroup_fork(struct task_struct *task)
-{
-	struct rq_flags rf;
-	struct rq *rq;
-
-	rq = task_rq_lock(task, &rf);
-
-	update_rq_clock(rq);
-	sched_change_group(task, TASK_SET_GROUP);
-
-	task_rq_unlock(rq, task, &rf);
-}
-
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
@@ -11173,7 +11155,6 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
-	.fork		= cpu_cgroup_fork,
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c5ee08b187ec..4b95599aa951 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11821,15 +11821,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static void task_set_group_fair(struct task_struct *p)
-{
-	struct sched_entity *se = &p->se;
-
-	set_task_rq(p, task_cpu(p));
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-}
-
-static void task_move_group_fair(struct task_struct *p)
+static void task_change_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
@@ -11841,19 +11833,6 @@ static void task_move_group_fair(struct task_struct *p)
 	attach_task_cfs_rq(p);
 }
 
-static void task_change_group_fair(struct task_struct *p, int type)
-{
-	switch (type) {
-	case TASK_SET_GROUP:
-		task_set_group_fair(p);
-		break;
-
-	case TASK_MOVE_GROUP:
-		task_move_group_fair(p);
-		break;
-	}
-}
-
 void free_fair_sched_group(struct task_group *tg)
 {
 	int i;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 628ffa974123..2db7b0494c19 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2195,11 +2195,8 @@ struct sched_class {
 
 	void (*update_curr)(struct rq *rq);
 
-#define TASK_SET_GROUP		0
-#define TASK_MOVE_GROUP		1
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	void (*task_change_group)(struct task_struct *p, int type);
+	void (*task_change_group)(struct task_struct *p);
 #endif
 };
 
-- 
2.37.2



* [PATCH v5 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq()
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 1/9] sched/fair: maintain task se depth in set_task_rq() Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 2/9] sched/fair: remove redundant cpu_cgrp_subsys->fork() Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 4/9] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

set_task_rq() -> set_task_rq_fair() will try to synchronize the blocked
task's sched_avg when migrating, which is not needed for an already
detached task.

task_change_group_fair() detaches the task's sched_avg from the previous
cfs_rq first, so reset sched_avg last_update_time before set_task_rq()
to avoid that.
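
Illustration only (a simplified userspace model, not the kernel
implementation): set_task_rq_fair() only has work to do for a sched_avg
whose last_update_time is non-zero, so clearing it first turns the later
call into a no-op for the already detached task.

  #include <assert.h>

  struct sched_avg {
          unsigned long long last_update_time;
  };

  static int sync_count;

  /* Simplified stand-in for set_task_rq_fair(): skip detached entities. */
  static void set_task_rq_fair(struct sched_avg *sa)
  {
          if (!sa->last_update_time)
                  return;         /* nothing to synchronize */
          sync_count++;           /* would sync blocked load here */
  }

  int main(void)
  {
          struct sched_avg sa = { .last_update_time = 12345 };

          /* Old order: syncs an already detached entity (wasted work). */
          set_task_rq_fair(&sa);
          assert(sync_count == 1);

          /* New order: mark "migrated" first, then the sync is skipped. */
          sa.last_update_time = 0;
          set_task_rq_fair(&sa);
          assert(sync_count == 1);
          return 0;
  }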

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4b95599aa951..5a704109472a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11824,12 +11824,12 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 static void task_change_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
-	set_task_rq(p, task_cpu(p));
 
 #ifdef CONFIG_SMP
 	/* Tell se's cfs_rq has been changed -- migrated */
 	p->se.avg.last_update_time = 0;
 #endif
+	set_task_rq(p, task_cpu(p));
 	attach_task_cfs_rq(p);
 }
 
-- 
2.37.2



* [PATCH v5 4/9] sched/fair: update comments in enqueue/dequeue_entity()
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (2 preceding siblings ...)
  2022-08-18  3:43 ` [PATCH v5 3/9] sched/fair: reset sched_avg last_update_time before set_task_rq() Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 5/9] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

When reading the sched_avg related code, I found the comments in
enqueue/dequeue_entity() are out of date with respect to the current code.

We don't add/subtract the entity's runnable_avg to/from cfs_rq->runnable_avg
during enqueue/dequeue_entity(); that is done only on attach/detach.

This patch updates the comments to reflect how the current code works.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5a704109472a..372e5f4a49a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4598,7 +4598,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * When enqueuing a sched_entity, we must:
 	 *   - Update loads to have both entity and cfs_rq synced with now.
-	 *   - Add its load to cfs_rq->runnable_avg
+	 *   - For group_entity, update its runnable_weight to reflect the new
+	 *     h_nr_running of its group cfs_rq.
 	 *   - For group_entity, update its weight to reflect the new share of
 	 *     its group cfs_rq
 	 *   - Add its new weight to cfs_rq->load.weight
@@ -4683,7 +4684,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * When dequeuing a sched_entity, we must:
 	 *   - Update loads to have both entity and cfs_rq synced with now.
-	 *   - Subtract its load from the cfs_rq->runnable_avg.
+	 *   - For group_entity, update its runnable_weight to reflect the new
+	 *     h_nr_running of its group cfs_rq.
 	 *   - Subtract its previous weight from cfs_rq->load.weight.
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
-- 
2.37.2



* [PATCH v5 5/9] sched/fair: combine detach into dequeue when migrating task
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (3 preceding siblings ...)
  2022-08-18  3:43 ` [PATCH v5 4/9] sched/fair: update comments in enqueue/dequeue_entity() Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 6/9] sched/fair: fix another detach on unattached task corner case Chengming Zhou
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

When migrating a task off a CPU, we can combine the detach and the
propagation into dequeue_entity() to save the detach_entity_cfs_rq()
call in migrate_task_rq_fair().

This optimization mirrors the DO_ATTACH combining in enqueue_entity()
when migrating a task to a CPU. So we don't have to traverse the CFS tree
an extra time to do detach_entity_cfs_rq() -> propagate_entity_cfs_rq(),
which is no longer called after this patch's change.

detach_task()
  deactivate_task()
    dequeue_task_fair()
      for_each_sched_entity(se)
        dequeue_entity()
          update_load_avg() /* (1) */
            detach_entity_load_avg()

  set_task_cpu()
    migrate_task_rq_fair()
      detach_entity_cfs_rq() /* (2) */
        update_load_avg();
        detach_entity_load_avg();
        propagate_entity_cfs_rq();
          for_each_sched_entity()
            update_load_avg()

This patch saves the detach_entity_cfs_rq() call in (2) by doing
detach_entity_load_avg() for a CPU-migrating task inside (1)
(the task being the first se in the loop).
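
The flag dispatch can be sketched as below; this is a simplified,
stand-alone model (the flag values match the patch, the rest is purely
illustrative), not the kernel function.

  #include <assert.h>

  #define UPDATE_TG       0x1
  #define SKIP_AGE_LOAD   0x2
  #define DO_ATTACH       0x4
  #define DO_DETACH       0x8

  struct entity {
          unsigned long long last_update_time;    /* 0 => not attached */
  };

  static int attaches, detaches;

  /* Simplified shape of the update_load_avg() dispatch after this patch. */
  static void update_load_avg(struct entity *se, int flags)
  {
          if (!se->last_update_time && (flags & DO_ATTACH))
                  attaches++;             /* attach_entity_load_avg() */
          else if (flags & DO_DETACH)
                  detaches++;             /* detach_entity_load_avg() */
          /* else: plain decay/update */
  }

  int main(void)
  {
          struct entity se = { .last_update_time = 100 };
          int action = UPDATE_TG;
          int migrating = 1;      /* task_on_rq_migrating() in dequeue_entity() */

          if (migrating)
                  action |= DO_DETACH;
          update_load_avg(&se, action);
          assert(detaches == 1);  /* detached right in the dequeue pass */
          return 0;
  }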

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 372e5f4a49a3..1eb3fb3d95c3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 #define UPDATE_TG	0x1
 #define SKIP_AGE_LOAD	0x2
 #define DO_ATTACH	0x4
+#define DO_DETACH	0x8
 
 /* Update task and its cfs_rq load average */
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
@@ -4196,6 +4197,13 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 		attach_entity_load_avg(cfs_rq, se);
 		update_tg_load_avg(cfs_rq);
 
+	} else if (flags & DO_DETACH) {
+		/*
+		 * DO_DETACH means we're here from dequeue_entity()
+		 * and we are migrating task out of the CPU.
+		 */
+		detach_entity_load_avg(cfs_rq, se);
+		update_tg_load_avg(cfs_rq);
 	} else if (decayed) {
 		cfs_rq_util_change(cfs_rq, 0);
 
@@ -4456,6 +4464,7 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
+#define DO_DETACH	0x0
 
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
 {
@@ -4676,6 +4685,11 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	int action = UPDATE_TG;
+
+	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)))
+		action |= DO_DETACH;
+
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
@@ -4690,7 +4704,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
 	 */
-	update_load_avg(cfs_rq, se, UPDATE_TG);
+	update_load_avg(cfs_rq, se, action);
 	se_update_runnable(se);
 
 	update_stats_dequeue_fair(cfs_rq, se, flags);
@@ -7242,8 +7256,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	return new_cpu;
 }
 
-static void detach_entity_cfs_rq(struct sched_entity *se);
-
 /*
  * Called immediately before a task is migrated to a new CPU; task_cpu(p) and
  * cfs_rq_of(p) references at time of call are still valid and identify the
@@ -7265,15 +7277,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
 	}
 
-	if (p->on_rq == TASK_ON_RQ_MIGRATING) {
-		/*
-		 * In case of TASK_ON_RQ_MIGRATING we in fact hold the 'old'
-		 * rq->lock and can modify state directly.
-		 */
-		lockdep_assert_rq_held(task_rq(p));
-		detach_entity_cfs_rq(se);
-
-	} else {
+	if (!task_on_rq_migrating(p)) {
 		remove_entity_load_avg(se);
 
 		/*
-- 
2.37.2



* [PATCH v5 6/9] sched/fair: fix another detach on unattached task corner case
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (4 preceding siblings ...)
  2022-08-18  3:43 ` [PATCH v5 5/9] sched/fair: combine detach into dequeue when migrating task Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

Commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
fixed two load tracking problems for new tasks, including the detach on
unattached new task problem.

There is still another detach on unattached task problem left, for a task
which has been woken up by try_to_wake_up() but is waiting to actually
be woken up by sched_ttwu_pending().

try_to_wake_up(p)
  cpu = select_task_rq(p)
  if (task_cpu(p) != cpu)
    set_task_cpu(p, cpu)
      migrate_task_rq_fair()
        remove_entity_load_avg()       --> unattached
        se->avg.last_update_time = 0;
      __set_task_cpu()
  ttwu_queue(p, cpu)
    ttwu_queue_wakelist()
      __ttwu_queue_wakelist()

task_change_group_fair()
  detach_task_cfs_rq()
    detach_entity_cfs_rq()
      detach_entity_load_avg()   --> detach on unattached task
  set_task_rq()
  attach_task_cfs_rq()
    attach_entity_cfs_rq()
      attach_entity_load_avg()

The cause of this problem is similar; we should check in detach_entity_cfs_rq()
that se->avg.last_update_time != 0 before doing detach_entity_load_avg().
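
The guard itself is small; as a self-contained illustration (simplified
userspace model, not the kernel code) it amounts to:

  #include <assert.h>

  struct sched_avg {
          unsigned long long last_update_time;    /* 0 => never attached */
  };

  static int detached;

  /* Simplified detach_entity_cfs_rq(): bail out on unattached entities. */
  static void detach_entity_cfs_rq(struct sched_avg *sa)
  {
          if (!sa->last_update_time)
                  return;
          detached++;             /* would detach load/util/runnable here */
  }

  int main(void)
  {
          /* Woken by try_to_wake_up(), not yet attached on the target CPU. */
          struct sched_avg sa = { .last_update_time = 0 };

          detach_entity_cfs_rq(&sa);
          assert(detached == 0);  /* no detach on an unattached task */
          return 0;
  }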

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1eb3fb3d95c3..eba8a64f905a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11721,6 +11721,17 @@ static void detach_entity_cfs_rq(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
+#ifdef CONFIG_SMP
+	/*
+	 * In case the task sched_avg hasn't been attached:
+	 * - A forked task which hasn't been woken up by wake_up_new_task().
+	 * - A task which has been woken up by try_to_wake_up() but is
+	 *   waiting for actually being woken up by sched_ttwu_pending().
+	 */
+	if (!se->avg.last_update_time)
+		return;
+#endif
+
 	/* Catch up with the cfs_rq and remove our load when we leave */
 	update_load_avg(cfs_rq, se, 0);
 	detach_entity_load_avg(cfs_rq, se);
-- 
2.37.2



* [PATCH v5 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (5 preceding siblings ...)
  2022-08-18  3:43 ` [PATCH v5 6/9] sched/fair: fix another detach on unattached task corner case Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18 10:36   ` Peter Zijlstra
  2022-08-18  3:43 ` [PATCH v5 8/9] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
  2022-08-18  3:43 ` [PATCH v5 9/9] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
  8 siblings, 1 reply; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

Commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new tasks")
introduced a TASK_NEW state and an unnecessary limitation that changing
the cgroup of a newly forked task would fail.

That was because, at the time, we couldn't handle task_change_group_fair()
for a newly forked fair task which hadn't been woken up by
wake_up_new_task(), since it would cause a detach on an unattached task
sched_avg problem.

This patch deletes this unnecessary limitation by adding a check before
doing detach or attach in task_change_group_fair().

With that, cpu_cgrp_subsys.can_attach() has nothing left to do for fair tasks,
so only define it under #ifdef CONFIG_RT_GROUP_SCHED.
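
For illustration, the new check can be modeled stand-alone as below
(simplified userspace sketch; the field names follow the patch, the rest
is made up):

  #include <assert.h>

  struct task {
          int on_rq;
          unsigned long long sum_exec_runtime;
  };

  static int moved;

  /* Simplified task_change_group_fair(): skip not-yet-woken forked tasks. */
  static void task_change_group_fair(struct task *p)
  {
          if (!p->on_rq && !p->sum_exec_runtime)
                  return;         /* not yet woken by wake_up_new_task() */
          moved++;                /* would detach + set_task_rq() + attach */
  }

  int main(void)
  {
          struct task forked  = { .on_rq = 0, .sum_exec_runtime = 0 };
          struct task running = { .on_rq = 1, .sum_exec_runtime = 1000 };

          task_change_group_fair(&forked);
          assert(moved == 0);     /* cgroup change allowed, PELT untouched */
          task_change_group_fair(&running);
          assert(moved == 1);
          return 0;
  }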

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/sched.h |  5 ++---
 kernel/sched/core.c   | 30 +++++++-----------------------
 kernel/sched/fair.c   |  7 +++++++
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e7b2f8a5c711..0b296e855dee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -96,10 +96,9 @@ struct task_group;
 #define TASK_WAKEKILL			0x0100
 #define TASK_WAKING			0x0200
 #define TASK_NOLOAD			0x0400
-#define TASK_NEW			0x0800
 /* RT specific auxilliary flag to mark RT lock waiters */
-#define TASK_RTLOCK_WAIT		0x1000
-#define TASK_STATE_MAX			0x2000
+#define TASK_RTLOCK_WAIT		0x0800
+#define TASK_STATE_MAX			0x1000
 
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8e3f1c3f0b2c..157f7461a08a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4550,11 +4550,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 {
 	__sched_fork(clone_flags, p);
 	/*
-	 * We mark the process as NEW here. This guarantees that
+	 * We mark the process as running here. This guarantees that
 	 * nobody will actually run it, and a signal or other external
 	 * event cannot wake it up and insert it on the runqueue either.
 	 */
-	p->__state = TASK_NEW;
+	p->__state = TASK_RUNNING;
 
 	/*
 	 * Make sure we do not leak PI boosting priority to the child.
@@ -4672,7 +4672,6 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq *rq;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
-	WRITE_ONCE(p->__state, TASK_RUNNING);
 #ifdef CONFIG_SMP
 	/*
 	 * Fork balancing, do it here and not earlier because:
@@ -10290,36 +10289,19 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	sched_unregister_group(tg);
 }
 
+#ifdef CONFIG_RT_GROUP_SCHED
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
 	struct cgroup_subsys_state *css;
-	int ret = 0;
 
 	cgroup_taskset_for_each(task, css, tset) {
-#ifdef CONFIG_RT_GROUP_SCHED
 		if (!sched_rt_can_attach(css_tg(css), task))
 			return -EINVAL;
-#endif
-		/*
-		 * Serialize against wake_up_new_task() such that if it's
-		 * running, we're sure to observe its full state.
-		 */
-		raw_spin_lock_irq(&task->pi_lock);
-		/*
-		 * Avoid calling sched_move_task() before wake_up_new_task()
-		 * has happened. This would lead to problems with PELT, due to
-		 * move wanting to detach+attach while we're not attached yet.
-		 */
-		if (READ_ONCE(task->__state) == TASK_NEW)
-			ret = -EINVAL;
-		raw_spin_unlock_irq(&task->pi_lock);
-
-		if (ret)
-			break;
 	}
-	return ret;
+	return 0;
 }
+#endif
 
 static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 {
@@ -11155,7 +11137,9 @@ struct cgroup_subsys cpu_cgrp_subsys = {
 	.css_released	= cpu_cgroup_css_released,
 	.css_free	= cpu_cgroup_css_free,
 	.css_extra_stat_show = cpu_extra_stat_show,
+#ifdef CONFIG_RT_GROUP_SCHED
 	.can_attach	= cpu_cgroup_can_attach,
+#endif
 	.attach		= cpu_cgroup_attach,
 	.legacy_cftypes	= cpu_legacy_files,
 	.dfl_cftypes	= cpu_files,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eba8a64f905a..e0d34ecdabae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11840,6 +11840,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static void task_change_group_fair(struct task_struct *p)
 {
+	/*
+	 * We couldn't detach or attach a forked task which
+	 * hasn't been woken up by wake_up_new_task().
+	 */
+	if (!p->on_rq && !p->se.sum_exec_runtime)
+		return;
+
 	detach_task_cfs_rq(p);
 
 #ifdef CONFIG_SMP
-- 
2.37.2



* [PATCH v5 8/9] sched/fair: defer task sched_avg attach to enqueue_entity()
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (6 preceding siblings ...)
  2022-08-18  3:43 ` [PATCH v5 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  2022-08-18 10:39   ` Peter Zijlstra
  2022-08-18  3:43 ` [PATCH v5 9/9] sched/fair: don't init util/runnable_avg for !fair task Chengming Zhou
  8 siblings, 1 reply; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

In wake_up_new_task(), we use post_init_entity_util_avg() to init
util_avg/runnable_avg based on the CPU's util_avg at that time, then
attach the task's sched_avg to the cfs_rq.

Since enqueue_entity() always attaches any unattached task entity,
we can defer this work to enqueue_entity().

post_init_entity_util_avg(p)
  attach_entity_cfs_rq()  --> (1)
activate_task(rq, p)
  enqueue_task() := enqueue_task_fair()
  enqueue_entity()
    update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
      if (!se->avg.last_update_time && (flags & DO_ATTACH))
        attach_entity_load_avg()  --> (2)

This patch defers the attach from (1) to (2).
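
A simplified, purely illustrative model of the deferral (not kernel code):
post-init no longer attaches, and the first enqueue picks up the still
unattached entity.

  #include <assert.h>

  struct entity {
          unsigned long long last_update_time;    /* 0 => not attached yet */
  };

  static int attaches;

  /* post_init_entity_util_avg() after this patch: no attach here. */
  static void post_init_entity_util_avg(struct entity *se)
  {
          (void)se;               /* only the util/runnable init remains */
  }

  /* enqueue_entity() with DO_ATTACH attaches any unattached entity. */
  static void enqueue_entity(struct entity *se)
  {
          if (!se->last_update_time) {
                  attaches++;
                  se->last_update_time = 1;       /* now attached */
          }
  }

  int main(void)
  {
          struct entity se = { .last_update_time = 0 };

          post_init_entity_util_avg(&se); /* (1): no longer attaches */
          enqueue_entity(&se);            /* (2): attach happens here */
          assert(attaches == 1);
          return 0;
  }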

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e0d34ecdabae..aacf38a72714 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -799,8 +799,6 @@ void init_entity_runnable_average(struct sched_entity *se)
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
-static void attach_entity_cfs_rq(struct sched_entity *se);
-
 /*
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
@@ -863,8 +861,6 @@ void post_init_entity_util_avg(struct task_struct *p)
 		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
 		return;
 	}
-
-	attach_entity_cfs_rq(se);
 }
 
 #else /* !CONFIG_SMP */
-- 
2.37.2



* [PATCH v5 9/9] sched/fair: don't init util/runnable_avg for !fair task
  2022-08-18  3:43 [PATCH v5 0/9] sched/fair: task load tracking optimization and cleanup Chengming Zhou
                   ` (7 preceding siblings ...)
  2022-08-18  3:43 ` [PATCH v5 8/9] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
@ 2022-08-18  3:43 ` Chengming Zhou
  8 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18  3:43 UTC (permalink / raw)
  To: vincent.guittot, dietmar.eggemann, mingo, peterz, rostedt,
	bsegall, vschneid
  Cc: linux-kernel, tj, Chengming Zhou

post_init_entity_util_avg() initializes the task's util_avg according to the
CPU's util_avg at fork time, which will have decayed by the time the task is
switched_to_fair() some time later, so we'd better not set it at all in the
case of a !fair task.
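
A simplified illustration of the reordering (stand-alone model, not the
kernel function; the util formula here is reduced to a plain cap, while the
real one also scales by weight and load): the !fair early return now comes
before any util/runnable initialization.

  #include <assert.h>

  struct sched_avg {
          long util_avg;
          long runnable_avg;
  };

  /* Simplified post_init_entity_util_avg() after this patch. */
  static void post_init_util_avg(struct sched_avg *sa, int is_fair,
                                 long cfs_util, long cap)
  {
          if (!is_fair)
                  return;         /* leave util/runnable at 0 for !fair */

          sa->util_avg = cfs_util < cap ? cfs_util : cap;
          sa->runnable_avg = sa->util_avg;
  }

  int main(void)
  {
          struct sched_avg rt = { 0, 0 }, fair = { 0, 0 };

          post_init_util_avg(&rt, 0, 300, 512);
          post_init_util_avg(&fair, 1, 300, 512);
          assert(rt.util_avg == 0 && rt.runnable_avg == 0);
          assert(fair.util_avg == 300 && fair.runnable_avg == 300);
          return 0;
  }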

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/fair.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aacf38a72714..235b59b9d75a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -833,20 +833,6 @@ void post_init_entity_util_avg(struct task_struct *p)
 	long cpu_scale = arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)));
 	long cap = (long)(cpu_scale - cfs_rq->avg.util_avg) / 2;
 
-	if (cap > 0) {
-		if (cfs_rq->avg.util_avg != 0) {
-			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
-			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
-
-			if (sa->util_avg > cap)
-				sa->util_avg = cap;
-		} else {
-			sa->util_avg = cap;
-		}
-	}
-
-	sa->runnable_avg = sa->util_avg;
-
 	if (p->sched_class != &fair_sched_class) {
 		/*
 		 * For !fair tasks do:
@@ -861,6 +847,20 @@ void post_init_entity_util_avg(struct task_struct *p)
 		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
 		return;
 	}
+
+	if (cap > 0) {
+		if (cfs_rq->avg.util_avg != 0) {
+			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
+			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
+
+			if (sa->util_avg > cap)
+				sa->util_avg = cap;
+		} else {
+			sa->util_avg = cap;
+		}
+	}
+
+	sa->runnable_avg = sa->util_avg;
 }
 
 #else /* !CONFIG_SMP */
-- 
2.37.2



* Re: [PATCH v5 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-18  3:43 ` [PATCH v5 7/9] sched/fair: allow changing cgroup of new forked task Chengming Zhou
@ 2022-08-18 10:36   ` Peter Zijlstra
  2022-08-18 10:48     ` Chengming Zhou
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-08-18 10:36 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: vincent.guittot, dietmar.eggemann, mingo, rostedt, bsegall,
	vschneid, linux-kernel, tj

On Thu, Aug 18, 2022 at 11:43:41AM +0800, Chengming Zhou wrote:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8e3f1c3f0b2c..157f7461a08a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4550,11 +4550,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
>  {
>  	__sched_fork(clone_flags, p);
>  	/*
> -	 * We mark the process as NEW here. This guarantees that
> +	 * We mark the process as running here. This guarantees that
>  	 * nobody will actually run it, and a signal or other external
>  	 * event cannot wake it up and insert it on the runqueue either.
>  	 */
> -	p->__state = TASK_NEW;
> +	p->__state = TASK_RUNNING;
>  
>  	/*
>  	 * Make sure we do not leak PI boosting priority to the child.
> @@ -4672,7 +4672,6 @@ void wake_up_new_task(struct task_struct *p)
>  	struct rq *rq;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> -	WRITE_ONCE(p->__state, TASK_RUNNING);
>  #ifdef CONFIG_SMP
>  	/*
>  	 * Fork balancing, do it here and not earlier because:
> @@ -10290,36 +10289,19 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
>  	sched_unregister_group(tg);
>  }

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index eba8a64f905a..e0d34ecdabae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11840,6 +11840,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  static void task_change_group_fair(struct task_struct *p)
>  {
> +	/*
> +	 * We couldn't detach or attach a forked task which
> +	 * hasn't been woken up by wake_up_new_task().
> +	 */
> +	if (!p->on_rq && !p->se.sum_exec_runtime)
> +		return;
> +
>  	detach_task_cfs_rq(p);

Wouldn't that be much clearer when expressed in TASK_NEW ?


* Re: [PATCH v5 8/9] sched/fair: defer task sched_avg attach to enqueue_entity()
  2022-08-18  3:43 ` [PATCH v5 8/9] sched/fair: defer task sched_avg attach to enqueue_entity() Chengming Zhou
@ 2022-08-18 10:39   ` Peter Zijlstra
  2022-08-18 11:03     ` Chengming Zhou
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2022-08-18 10:39 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: vincent.guittot, dietmar.eggemann, mingo, rostedt, bsegall,
	vschneid, linux-kernel, tj

On Thu, Aug 18, 2022 at 11:43:42AM +0800, Chengming Zhou wrote:
> When wake_up_new_task(), we would use post_init_entity_util_avg()
> to init util_avg/runnable_avg based on cpu's util_avg at that time,
> then attach task sched_avg to cfs_rq.
> 
> Since enqueue_entity() would always attach any unattached task entity,
> so we can defer this work to enqueue_entity().
> 
> post_init_entity_util_avg(p)
>   attach_entity_cfs_rq()  --> (1)
> activate_task(rq, p)
>   enqueue_task() := enqueue_task_fair()
>   enqueue_entity()
>     update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
>       if (!se->avg.last_update_time && (flags & DO_ATTACH))
>         attach_entity_load_avg()  --> (2)
> 
> This patch defer attach from (1) to (2)
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  kernel/sched/fair.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e0d34ecdabae..aacf38a72714 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -799,8 +799,6 @@ void init_entity_runnable_average(struct sched_entity *se)
>  	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>  }
>  
> -static void attach_entity_cfs_rq(struct sched_entity *se);
> -
>  /*
>   * With new tasks being created, their initial util_avgs are extrapolated
>   * based on the cfs_rq's current util_avg:
> @@ -863,8 +861,6 @@ void post_init_entity_util_avg(struct task_struct *p)
>  		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
>  		return;
>  	}
> -
> -	attach_entity_cfs_rq(se);
>  }

There are comments with update_cfs_rq_load_avg() and
remove_entity_load_avg() that seem to rely on post_init_entity_util()
doing this attach.

If that is no longer true, at the very least those comments need to be
updated, but also, I don't immediately see why that's no longer the
case, so please explain.


* Re: [PATCH v5 7/9] sched/fair: allow changing cgroup of new forked task
  2022-08-18 10:36   ` Peter Zijlstra
@ 2022-08-18 10:48     ` Chengming Zhou
  0 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18 10:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: vincent.guittot, dietmar.eggemann, mingo, rostedt, bsegall,
	vschneid, linux-kernel, tj

On 2022/8/18 18:36, Peter Zijlstra wrote:
> On Thu, Aug 18, 2022 at 11:43:41AM +0800, Chengming Zhou wrote:
> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 8e3f1c3f0b2c..157f7461a08a 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4550,11 +4550,11 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
>>  {
>>  	__sched_fork(clone_flags, p);
>>  	/*
>> -	 * We mark the process as NEW here. This guarantees that
>> +	 * We mark the process as running here. This guarantees that
>>  	 * nobody will actually run it, and a signal or other external
>>  	 * event cannot wake it up and insert it on the runqueue either.
>>  	 */
>> -	p->__state = TASK_NEW;
>> +	p->__state = TASK_RUNNING;
>>  
>>  	/*
>>  	 * Make sure we do not leak PI boosting priority to the child.
>> @@ -4672,7 +4672,6 @@ void wake_up_new_task(struct task_struct *p)
>>  	struct rq *rq;
>>  
>>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>> -	WRITE_ONCE(p->__state, TASK_RUNNING);
>>  #ifdef CONFIG_SMP
>>  	/*
>>  	 * Fork balancing, do it here and not earlier because:
>> @@ -10290,36 +10289,19 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
>>  	sched_unregister_group(tg);
>>  }
> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index eba8a64f905a..e0d34ecdabae 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -11840,6 +11840,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>>  #ifdef CONFIG_FAIR_GROUP_SCHED
>>  static void task_change_group_fair(struct task_struct *p)
>>  {
>> +	/*
>> +	 * We couldn't detach or attach a forked task which
>> +	 * hasn't been woken up by wake_up_new_task().
>> +	 */
>> +	if (!p->on_rq && !p->se.sum_exec_runtime)
>> +		return;
>> +
>>  	detach_task_cfs_rq(p);
> 
> Wouldn't that be much clearer when expressed in TASK_NEW ?

Ah, I was stupid, will change to use TASK_NEW.

Thanks for your suggestion!



* Re: [PATCH v5 8/9] sched/fair: defer task sched_avg attach to enqueue_entity()
  2022-08-18 10:39   ` Peter Zijlstra
@ 2022-08-18 11:03     ` Chengming Zhou
  0 siblings, 0 replies; 14+ messages in thread
From: Chengming Zhou @ 2022-08-18 11:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: vincent.guittot, dietmar.eggemann, mingo, rostedt, bsegall,
	vschneid, linux-kernel, tj

On 2022/8/18 18:39, Peter Zijlstra wrote:
> On Thu, Aug 18, 2022 at 11:43:42AM +0800, Chengming Zhou wrote:
>> When wake_up_new_task(), we would use post_init_entity_util_avg()
>> to init util_avg/runnable_avg based on cpu's util_avg at that time,
>> then attach task sched_avg to cfs_rq.
>>
>> Since enqueue_entity() would always attach any unattached task entity,
>> so we can defer this work to enqueue_entity().
>>
>> post_init_entity_util_avg(p)
>>   attach_entity_cfs_rq()  --> (1)
>> activate_task(rq, p)
>>   enqueue_task() := enqueue_task_fair()
>>   enqueue_entity()
>>     update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)
>>       if (!se->avg.last_update_time && (flags & DO_ATTACH))
>>         attach_entity_load_avg()  --> (2)
>>
>> This patch defer attach from (1) to (2)
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  kernel/sched/fair.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e0d34ecdabae..aacf38a72714 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -799,8 +799,6 @@ void init_entity_runnable_average(struct sched_entity *se)
>>  	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>>  }
>>  
>> -static void attach_entity_cfs_rq(struct sched_entity *se);
>> -
>>  /*
>>   * With new tasks being created, their initial util_avgs are extrapolated
>>   * based on the cfs_rq's current util_avg:
>> @@ -863,8 +861,6 @@ void post_init_entity_util_avg(struct task_struct *p)
>>  		se->avg.last_update_time = cfs_rq_clock_pelt(cfs_rq);
>>  		return;
>>  	}
>> -
>> -	attach_entity_cfs_rq(se);
>>  }
> 
> There are comments with update_cfs_rq_load_avg() and
> remove_entity_load_avg() that seem to rely on post_init_entity_util()
> doing this attach.
> 
> If that is no longer true; at the very least those comments need to be
> updated, but also, I don't immediately see why that's no longer the
> case, so please explain.

This attach in post_init_entity_util_avg() will be done in the enqueue_entity()
-> update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH) loop.

So these comments should be updated in the next version.

Thanks!


