linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change
@ 2016-06-06  0:20 Yuyang Du
  2016-06-06  0:20 ` [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06  0:20 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Hi Peter,

It turns out the problem is worse: the double-attach issue can happen not only
when a task moves between task groups, as Vincent reported, but also when a
task switches to the fair class. In addition, the detach and attach paths are
also broken for newly forked tasks.

Thanks a lot to Vincent. This new version mainly addresses his comments by
rewording the description of the problem.

Thanks,
Yuyang

--

Yuyang Du (5):
  sched/fair: Clean up attach_entity_load_avg()
  sched/fair: Fix attaching task sched avgs twice when switching to
    fair or changing task group
  sched/fair: Skip detach sched avgs for new task when changing task
    groups
  sched/fair: Move load and util avgs from wake_up_new_task() to
    sched_fork()
  sched/fair: Add inline to detach_entity_load_avg()

 kernel/sched/auto_group.c |    2 +-
 kernel/sched/core.c       |   13 ++++----
 kernel/sched/fair.c       |   76 ++++++++++++++++++++-------------------------
 kernel/sched/sched.h      |    6 ++--
 4 files changed, 45 insertions(+), 52 deletions(-)

-- 
1.7.9.5


* [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg()
  2016-06-06  0:20 [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change Yuyang Du
@ 2016-06-06  0:20 ` Yuyang Du
  2016-06-06 13:30   ` Matt Fleming
  2016-06-06  0:20 ` [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Yuyang Du @ 2016-06-06  0:20 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

attach_entity_load_avg() is called (indirectly) from:

 - switched_to_fair(): switch between classes to fair
 - task_move_group_fair(): move between task groups
 - enqueue_entity_load_avg(): enqueue entity

Only in switched_to_fair() is it possible that the task's last_update_time
is not 0 and the task therefore needs a sched avgs update, so move that
update to switched_to_fair() only. In addition, the code is refactored and
the code comments are updated.

No functionality change.
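
For reference, the "update" in question is aging the task's averages for the
time that has passed since last_update_time. Below is a minimal standalone
sketch of that idea, not kernel code: the function name and the floating-point
math are illustrative only; the kernel does the equivalent in fixed-point
inside __update_load_avg().

	/*
	 * Standalone sketch: a load average is "aged" when time passes
	 * without the entity running.  PELT halves a contribution every
	 * 32 periods of ~1ms.
	 */
	#include <math.h>
	#include <stdio.h>

	static unsigned long age_load_avg(unsigned long load_avg, unsigned long periods)
	{
		return (unsigned long)(load_avg * pow(0.5, periods / 32.0));
	}

	int main(void)
	{
		/* absent for 64 periods: the tracked load drops to a quarter */
		printf("%lu\n", age_load_avg(1024, 64));	/* prints 256 */
		return 0;
	}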

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |   43 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6dd8ba..34e658b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2935,24 +2935,6 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	if (!sched_feat(ATTACH_AGE_LOAD))
-		goto skip_aging;
-
-	/*
-	 * If we got migrated (either between CPUs or between cgroups) we'll
-	 * have aged the average right before clearing @last_update_time.
-	 */
-	if (se->avg.last_update_time) {
-		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-				  &se->avg, 0, 0, NULL);
-
-		/*
-		 * XXX: we could have just aged the entire load away if we've been
-		 * absent from the fair class for too long.
-		 */
-	}
-
-skip_aging:
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
 	cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -2962,6 +2944,19 @@ skip_aging:
 	cfs_rq_util_change(cfs_rq);
 }
 
+static inline void attach_age_load_task(struct rq *rq, struct task_struct *p)
+{
+	struct sched_entity *se = &p->se;
+
+	if (!sched_feat(ATTACH_AGE_LOAD))
+		return;
+
+	if (se->avg.last_update_time) {
+		__update_load_avg(cfs_rq_of(se)->avg.last_update_time, cpu_of(rq),
+				  &se->avg, 0, 0, NULL);
+	}
+}
+
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
@@ -3091,6 +3086,7 @@ static inline void
 attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void
 detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void attach_age_load_task(struct rq *rq, struct task_struct *p) {}
 
 static inline int idle_balance(struct rq *rq)
 {
@@ -8390,6 +8386,12 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
+	/*
+	 * If we change between classes, age the averages before attaching them.
+	 * XXX: we could have just aged the entire load away if we've been
+	 * absent from the fair class for too long.
+	 */
+	attach_age_load_task(rq, p);
 	attach_task_cfs_rq(p);
 
 	if (task_on_rq_queued(p)) {
@@ -8441,11 +8443,6 @@ static void task_move_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
-
-#ifdef CONFIG_SMP
-	/* Tell se's cfs_rq has been changed -- migrated */
-	p->se.avg.last_update_time = 0;
-#endif
 	attach_task_cfs_rq(p);
 }
 
-- 
1.7.9.5


* [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-06  0:20 [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change Yuyang Du
  2016-06-06  0:20 ` [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
@ 2016-06-06  0:20 ` Yuyang Du
  2016-06-06  9:54   ` Peter Zijlstra
  2016-06-06 12:32   ` Vincent Guittot
  2016-06-06  0:20 ` [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06  0:20 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Vincent reported that the first task added to a new task group's cfs_rq will
be attached in attach_task_cfs_rq() and then once more when it is enqueued
(see https://lkml.org/lkml/2016/5/25/388).

Actually, it is worse. The sched avgs can sometimes be attached twice,
not only when we change task groups but also when we switch to the fair
class. These two scenarios are described below.

1) Switch to fair class:

The sched class change is done like this:

	if (queued)
	  enqueue_task();
	check_class_changed()
	  switched_from()
	  switched_to()

If the task is on_rq, it has already been enqueued, which MAY have
attached its sched avgs to the cfs_rq. If so, we should not attach them
again in switched_to(); otherwise we attach them twice.

To address both the on_rq and !on_rq cases, as well as whether or not
the task was previously in the fair class, the simplest solution is to
reset the task's last_update_time to 0 when the task is switched away
from the fair class, and then let task enqueue do the sched avgs
attachment only once.

2) Change between fair task groups:

The task groups are changed like this:

	if (queued)
          dequeue_task()
	task_move_group()
	if (queued)
          enqueue_task()

Unlike the switch-to-fair-class case, if the task is on_rq it will be
enqueued again after we move task groups. So the simplest solution is to
reset the task's last_update_time in task_move_group() without attaching
the sched avgs there, and then let enqueue_task() do the sched avgs
attachment.
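
To illustrate the invariant both cases converge on, here is a minimal
standalone sketch, not kernel code (all names below are made up for
illustration): attach only at enqueue, and only when last_update_time == 0,
which the detach paths reset.

	#include <assert.h>
	#include <stdbool.h>

	struct toy_se {
		unsigned long long last_update_time;	/* 0 means "not attached" */
		bool attached;
	};

	/* models the enqueue path: attach at most once */
	static void toy_enqueue(struct toy_se *se, unsigned long long now)
	{
		if (!se->last_update_time) {
			se->attached = true;		/* attach_entity_load_avg() */
			se->last_update_time = now;
		}
	}

	/* models switched_from_fair() / task_move_group_fair() in this patch */
	static void toy_detach_and_reset(struct toy_se *se)
	{
		se->attached = false;			/* detach_task_cfs_rq() */
		se->last_update_time = 0;		/* reset_task_last_update_time() */
	}

	int main(void)
	{
		struct toy_se se = { 0, false };

		toy_enqueue(&se, 100);			/* new task: attached here, once */
		toy_enqueue(&se, 200);			/* already attached: nothing to do */
		assert(se.attached && se.last_update_time == 100);

		toy_detach_and_reset(&se);		/* class switch or group move */
		toy_enqueue(&se, 300);			/* re-attached exactly once */
		assert(se.attached && se.last_update_time == 300);
		return 0;
	}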

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |   47 +++++++++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 34e658b..28b3415 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2933,7 +2933,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 		update_tg_load_avg(cfs_rq, 0);
 }
 
-static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+/* Virtually synchronize task with its cfs_rq */
+static inline void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
@@ -2944,19 +2945,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	cfs_rq_util_change(cfs_rq);
 }
 
-static inline void attach_age_load_task(struct rq *rq, struct task_struct *p)
-{
-	struct sched_entity *se = &p->se;
-
-	if (!sched_feat(ATTACH_AGE_LOAD))
-		return;
-
-	if (se->avg.last_update_time) {
-		__update_load_avg(cfs_rq_of(se)->avg.last_update_time, cpu_of(rq),
-				  &se->avg, 0, 0, NULL);
-	}
-}
-
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
@@ -3031,6 +3019,11 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
 }
 #endif
 
+static inline void reset_task_last_update_time(struct task_struct *p)
+{
+	p->se.avg.last_update_time = 0;
+}
+
 /*
  * Task first catches up with cfs_rq, and then subtract
  * itself from the cfs_rq (task must be off the queue now).
@@ -3083,10 +3076,8 @@ dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void remove_entity_load_avg(struct sched_entity *se) {}
 
 static inline void
-attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-static inline void
 detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-static inline void attach_age_load_task(struct rq *rq, struct task_struct *p) {}
+static inline void reset_task_last_update_time(struct task_struct *p) {}
 
 static inline int idle_balance(struct rq *rq)
 {
@@ -8372,9 +8363,6 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	se->depth = se->parent ? se->parent->depth + 1 : 0;
 #endif
 
-	/* Synchronize task with its cfs_rq */
-	attach_entity_load_avg(cfs_rq, se);
-
 	if (!vruntime_normalized(p))
 		se->vruntime += cfs_rq->min_vruntime;
 }
@@ -8382,16 +8370,18 @@ static void attach_task_cfs_rq(struct task_struct *p)
 static void switched_from_fair(struct rq *rq, struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
+	reset_task_last_update_time(p);
+	/*
+	 * If we change back to fair class, we will attach the sched
+	 * avgs when we are enqueued, which will be done only once. We
+	 * won't have the chance to consistently age the avgs before
+	 * attaching them, so we have to continue with the last updated
+	 * sched avgs when we were detached.
+	 */
 }
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
-	/*
-	 * If we change between classes, age the averages before attaching them.
-	 * XXX: we could have just aged the entire load away if we've been
-	 * absent from the fair class for too long.
-	 */
-	attach_age_load_task(rq, p);
 	attach_task_cfs_rq(p);
 
 	if (task_on_rq_queued(p)) {
@@ -8444,6 +8434,11 @@ static void task_move_group_fair(struct task_struct *p)
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
 	attach_task_cfs_rq(p);
+	/*
+	 * This assures we will attach the sched avgs when we are enqueued,
+	 * which will be done only once.
+	 */
+	reset_task_last_update_time(p);
 }
 
 void free_fair_sched_group(struct task_group *tg)
-- 
1.7.9.5


* [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups
  2016-06-06  0:20 [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change Yuyang Du
  2016-06-06  0:20 ` [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
  2016-06-06  0:20 ` [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
@ 2016-06-06  0:20 ` Yuyang Du
  2016-06-06  9:58   ` Peter Zijlstra
  2016-06-06 14:03   ` Matt Fleming
  2016-06-06  0:20 ` [PATCH v4 4/5] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
  2016-06-06  0:20 ` [PATCH v4 5/5] sched/fair: Add inline to detach_entity_load_avg() Yuyang Du
  4 siblings, 2 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06  0:20 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

A newly forked task has not been enqueued yet, so it should not be removed
from its cfs_rq in task_move_group_fair(). To achieve this, we need to pass
the fork information all the way down from sched_move_task() to
task_move_group_fair().
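
To make the problem concrete, here is a minimal standalone sketch, not kernel
code (names and numbers are illustrative only): detaching an entity whose load
was never added to the cfs_rq subtracts someone else's contribution.

	#include <assert.h>

	struct toy_cfs_rq { long load_avg; };
	struct toy_se     { long load_avg; };

	/* rough stand-in for detach: subtract the entity's load from the queue */
	static void toy_detach(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
	{
		cfs_rq->load_avg -= se->load_avg;
	}

	int main(void)
	{
		struct toy_cfs_rq cfs_rq = { .load_avg = 1024 };	/* already-attached tasks */
		struct toy_se new_task  = { .load_avg = 512 };		/* forked, never attached */

		/* what would happen without the fork check in task_move_group_fair() */
		toy_detach(&cfs_rq, &new_task);
		assert(cfs_rq.load_avg == 512);	/* other tasks' contribution is now understated */
		return 0;
	}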

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/auto_group.c |    2 +-
 kernel/sched/core.c       |    8 ++++----
 kernel/sched/fair.c       |    8 ++++++--
 kernel/sched/sched.h      |    4 ++--
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index a5d966c..15f2eb7 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -143,7 +143,7 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 		goto out;
 
 	for_each_thread(p, t)
-		sched_move_task(t);
+		sched_move_task(t, false);
 out:
 	unlock_task_sighand(p, &flags);
 	autogroup_kref_put(prev);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..ae5b8a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7724,7 +7724,7 @@ void sched_offline_group(struct task_group *tg)
  *	by now. This function just updates tsk->se.cfs_rq and tsk->se.parent to
  *	reflect its new group.
  */
-void sched_move_task(struct task_struct *tsk)
+void sched_move_task(struct task_struct *tsk, bool fork)
 {
 	struct task_group *tg;
 	int queued, running;
@@ -7753,7 +7753,7 @@ void sched_move_task(struct task_struct *tsk)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_move_group)
-		tsk->sched_class->task_move_group(tsk);
+		tsk->sched_class->task_move_group(tsk, fork);
 	else
 #endif
 		set_task_rq(tsk, task_cpu(tsk));
@@ -8186,7 +8186,7 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 
 static void cpu_cgroup_fork(struct task_struct *task)
 {
-	sched_move_task(task);
+	sched_move_task(task, true);
 }
 
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
@@ -8213,7 +8213,7 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 	struct cgroup_subsys_state *css;
 
 	cgroup_taskset_for_each(task, css, tset)
-		sched_move_task(task);
+		sched_move_task(task, false);
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28b3415..370e284 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8429,9 +8429,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static void task_move_group_fair(struct task_struct *p)
+static void task_move_group_fair(struct task_struct *p, bool fork)
 {
-	detach_task_cfs_rq(p);
+	/*
+	 * Newly forked task should not be removed from any cfs_rq
+	 */
+	if (!fork)
+		detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
 	attach_task_cfs_rq(p);
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..c139ec4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -343,7 +343,7 @@ extern void sched_online_group(struct task_group *tg,
 extern void sched_destroy_group(struct task_group *tg);
 extern void sched_offline_group(struct task_group *tg);
 
-extern void sched_move_task(struct task_struct *tsk);
+extern void sched_move_task(struct task_struct *tsk, bool fork);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
@@ -1247,7 +1247,7 @@ struct sched_class {
 	void (*update_curr) (struct rq *rq);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	void (*task_move_group) (struct task_struct *p);
+	void (*task_move_group) (struct task_struct *p, bool fork);
 #endif
 };
 
-- 
1.7.9.5


* [PATCH v4 4/5] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork()
  2016-06-06  0:20 [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change Yuyang Du
                   ` (2 preceding siblings ...)
  2016-06-06  0:20 ` [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
@ 2016-06-06  0:20 ` Yuyang Du
  2016-06-06  0:20 ` [PATCH v4 5/5] sched/fair: Add inline to detach_entity_load_avg() Yuyang Du
  4 siblings, 0 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06  0:20 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Move the new task's sched avgs initialization to sched_fork(). For a task
that starts in a non-fair class, the first switched_to_fair() will do the
attach correctly.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/core.c  |    5 +++--
 kernel/sched/fair.c  |   14 +++++---------
 kernel/sched/sched.h |    2 +-
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ae5b8a8..77a8a2b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2370,6 +2370,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 
+	/* Initialize new task's sched averages */
+	init_entity_sched_avg(&p->se);
+
 	/*
 	 * The child is not yet in the pid-hash so no cgroup attach races,
 	 * and the cgroup is pinned to this child due to cgroup_fork()
@@ -2510,8 +2513,6 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq_flags rf;
 	struct rq *rq;
 
-	/* Initialize new task's runnable average */
-	init_entity_runnable_average(&p->se);
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 #ifdef CONFIG_SMP
 	/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 370e284..a5a7149 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -668,8 +668,8 @@ static unsigned long task_h_load(struct task_struct *p);
 #define LOAD_AVG_MAX 47742 /* maximum possible load avg */
 #define LOAD_AVG_MAX_N 345 /* number of full periods to produce LOAD_AVG_MAX */
 
-/* Give new sched_entity start runnable values to heavy its load in infant time */
-void init_entity_runnable_average(struct sched_entity *se)
+/* Give new sched_entity start load values to heavy its load in infant time */
+void init_entity_sched_avg(struct sched_entity *se)
 {
 	struct sched_avg *sa = &se->avg;
 
@@ -738,12 +738,8 @@ void post_init_entity_util_avg(struct sched_entity *se)
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
 static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
 #else
-void init_entity_runnable_average(struct sched_entity *se)
-{
-}
-void post_init_entity_util_avg(struct sched_entity *se)
-{
-}
+void init_entity_sched_avg(struct sched_entity *se) { }
+void post_init_entity_util_avg(struct sched_entity *se) { }
 #endif
 
 /*
@@ -8492,7 +8488,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
-		init_entity_runnable_average(se);
+		init_entity_sched_avg(se);
 		post_init_entity_util_avg(se);
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c139ec4..bc9c99e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1321,7 +1321,7 @@ extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 
 unsigned long to_ratio(u64 period, u64 runtime);
 
-extern void init_entity_runnable_average(struct sched_entity *se);
+extern void init_entity_sched_avg(struct sched_entity *se);
 extern void post_init_entity_util_avg(struct sched_entity *se);
 
 #ifdef CONFIG_NO_HZ_FULL
-- 
1.7.9.5


* [PATCH v4 5/5] sched/fair: Add inline to detach_entity_load_avg()
  2016-06-06  0:20 [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change Yuyang Du
                   ` (3 preceding siblings ...)
  2016-06-06  0:20 ` [PATCH v4 4/5] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
@ 2016-06-06  0:20 ` Yuyang Du
  4 siblings, 0 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06  0:20 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

detach_entity_load_avg() is only called by detach_task_cfs_rq(), so
explicitly add the inline attribute to it.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a5a7149..1b6adfc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2941,7 +2941,8 @@ static inline void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_en
 	cfs_rq_util_change(cfs_rq);
 }
 
-static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+/* Catch up with the cfs_rq and remove our load when we leave */
+static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
@@ -8342,7 +8343,6 @@ static void detach_task_cfs_rq(struct task_struct *p)
 		se->vruntime -= cfs_rq->min_vruntime;
 	}
 
-	/* Catch up with the cfs_rq and remove our load when we leave */
 	detach_entity_load_avg(cfs_rq, se);
 }
 
-- 
1.7.9.5


* Re: [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-06  0:20 ` [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
@ 2016-06-06  9:54   ` Peter Zijlstra
  2016-06-06 19:38     ` Yuyang Du
  2016-06-06 12:32   ` Vincent Guittot
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2016-06-06  9:54 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, Jun 06, 2016 at 08:20:38AM +0800, Yuyang Du wrote:
> Vincent reported that the first task to a new task group's cfs_rq will
> be attached in attach_task_cfs_rq() and once more when it is enqueued
> (see https://lkml.org/lkml/2016/5/25/388).
> 
> Actually, it is worse. The sched avgs can be sometimes attached twice
> not only when we change task groups but also when we switch to fair class,
> these two scenarios will descripbe in the following respectively.
> 
> 1) Switch to fair class:
> 
> The sched class change is done like this:
> 
> 	if (queued)
> 	  enqueue_task();
> 	check_class_changed()
> 	  switched_from()
> 	  switched_to()
> 
> If the task is on_rq, it should have already been enqueued, which
> MAY have attached sched avgs to the cfs_rq, if so, we shouldn't attach
> it again in switched_to(), otherwise, we attach it twice.
> 
> To address both the on_rq and !on_rq cases, as well as both the task
> was switched from fair and otherwise, the simplest solution is to reset
> the task's last_update_time to 0, when the task is switched from fair.
> Then let task enqueue do the sched avgs attachment only once.
> 
> 2) Change between fair task groups:
> 
> The task groups are changed like this:
> 
> 	if (queued)
>           dequeue_task()
> 	task_move_group()
> 	if (queued)
>           enqueue_task()
> 
> Unlike the switch to fair class case, if the task is on_rq, it will be
> enqueued after we move task groups, so the simplest solution is to reset
> the task's last_update_time when we do task_move_group(), but not to
> attach sched avgs in task_move_group(), and then let enqueue_task() do
> the sched avgs attachment.

So this patch completely removes the detach->attach aging you moved
around in the previous patch -- leading me to wonder what the purpose of
the previous patch was.

Also, this Changelog completely fails to mention this fact, nor does it
explain why this is 'right'.

> +/* Virtually synchronize task with its cfs_rq */

I don't feel this comment actually enlightens the function much.

> @@ -8372,9 +8363,6 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  	se->depth = se->parent ? se->parent->depth + 1 : 0;
>  #endif
>  
> -	/* Synchronize task with its cfs_rq */
> -	attach_entity_load_avg(cfs_rq, se);
> -
>  	if (!vruntime_normalized(p))
>  		se->vruntime += cfs_rq->min_vruntime;
>  }

You leave attach/detach asymmetric and not a comment in sight explaining
why.

> @@ -8382,16 +8370,18 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  {
>  	detach_task_cfs_rq(p);
> +	reset_task_last_update_time(p);
> +	/*
> +	 * If we change back to fair class, we will attach the sched
> +	 * avgs when we are enqueued, which will be done only once. We
> +	 * won't have the chance to consistently age the avgs before
> +	 * attaching them, so we have to continue with the last updated
> +	 * sched avgs when we were detached.
> +	 */

This comment needs improvement; it confuses.

> @@ -8444,6 +8434,11 @@ static void task_move_group_fair(struct task_struct *p)
>  	detach_task_cfs_rq(p);
>  	set_task_rq(p, task_cpu(p));
>  	attach_task_cfs_rq(p);
> +	/*
> +	 * This assures we will attach the sched avgs when we are enqueued,

"ensures" ? Also, more confusion.

> +	 * which will be done only once.
> +	 */
> +	reset_task_last_update_time(p);
>  }


* Re: [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups
  2016-06-06  0:20 ` [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
@ 2016-06-06  9:58   ` Peter Zijlstra
  2016-06-06 14:03   ` Matt Fleming
  1 sibling, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2016-06-06  9:58 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, Jun 06, 2016 at 08:20:39AM +0800, Yuyang Du wrote:
> +static void task_move_group_fair(struct task_struct *p, bool fork)
>  {
> -	detach_task_cfs_rq(p);
> +	/*
> +	 * Newly forked task should not be removed from any cfs_rq

That's a pointless comment; that's exactly what the code _does_.
Comments should explain the code, not restate it.

So a comment like:

	 * Newly forked tasks are not attached yet.

Is entirely more useful.

> +	 */
> +	if (!fork)
> +		detach_task_cfs_rq(p);
>  	set_task_rq(p, task_cpu(p));
>  	attach_task_cfs_rq(p);
>  	/*


* Re: [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-06  0:20 ` [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
  2016-06-06  9:54   ` Peter Zijlstra
@ 2016-06-06 12:32   ` Vincent Guittot
  2016-06-06 19:05     ` Yuyang Du
  1 sibling, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2016-06-06 12:32 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

Hi Yuyang,

On 6 June 2016 at 02:20, Yuyang Du <yuyang.du@intel.com> wrote:
> Vincent reported that the first task to a new task group's cfs_rq will
> be attached in attach_task_cfs_rq() and once more when it is enqueued
> (see https://lkml.org/lkml/2016/5/25/388).
>
> Actually, it is worse. The sched avgs can be sometimes attached twice
> not only when we change task groups but also when we switch to fair class,
> these two scenarios will descripbe in the following respectively.
>
> 1) Switch to fair class:
>
> The sched class change is done like this:
>
>         if (queued)
>           enqueue_task();
>         check_class_changed()
>           switched_from()
>           switched_to()
>
> If the task is on_rq, it should have already been enqueued, which
> MAY have attached sched avgs to the cfs_rq, if so, we shouldn't attach
> it again in switched_to(), otherwise, we attach it twice.
>
> To address both the on_rq and !on_rq cases, as well as both the task
> was switched from fair and otherwise, the simplest solution is to reset
> the task's last_update_time to 0, when the task is switched from fair.
> Then let task enqueue do the sched avgs attachment only once.
>
> 2) Change between fair task groups:
>
> The task groups are changed like this:
>
>         if (queued)
>           dequeue_task()
>         task_move_group()
>         if (queued)
>           enqueue_task()
>
> Unlike the switch to fair class case, if the task is on_rq, it will be
> enqueued after we move task groups, so the simplest solution is to reset
> the task's last_update_time when we do task_move_group(), but not to
> attach sched avgs in task_move_group(), and then let enqueue_task() do
> the sched avgs attachment.

According to the review of the previous version
(http://www.gossamer-threads.com/lists/linux/kernel/2450678#2450678),
only the switch-to-fair use case with a task that has never been queued
as a CFS task before can have this issue; it will not happen for the
other use cases described above. So can you limit the description in
the commit message to just this use case, unless you have discovered
new use cases, in which case please describe them as well.

Then, the problem with this use case comes from the fact that
last_update_time == 0 has a special meaning (the task has migrated) and
we initialize last_update_time with this value. A much simpler solution
would be to prevent last_update_time from being initialized with this
special value; for example, we could initialize the last_update_time of
a sched_entity to 1, which is easier than these changes. Furthermore,
this patch makes a task never be decayed for the time that elapses
between its detach and its next enqueue on a cfs_rq, which can be quite
a long duration and will leave the task's load far higher than it
should be.
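
A minimal standalone sketch of that idea, not kernel code (names below are
illustrative only): keep 0 reserved for the special "must (re)attach on
enqueue" meaning and start new entities at 1, so a brand-new task is no
longer mistaken for a migrated one.

	#include <stdio.h>

	#define SE_MIGRATED	0ULL	/* special value: attach on next enqueue */
	#define SE_INIT		1ULL	/* proposed non-special initial value */

	static const char *on_enqueue(unsigned long long last_update_time)
	{
		return last_update_time == SE_MIGRATED ?
			"take the migrated path: attach here" :
			"skip the migrated path: no attach here";
	}

	int main(void)
	{
		/* init to 0: a brand-new task is indistinguishable from a migrated one */
		printf("%s\n", on_enqueue(0));
		/* init to 1: the new task no longer takes the attach-on-enqueue path */
		printf("%s\n", on_enqueue(SE_INIT));
		return 0;
	}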

>
> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/fair.c |   47 +++++++++++++++++++++--------------------------
>  1 file changed, 21 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 34e658b..28b3415 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2933,7 +2933,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>                 update_tg_load_avg(cfs_rq, 0);
>  }
>
> -static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +/* Virtually synchronize task with its cfs_rq */
> +static inline void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>         se->avg.last_update_time = cfs_rq->avg.last_update_time;
>         cfs_rq->avg.load_avg += se->avg.load_avg;
> @@ -2944,19 +2945,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>         cfs_rq_util_change(cfs_rq);
>  }
>
> -static inline void attach_age_load_task(struct rq *rq, struct task_struct *p)
> -{
> -       struct sched_entity *se = &p->se;
> -
> -       if (!sched_feat(ATTACH_AGE_LOAD))
> -               return;
> -
> -       if (se->avg.last_update_time) {
> -               __update_load_avg(cfs_rq_of(se)->avg.last_update_time, cpu_of(rq),
> -                                 &se->avg, 0, 0, NULL);
> -       }
> -}
> -
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>         __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> @@ -3031,6 +3019,11 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
>  }
>  #endif
>
> +static inline void reset_task_last_update_time(struct task_struct *p)
> +{
> +       p->se.avg.last_update_time = 0;
> +}
> +
>  /*
>   * Task first catches up with cfs_rq, and then subtract
>   * itself from the cfs_rq (task must be off the queue now).
> @@ -3083,10 +3076,8 @@ dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
>  static inline void remove_entity_load_avg(struct sched_entity *se) {}
>
>  static inline void
> -attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
> -static inline void
>  detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
> -static inline void attach_age_load_task(struct rq *rq, struct task_struct *p) {}
> +static inline void reset_task_last_update_time(struct task_struct *p) {}
>
>  static inline int idle_balance(struct rq *rq)
>  {
> @@ -8372,9 +8363,6 @@ static void attach_task_cfs_rq(struct task_struct *p)
>         se->depth = se->parent ? se->parent->depth + 1 : 0;
>  #endif
>
> -       /* Synchronize task with its cfs_rq */
> -       attach_entity_load_avg(cfs_rq, se);
> -
>         if (!vruntime_normalized(p))
>                 se->vruntime += cfs_rq->min_vruntime;
>  }
> @@ -8382,16 +8370,18 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  {
>         detach_task_cfs_rq(p);
> +       reset_task_last_update_time(p);
> +       /*
> +        * If we change back to fair class, we will attach the sched
> +        * avgs when we are enqueued, which will be done only once. We
> +        * won't have the chance to consistently age the avgs before
> +        * attaching them, so we have to continue with the last updated
> +        * sched avgs when we were detached.
> +        */
>  }
>
>  static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  {
> -       /*
> -        * If we change between classes, age the averages before attaching them.
> -        * XXX: we could have just aged the entire load away if we've been
> -        * absent from the fair class for too long.
> -        */
> -       attach_age_load_task(rq, p);
>         attach_task_cfs_rq(p);
>
>         if (task_on_rq_queued(p)) {
> @@ -8444,6 +8434,11 @@ static void task_move_group_fair(struct task_struct *p)
>         detach_task_cfs_rq(p);
>         set_task_rq(p, task_cpu(p));
>         attach_task_cfs_rq(p);
> +       /*
> +        * This assures we will attach the sched avgs when we are enqueued,
> +        * which will be done only once.
> +        */
> +       reset_task_last_update_time(p);
>  }
>
>  void free_fair_sched_group(struct task_group *tg)
> --
> 1.7.9.5
>


* Re: [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg()
  2016-06-06  0:20 ` [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
@ 2016-06-06 13:30   ` Matt Fleming
  2016-06-06 13:40     ` Vincent Guittot
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Fleming @ 2016-06-06 13:30 UTC (permalink / raw)
  To: Yuyang Du
  Cc: peterz, mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, 06 Jun, at 08:20:37AM, Yuyang Du wrote:
> attach_entity_load_avg() is called (indirectly) from:
> 
>  - switched_to_fair(): switch between classes to fair
>  - task_move_group_fair(): move between task groups
>  - enqueue_entity_load_avg(): enqueue entity
> 
> Only in switched_to_fair() is it possible that the task's last_update_time
> is not 0 and therefore the task needs sched avgs update, so move the task
> sched avgs update to switched_to_fair() only. In addition, the code is
> refactored and code comments are updated.
> 
> No functionality change.
> 
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/fair.c |   43 ++++++++++++++++++++-----------------------
>  1 file changed, 20 insertions(+), 23 deletions(-)

What about when tasks are moved between groups on CONFIG_SMP=n ?
Doesn't this change fail to age the load in that case?


* Re: [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg()
  2016-06-06 13:30   ` Matt Fleming
@ 2016-06-06 13:40     ` Vincent Guittot
  2016-06-06 14:06       ` Matt Fleming
  2016-06-06 14:27       ` Peter Zijlstra
  0 siblings, 2 replies; 19+ messages in thread
From: Vincent Guittot @ 2016-06-06 13:40 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Yuyang Du, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Benjamin Segall, Paul Turner, Morten Rasmussen, Dietmar Eggemann

Hi Matt,

On 6 June 2016 at 15:30, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Mon, 06 Jun, at 08:20:37AM, Yuyang Du wrote:
>> attach_entity_load_avg() is called (indirectly) from:
>>
>>  - switched_to_fair(): switch between classes to fair
>>  - task_move_group_fair(): move between task groups
>>  - enqueue_entity_load_avg(): enqueue entity
>>
>> Only in switched_to_fair() is it possible that the task's last_update_time
>> is not 0 and therefore the task needs sched avgs update, so move the task
>> sched avgs update to switched_to_fair() only. In addition, the code is
>> refactored and code comments are updated.
>>
>> No functionality change.
>>
>> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
>> ---
>>  kernel/sched/fair.c |   43 ++++++++++++++++++++-----------------------
>>  1 file changed, 20 insertions(+), 23 deletions(-)
>
> What about when tasks are moved between groups on CONFIG_SMP=n ?
> Doesn't this change fail to age the load in that case?

The load average stuff is only enabled for SMP systems; it's not used
on UP systems.
To be honest, I don't know why.

Vincent


* Re: [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups
  2016-06-06  0:20 ` [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
  2016-06-06  9:58   ` Peter Zijlstra
@ 2016-06-06 14:03   ` Matt Fleming
  2016-06-06 19:15     ` Yuyang Du
  1 sibling, 1 reply; 19+ messages in thread
From: Matt Fleming @ 2016-06-06 14:03 UTC (permalink / raw)
  To: Yuyang Du
  Cc: peterz, mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, 06 Jun, at 08:20:39AM, Yuyang Du wrote:
> Newly forked task has not been enqueued, so should not be removed from
> cfs_rq in task_move_group_fair(). To do so, we need to pass the fork
> information all the way from sched_move_task() to task_move_group_fair().
> 
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/auto_group.c |    2 +-
>  kernel/sched/core.c       |    8 ++++----
>  kernel/sched/fair.c       |    8 ++++++--
>  kernel/sched/sched.h      |    4 ++--
>  4 files changed, 13 insertions(+), 9 deletions(-)

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 28b3415..370e284 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8429,9 +8429,13 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  }
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -static void task_move_group_fair(struct task_struct *p)
> +static void task_move_group_fair(struct task_struct *p, bool fork)
>  {
> -	detach_task_cfs_rq(p);
> +	/*
> +	 * Newly forked task should not be removed from any cfs_rq
> +	 */
> +	if (!fork)
> +		detach_task_cfs_rq(p);
>  	set_task_rq(p, task_cpu(p));
>  	attach_task_cfs_rq(p);
>  	/*

Wouldn't it be more symmetric to add,

	if (p->se.avg.last_update_time)
		__update_load_avg(...)

to detach_entity_load_avg() so that there's no need to pass the @fork
parameter all the way down the stack, and more importantly, remember
to *not* call detach_task_cfs_rq() if the code changes in the future?

Additionally adding the above check looks like it would fix a race
that (I think) occurs if a newly forked task's class is changed before
it is enqueued in wake_up_new_task().


* Re: [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg()
  2016-06-06 13:40     ` Vincent Guittot
@ 2016-06-06 14:06       ` Matt Fleming
  2016-06-06 14:27       ` Peter Zijlstra
  1 sibling, 0 replies; 19+ messages in thread
From: Matt Fleming @ 2016-06-06 14:06 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Yuyang Du, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Benjamin Segall, Paul Turner, Morten Rasmussen, Dietmar Eggemann

On Mon, 06 Jun, at 03:40:18PM, Vincent Guittot wrote:
> 
> The load average stuff is only enable for SMP system but it's not used
> for UP system.

Duh, right. My bad.


* Re: [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg()
  2016-06-06 13:40     ` Vincent Guittot
  2016-06-06 14:06       ` Matt Fleming
@ 2016-06-06 14:27       ` Peter Zijlstra
  1 sibling, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2016-06-06 14:27 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Matt Fleming, Yuyang Du, Ingo Molnar, linux-kernel,
	Benjamin Segall, Paul Turner, Morten Rasmussen, Dietmar Eggemann

On Mon, Jun 06, 2016 at 03:40:18PM +0200, Vincent Guittot wrote:
> > What about when tasks are moved between groups on CONFIG_SMP=n ?
> > Doesn't this change fail to age the load in that case?
> 
> The load average stuff is only enable for SMP system but it's not used
> for UP system.
> To be honest, I don't know why

We'll likely change that 'soon', because the cpufreq_schedutil thing
also needs these numbers on UP.


* Re: [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-06 12:32   ` Vincent Guittot
@ 2016-06-06 19:05     ` Yuyang Du
  2016-06-07  8:09       ` Vincent Guittot
  0 siblings, 1 reply; 19+ messages in thread
From: Yuyang Du @ 2016-06-06 19:05 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

Hi Vincent,

On Mon, Jun 06, 2016 at 02:32:39PM +0200, Vincent Guittot wrote:
> > Unlike the switch to fair class case, if the task is on_rq, it will be
> > enqueued after we move task groups, so the simplest solution is to reset
> > the task's last_update_time when we do task_move_group(), but not to
> > attach sched avgs in task_move_group(), and then let enqueue_task() do
> > the sched avgs attachment.
> 
> According to the review of the previous version
> http://www.gossamer-threads.com/lists/linux/kernel/2450678#2450678,
> only the use case switch to fair with a task that has never been
> queued as a CFS task before, can have the issue and this will not
> happen for other use cases described above.
> So can you limit the description in the commit message to just this
> use case unless you have discovered new use cases and in this case
> please add the description.

I assure you that I will, :)
 
> Then, the problem with this use case, comes that last_update_time == 0
> has a special meaning ( task has migrated ) and we initialize
> last_update_time with this value.

In general, the meaning of last_update_time == 0 is: not yet attached to
any cfs queue.

> A much more simple solution would be
> to prevent last_update_time to be initialized with this special value.
> We can initialize the last_update_time of a sched_entity to 1 as an
> example which is easier than these changes.

Then at least we must differentiate CONFIG_FAIR_GROUP_SCHED and
!CONFIG_FAIR_GROUP_SCHED. And I am not sure whether this is a simpler
solution, as I haven't sorted out the whole thing. IMO, this solution
seems a little too hacky and I'd prefer not to use it if we have an
alternative.


* Re: [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups
  2016-06-06 14:03   ` Matt Fleming
@ 2016-06-06 19:15     ` Yuyang Du
  0 siblings, 0 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06 19:15 UTC (permalink / raw)
  To: Matt Fleming
  Cc: peterz, mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, Jun 06, 2016 at 03:03:38PM +0100, Matt Fleming wrote:
> >  #ifdef CONFIG_FAIR_GROUP_SCHED
> > -static void task_move_group_fair(struct task_struct *p)
> > +static void task_move_group_fair(struct task_struct *p, bool fork)
> >  {
> > -	detach_task_cfs_rq(p);
> > +	/*
> > +	 * Newly forked task should not be removed from any cfs_rq
> > +	 */
> > +	if (!fork)
> > +		detach_task_cfs_rq(p);
> >  	set_task_rq(p, task_cpu(p));
> >  	attach_task_cfs_rq(p);
> >  	/*
> 
> Wouldn't it be more symmetric to add,
> 
> 	if (p->se.avg.last_update_time)
> 		__update_load_avg(...)
> 
> to detach_entity_load_avg() so that there's no need to pass the @fork
> parameter all the way down the stack

Up to this point, we can't. But we should be able to if we move the next
patch (4/5) before this patch. I should have changed my mindset with that
patch. Good advice though, thanks, Matt.


* Re: [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-06  9:54   ` Peter Zijlstra
@ 2016-06-06 19:38     ` Yuyang Du
  0 siblings, 0 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-06 19:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, Jun 06, 2016 at 11:54:50AM +0200, Peter Zijlstra wrote:
> > The task groups are changed like this:
> > 
> > 	if (queued)
> >           dequeue_task()
> > 	task_move_group()
> > 	if (queued)
> >           enqueue_task()
> > 
> > Unlike the switch to fair class case, if the task is on_rq, it will be
> > enqueued after we move task groups, so the simplest solution is to reset
> > the task's last_update_time when we do task_move_group(), but not to
> > attach sched avgs in task_move_group(), and then let enqueue_task() do
> > the sched avgs attachment.
> 
> So this patch completely removes the detach->attach aging you moved
> around in the previous patch -- leading me to wonder what the purpose of
> the previous patch was.
 
Basically, they address different issues, and should not be conflated.

> Also, this Changelog completely fails to mention this fact, nor does it
> explain why this is 'right'.
 
I should have explained this in the changelog. It is "right" because, when a
task switches to fair, most likely "we could have just aged the entire load
away", as the XXX comment said. But despite that, we basically have a period
of lost record time; whether or not we age the load by that time, it doesn't
seem to me that one choice is definitely better than the other. I will add a
comment in the code explaining this. What do you think?

> > +/* Virtually synchronize task with its cfs_rq */
> 
> I don't feel this comment actually enlightens the function much.
 
Synchronize without update, so virtually, :)

> > @@ -8372,9 +8363,6 @@ static void attach_task_cfs_rq(struct task_struct *p)
> >  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> >  #endif
> >  
> > -	/* Synchronize task with its cfs_rq */
> > -	attach_entity_load_avg(cfs_rq, se);
> > -
> >  	if (!vruntime_normalized(p))
> >  		se->vruntime += cfs_rq->min_vruntime;
> >  }
> 
> You leave attach/detach asymmetric and not a comment in sight explaining
> why.
 
So it is asymmetric because we uniformly attach in enqueue. I will explain.
This also relates to the following code comments and your comments.

> > @@ -8382,16 +8370,18 @@ static void attach_task_cfs_rq(struct task_struct *p)
> >  static void switched_from_fair(struct rq *rq, struct task_struct *p)
> >  {
> >  	detach_task_cfs_rq(p);
> > +	reset_task_last_update_time(p);
> > +	/*
> > +	 * If we change back to fair class, we will attach the sched
> > +	 * avgs when we are enqueued, which will be done only once. We
> > +	 * won't have the chance to consistently age the avgs before
> > +	 * attaching them, so we have to continue with the last updated
> > +	 * sched avgs when we were detached.
> > +	 */
> 
> This comment needs improvement; it confuses.
> 
> > @@ -8444,6 +8434,11 @@ static void task_move_group_fair(struct task_struct *p)
> >  	detach_task_cfs_rq(p);
> >  	set_task_rq(p, task_cpu(p));
> >  	attach_task_cfs_rq(p);
> > +	/*
> > +	 * This assures we will attach the sched avgs when we are enqueued,
> 
> "ensures" ? Also, more confusion.
> 
> > +	 * which will be done only once.
> > +	 */
> > +	reset_task_last_update_time(p);
> >  }
> 


* Re: [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-06 19:05     ` Yuyang Du
@ 2016-06-07  8:09       ` Vincent Guittot
  2016-06-07 18:16         ` Yuyang Du
  0 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2016-06-07  8:09 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

On 6 June 2016 at 21:05, Yuyang Du <yuyang.du@intel.com> wrote:
> Hi Vincent,
>
> On Mon, Jun 06, 2016 at 02:32:39PM +0200, Vincent Guittot wrote:
>> > Unlike the switch to fair class case, if the task is on_rq, it will be
>> > enqueued after we move task groups, so the simplest solution is to reset
>> > the task's last_update_time when we do task_move_group(), but not to
>> > attach sched avgs in task_move_group(), and then let enqueue_task() do
>> > the sched avgs attachment.
>>
>> According to the review of the previous version
>> http://www.gossamer-threads.com/lists/linux/kernel/2450678#2450678,
>> only the use case switch to fair with a task that has never been
>> queued as a CFS task before, can have the issue and this will not
>> happen for other use cases described above.
>> So can you limit the description in the commit message to just this
>> use case unless you have discovered new use cases and in this case
>> please add the description.
>
> I assure you that I will, :)
>
>> Then, the problem with this use case, comes that last_update_time == 0
>> has a special meaning ( task has migrated ) and we initialize
>> last_update_time with this value.
>
> In general, the meaning of last_update_time == 0 is: not yet attached to
> any cfs queue.
>
>> A much more simple solution would be
>> to prevent last_update_time to be initialized with this special value.
>> We can initialize the last_update_time of a sched_entity to 1 as an
>> example which is easier than these changes.
>
> Then at least, we must differentiate CONFIG_FAIR_GROUP_SCHED and
> !CONFIG_FAIR_GROUP_SCHED. And I am not sure whether this is a simpler

Why do you want to differentiate? We already have
sa->last_update_time = 0; in init_entity_runnable_average(). We just
have to change it to sa->last_update_time = 1;

> solution, as I haven't sorted out the whole thing. IMO, this solution
> seems a little too hacky and I'd prefer not if we have alternative.


* Re: [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-07  8:09       ` Vincent Guittot
@ 2016-06-07 18:16         ` Yuyang Du
  0 siblings, 0 replies; 19+ messages in thread
From: Yuyang Du @ 2016-06-07 18:16 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

On Tue, Jun 07, 2016 at 10:09:14AM +0200, Vincent Guittot wrote:
> >> A much more simple solution would be
> >> to prevent last_update_time to be initialized with this special value.
> >> We can initialize the last_update_time of a sched_entity to 1 as an
> >> example which is easier than these changes.
> >
> > Then at least, we must differentiate CONFIG_FAIR_GROUP_SCHED and
> > !CONFIG_FAIR_GROUP_SCHED. And I am not sure whether this is a simpler
> 
> Why do you want to differentiate ? we already have
> sa->last_update_time = 0; in init_entity_runnable_average. We just
> have to change it by sa->last_update_time = 1;

If !CONFIG_FAIR_GROUP_SCHED, we don't have task_move_group_fair().
If last_update_time != 0, we won't attach the avgs for a new task. Correct?


end of thread, other threads:[~2016-06-08  2:14 UTC | newest]

Thread overview: 19+ messages
2016-06-06  0:20 [PATCH v4 0/5] sched/fair: Fix attach and detach sched avgs for task group change or sched class change Yuyang Du
2016-06-06  0:20 ` [PATCH v4 1/5] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
2016-06-06 13:30   ` Matt Fleming
2016-06-06 13:40     ` Vincent Guittot
2016-06-06 14:06       ` Matt Fleming
2016-06-06 14:27       ` Peter Zijlstra
2016-06-06  0:20 ` [PATCH v4 2/5] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
2016-06-06  9:54   ` Peter Zijlstra
2016-06-06 19:38     ` Yuyang Du
2016-06-06 12:32   ` Vincent Guittot
2016-06-06 19:05     ` Yuyang Du
2016-06-07  8:09       ` Vincent Guittot
2016-06-07 18:16         ` Yuyang Du
2016-06-06  0:20 ` [PATCH v4 3/5] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
2016-06-06  9:58   ` Peter Zijlstra
2016-06-06 14:03   ` Matt Fleming
2016-06-06 19:15     ` Yuyang Du
2016-06-06  0:20 ` [PATCH v4 4/5] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
2016-06-06  0:20 ` [PATCH v4 5/5] sched/fair: Add inline to detach_entity_load_avg() Yuyang Du
