linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change
@ 2016-06-19 23:02 Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Yuyang Du @ 2016-06-19 23:02 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, matt, Yuyang Du

Hi Peter,

This new version attaches sched avgs in task_move_group_fair(), since Vincent
and Peter are concerned with delaying it to task enqueue, and I agreed.

Thanks a lot to Vincent and Peter.

Thanks,
Yuyang

--

Yuyang Du (4):
  sched/fair: Fix attaching task sched avgs twice when switching to
    fair or changing task group
  sched/fair: Move load and util avgs from wake_up_new_task() to
    sched_fork()
  sched/fair: Skip detach sched avgs for new task when changing task
    groups
  sched/fair: Add inline to detach_entity_load_evg()

 kernel/sched/core.c  |    5 ++-
 kernel/sched/fair.c  |  106 +++++++++++++++++++++++++++++++-------------------
 kernel/sched/sched.h |    2 +-
 3 files changed, 71 insertions(+), 42 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v7 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
  2016-06-19 23:02 [PATCH v7 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
@ 2016-06-19 23:02 ` Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 2/4] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Yuyang Du @ 2016-06-19 23:02 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, matt, Yuyang Du

Vincent reported that the first task to a new task group's cfs_rq will
be attached in attach_task_cfs_rq() and once more when it is enqueued
(see https://lkml.org/lkml/2016/5/25/388).

Actually, it is worse. The sched avgs can be sometimes attached twice
not only when we change task groups but also when we switch to fair class.
These two scenarios will be described in the following respectively.

(1) Switch to fair class:

The sched class change is done like this:

	if (queued)
	  enqueue_task();
	check_class_changed()
	  switched_from()
	  switched_to()

If the task is on_rq, before switched_to(), it has been enqueued, which
already attached sched avgs to the cfs_rq if the task's last_update_time
is 0, which can happen if the task was never fair class, if so, we
shouldn't attach it again in switched_to(), otherwise, we attach it twice.

To address both the on_rq and !on_rq cases, as well as both the task
was switched from fair and otherwise, the simplest solution is to reset
the task's last_update_time to 0 when the task is switched from fair.
Then let task enqueue do the sched avgs attachment uniformly only once.

(2) Change between fair task groups:

The task groups are changed like this:

	if (queued)
          dequeue_task()
	task_move_group()
	if (queued)
          enqueue_task()

Unlike the switch to fair class case, if the task is on_rq, it will be
enqueued right away after we move task groups, and if not, in the future
when the task is runnable. The attach twice problem can happen if the
cfs_rq and the task are both new as Vincent discovered. The simplest
solution is to only reset the task's last_update_time in task_move_group(),
and then let enqueue_task() do the sched avgs attachment.

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |   82 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 55 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f75930b..d43d242 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2933,26 +2933,9 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 		update_tg_load_avg(cfs_rq, 0);
 }
 
-static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+/* Synchronize task with its cfs_rq */
+static inline void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	if (!sched_feat(ATTACH_AGE_LOAD))
-		goto skip_aging;
-
-	/*
-	 * If we got migrated (either between CPUs or between cgroups) we'll
-	 * have aged the average right before clearing @last_update_time.
-	 */
-	if (se->avg.last_update_time) {
-		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-				  &se->avg, 0, 0, NULL);
-
-		/*
-		 * XXX: we could have just aged the entire load away if we've been
-		 * absent from the fair class for too long.
-		 */
-	}
-
-skip_aging:
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
 	cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -3036,6 +3019,11 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
 }
 #endif
 
+static inline void reset_task_last_update_time(struct task_struct *p)
+{
+	p->se.avg.last_update_time = 0;
+}
+
 /*
  * Task first catches up with cfs_rq, and then subtract
  * itself from the cfs_rq (task must be off the queue now).
@@ -3088,9 +3076,8 @@ dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void remove_entity_load_avg(struct sched_entity *se) {}
 
 static inline void
-attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-static inline void
 detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
+static inline void reset_task_last_update_time(struct task_struct *p) {}
 
 static inline int idle_balance(struct rq *rq)
 {
@@ -8376,8 +8363,19 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	se->depth = se->parent ? se->parent->depth + 1 : 0;
 #endif
 
-	/* Synchronize task with its cfs_rq */
-	attach_entity_load_avg(cfs_rq, se);
+	/*
+	 * At this point, we don't do sched avgs attachment. Instead,
+	 * we uniformly do the attachement at enqueue time in all
+	 * scenarios:
+	 *
+	 * (1) Actually, we may have already done it (e.g., at enqueue_task()
+	 * in __sched_setscheduler() when a task switches to fair class).
+	 *
+	 * (2) We will do it right away (e.g., in sched_move_task() when
+	 * an on_rq task moves between groups)
+	 *
+	 * (3) In the future when this currently !on_rq task is enqueued.
+	 */
 
 	if (!vruntime_normalized(p))
 		se->vruntime += cfs_rq->min_vruntime;
@@ -8386,6 +8384,26 @@ static void attach_task_cfs_rq(struct task_struct *p)
 static void switched_from_fair(struct rq *rq, struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
+	/*
+	 * If the task changes back to fair class, we will attach the
+	 * task's sched avgs when it is enqueued, as the task's last_update_time
+	 * is reset to 0.
+	 *
+	 * This simplifies sched avgs attachment when a task is switched
+	 * to fair class, as we can unify the case where the task was
+	 * never fair class and the case where the task was fair class.
+	 *
+	 * Having lost last_update_time, we can't age the task's sched avgs
+	 * before attaching them, so the last updated sched avgs (in the
+	 * above detach_task_cfs_rq()) will be used.
+	 *
+	 * Typically the time between switched_from_fair() and switched_to_fair()
+	 * most likely could have just aged the entire load/util away.
+	 * Despite that, there is no better way to account for the lost
+	 * record of sched avgs during that time, therefore, we simply
+	 * lean toward no aging at all.
+	 */
+	reset_task_last_update_time(p);
 }
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
@@ -8441,12 +8459,22 @@ static void task_move_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
-
+	attach_task_cfs_rq(p);
 #ifdef CONFIG_SMP
-	/* Tell se's cfs_rq has been changed -- migrated */
-	p->se.avg.last_update_time = 0;
+	/*
+	 * If the cfs_rq's last_update_time is 0, attach the sched avgs
+	 * won't be anything useful, as it will be decayed to 0 when any
+	 * sched_entity is enqueued to that cfs_rq.
+	 *
+	 * On the other hand, if the cfs_rq's last_update_time is 0, we
+	 * must reset the task's last_update_time to ensure we will attach
+	 * the sched avgs when the task is enqueued.
+	 */
+	if (!cfs_rq_of(&p->se)->avg.last_update_time)
+		reset_task_last_update_time(p);
+	else
+		attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
 #endif
-	attach_task_cfs_rq(p);
 }
 
 void free_fair_sched_group(struct task_group *tg)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v7 2/4] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork()
  2016-06-19 23:02 [PATCH v7 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
@ 2016-06-19 23:02 ` Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 3/4] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 4/4] sched/fair: Add inline to detach_entity_load_evg() Yuyang Du
  3 siblings, 0 replies; 5+ messages in thread
From: Yuyang Du @ 2016-06-19 23:02 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, matt, Yuyang Du

Move new task initialization to sched_fork(). For initial non-fair class
task, the first switched_to_fair() will do the attach correctly.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/core.c  |    5 +++--
 kernel/sched/fair.c  |   14 +++++---------
 kernel/sched/sched.h |    2 +-
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 31c30e5..6b7d0b1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2384,6 +2384,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 
+	/* Initialize new task's sched averages */
+	init_entity_sched_avg(&p->se);
+
 	/*
 	 * The child is not yet in the pid-hash so no cgroup attach races,
 	 * and the cgroup is pinned to this child due to cgroup_fork()
@@ -2524,8 +2527,6 @@ void wake_up_new_task(struct task_struct *p)
 	struct rq_flags rf;
 	struct rq *rq;
 
-	/* Initialize new task's runnable average */
-	init_entity_runnable_average(&p->se);
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 #ifdef CONFIG_SMP
 	/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d43d242..35d76cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -668,8 +668,8 @@ static unsigned long task_h_load(struct task_struct *p);
 #define LOAD_AVG_MAX 47742 /* maximum possible load avg */
 #define LOAD_AVG_MAX_N 345 /* number of full periods to produce LOAD_AVG_MAX */
 
-/* Give new sched_entity start runnable values to heavy its load in infant time */
-void init_entity_runnable_average(struct sched_entity *se)
+/* Give new sched_entity start load values to heavy its load in infant time */
+void init_entity_sched_avg(struct sched_entity *se)
 {
 	struct sched_avg *sa = &se->avg;
 
@@ -738,12 +738,8 @@ void post_init_entity_util_avg(struct sched_entity *se)
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
 static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
 #else
-void init_entity_runnable_average(struct sched_entity *se)
-{
-}
-void post_init_entity_util_avg(struct sched_entity *se)
-{
-}
+void init_entity_sched_avg(struct sched_entity *se) { }
+void post_init_entity_util_avg(struct sched_entity *se) { }
 #endif
 
 /*
@@ -8527,7 +8523,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
-		init_entity_runnable_average(se);
+		init_entity_sched_avg(se);
 
 		raw_spin_lock_irq(&rq->lock);
 		post_init_entity_util_avg(se);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf6fea9..1b1d439 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1321,7 +1321,7 @@ extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 
 unsigned long to_ratio(u64 period, u64 runtime);
 
-extern void init_entity_runnable_average(struct sched_entity *se);
+extern void init_entity_sched_avg(struct sched_entity *se);
 extern void post_init_entity_util_avg(struct sched_entity *se);
 
 #ifdef CONFIG_NO_HZ_FULL
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v7 3/4] sched/fair: Skip detach sched avgs for new task when changing task groups
  2016-06-19 23:02 [PATCH v7 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 2/4] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
@ 2016-06-19 23:02 ` Yuyang Du
  2016-06-19 23:02 ` [PATCH v7 4/4] sched/fair: Add inline to detach_entity_load_evg() Yuyang Du
  3 siblings, 0 replies; 5+ messages in thread
From: Yuyang Du @ 2016-06-19 23:02 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, matt, Yuyang Du

Newly forked task has not been enqueued, so should not be removed from
cfs_rq in task_move_group_fair(). To do so, we identify newly forked
tasks if their sum_exec_runtime is 0, an existing heuristic as per
vruntime_normalized(). In addition to that, uniformly use this test
in remove_entity_load_avg().

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 35d76cf..c1de063 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2943,6 +2943,10 @@ static inline void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_en
 
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	/* Newly forked tasks are not attached yet. */
+	if (!se->sum_exec_runtime)
+		return;
+
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
 			  cfs_rq->curr == se, NULL);
@@ -3033,7 +3037,7 @@ void remove_entity_load_avg(struct sched_entity *se)
 	 * Newly created task or never used group entity should not be removed
 	 * from its (source) cfs_rq
 	 */
-	if (se->avg.last_update_time == 0)
+	if (!se->sum_exec_runtime)
 		return;
 
 	last_update_time = cfs_rq_last_update_time(cfs_rq);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v7 4/4] sched/fair: Add inline to detach_entity_load_evg()
  2016-06-19 23:02 [PATCH v7 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
                   ` (2 preceding siblings ...)
  2016-06-19 23:02 ` [PATCH v7 3/4] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
@ 2016-06-19 23:02 ` Yuyang Du
  3 siblings, 0 replies; 5+ messages in thread
From: Yuyang Du @ 2016-06-19 23:02 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, matt, Yuyang Du

detach_entity_load_evg() is only called by detach_task_cfs_rq(), so
explicitly add inline attribute to it.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c1de063..a679407 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2941,7 +2941,8 @@ static inline void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_en
 	cfs_rq_util_change(cfs_rq);
 }
 
-static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+/* Catch up with the cfs_rq and remove our load when we leave */
+static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	/* Newly forked tasks are not attached yet. */
 	if (!se->sum_exec_runtime)
@@ -8346,7 +8347,6 @@ static void detach_task_cfs_rq(struct task_struct *p)
 		se->vruntime -= cfs_rq->min_vruntime;
 	}
 
-	/* Catch up with the cfs_rq and remove our load when we leave */
 	detach_entity_load_avg(cfs_rq, se);
 }
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-06-20  7:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-19 23:02 [PATCH v7 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
2016-06-19 23:02 ` [PATCH v7 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
2016-06-19 23:02 ` [PATCH v7 2/4] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
2016-06-19 23:02 ` [PATCH v7 3/4] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
2016-06-19 23:02 ` [PATCH v7 4/4] sched/fair: Add inline to detach_entity_load_evg() Yuyang Du

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).