* [PATCH v2 0/3] sched/fair: Skip attach and detach for new group task in task_move_group_fair()
@ 2016-05-30 22:32 Yuyang Du
  2016-05-30 22:32 ` [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Yuyang Du @ 2016-05-30 22:32 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Hi Peter,

Vincent reported this problem, which has been there for a while.

The first patch is a cleanup that also paves the way for the fix.

Regarding the fix in the second patch, the vruntime part of the detach
and attach does not seem to make sense for a new task either, so I
avoided it as well.
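
For reference, the vruntime part in question is roughly the following pair,
in detach_task_cfs_rq() and attach_task_cfs_rq() (a simplified sketch of the
code this series applies to, not part of the patches):

	/* detach_task_cfs_rq(): make vruntime relative before leaving the cfs_rq */
	if (!vruntime_normalized(p)) {
		place_entity(cfs_rq, se, 0);
		se->vruntime -= cfs_rq->min_vruntime;
	}

	/* attach_task_cfs_rq(): make it absolute again on the new cfs_rq */
	if (!vruntime_normalized(p))
		se->vruntime += cfs_rq->min_vruntime;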

Thanks,
Yuyang

--
Yuyang Du (3):
  sched/fair: Clean up attach_entity_load_avg()
  sched/fair: Skip detach and attach new group task
  sched/fair: Add inline to detach_entity_load_avg()

 kernel/sched/auto_group.c |    2 +-
 kernel/sched/core.c       |    8 +++---
 kernel/sched/fair.c       |   61 ++++++++++++++++++++++++---------------------
 kernel/sched/sched.h      |    4 +--
 4 files changed, 40 insertions(+), 35 deletions(-)

-- 
1.7.9.5


* [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg()
  2016-05-30 22:32 [PATCH v2 0/3] sched/fair: Skip attach and detach for new group task in task_move_group_fair() Yuyang Du
@ 2016-05-30 22:32 ` Yuyang Du
  2016-05-31  8:34   ` Peter Zijlstra
  2016-05-30 22:32 ` [PATCH v2 2/3] sched/fair: Skip detach and attach new group task Yuyang Du
  2016-05-30 22:32 ` [PATCH v2 3/3] sched/fair: Add inline to detach_entity_load_avg() Yuyang Du
  2 siblings, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2016-05-30 22:32 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

attach_entity_load_avg() is called (indirectly) from:

 - switched_to_fair(): switch between classes to fair
 - task_move_group_fair(): move between task groups
 - enqueue_entity_load_avg(): enqueue entity

Only in switched_to_fair() is it possible that the task's last_update_time
is not 0 and therefore the task's sched averages need to be aged, so move
that aging to switched_to_fair() only. In addition, the code comments are
clarified.

No functional change.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
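
Note: with the aging moved out, attach_entity_load_avg() reduces to roughly
the following (a sketch; lines beyond the hunk context below are taken from
the surrounding code of that kernel and slightly simplified):

	static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		se->avg.last_update_time = cfs_rq->avg.last_update_time;
		cfs_rq->avg.load_avg += se->avg.load_avg;
		cfs_rq->avg.load_sum += se->avg.load_sum;
		cfs_rq->avg.util_avg += se->avg.util_avg;
		cfs_rq->avg.util_sum += se->avg.util_sum;

		cfs_rq_util_change(cfs_rq);
	}
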
 kernel/sched/fair.c |   44 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e8..a5bdbeb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2961,24 +2961,6 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	if (!sched_feat(ATTACH_AGE_LOAD))
-		goto skip_aging;
-
-	/*
-	 * If we got migrated (either between CPUs or between cgroups) we'll
-	 * have aged the average right before clearing @last_update_time.
-	 */
-	if (se->avg.last_update_time) {
-		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-				  &se->avg, 0, 0, NULL);
-
-		/*
-		 * XXX: we could have just aged the entire load away if we've been
-		 * absent from the fair class for too long.
-		 */
-	}
-
-skip_aging:
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
 	cfs_rq->avg.load_sum += se->avg.load_sum;
@@ -8418,6 +8400,27 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
+#ifdef CONFIG_SMP
+	struct sched_entity *se = &p->se;
+
+	if (!sched_feat(ATTACH_AGE_LOAD))
+		goto skip_aging;
+
+	/*
+	 * If we change between classes, age the averages before attaching them.
+	 */
+	if (se->avg.last_update_time) {
+		__update_load_avg(cfs_rq_of(se)->avg.last_update_time, cpu_of(rq),
+				  &se->avg, 0, 0, NULL);
+
+		/*
+		 * XXX: we could have just aged the entire load away if we've been
+		 * absent from the fair class for too long.
+		 */
+	}
+
+skip_aging:
+#endif
 	attach_task_cfs_rq(p);
 
 	if (task_on_rq_queued(p)) {
@@ -8469,11 +8472,6 @@ static void task_move_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
-
-#ifdef CONFIG_SMP
-	/* Tell se's cfs_rq has been changed -- migrated */
-	p->se.avg.last_update_time = 0;
-#endif
 	attach_task_cfs_rq(p);
 }
 
-- 
1.7.9.5


* [PATCH v2 2/3] sched/fair: Skip detach and attach new group task
  2016-05-30 22:32 [PATCH v2 0/3] sched/fair: Skip attach and detach for new group task in task_move_group_fair() Yuyang Du
  2016-05-30 22:32 ` [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
@ 2016-05-30 22:32 ` Yuyang Du
  2016-05-31 11:55   ` Vincent Guittot
  2016-05-30 22:32 ` [PATCH v2 3/3] sched/fair: Add inline to detach_entity_load_avg() Yuyang Du
  2 siblings, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2016-05-30 22:32 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Vincent reported that the first task to a new task group's cfs_rq will
be attached in attach_task_cfs_rq() and once more when it is enqueued
(see https://lkml.org/lkml/2016/5/25/388).

Actually, it is worse: attach_task_cfs_rq() is invoked for a new task even
before the new task's averages are initialized in
init_entity_runnable_average().

Solve this by avoiding both the detach and the attach of a new task in
task_move_group_fair(). To do so, we need to know whether the task has just
been forked, so we pass this information all the way from sched_move_task()
down to task_move_group_fair().

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/auto_group.c |    2 +-
 kernel/sched/core.c       |    8 ++++----
 kernel/sched/fair.c       |   17 ++++++++++++-----
 kernel/sched/sched.h      |    4 ++--
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index a5d966c..e5f0be2 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -143,7 +143,7 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 		goto out;
 
 	for_each_thread(p, t)
-		sched_move_task(t);
+		sched_move_task(t, 0);
 out:
 	unlock_task_sighand(p, &flags);
 	autogroup_kref_put(prev);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..8585032 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7724,7 +7724,7 @@ void sched_offline_group(struct task_group *tg)
  *	by now. This function just updates tsk->se.cfs_rq and tsk->se.parent to
  *	reflect its new group.
  */
-void sched_move_task(struct task_struct *tsk)
+void sched_move_task(struct task_struct *tsk, int fork)
 {
 	struct task_group *tg;
 	int queued, running;
@@ -7753,7 +7753,7 @@ void sched_move_task(struct task_struct *tsk)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_move_group)
-		tsk->sched_class->task_move_group(tsk);
+		tsk->sched_class->task_move_group(tsk, fork);
 	else
 #endif
 		set_task_rq(tsk, task_cpu(tsk));
@@ -8186,7 +8186,7 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 
 static void cpu_cgroup_fork(struct task_struct *task)
 {
-	sched_move_task(task);
+	sched_move_task(task, 1);
 }
 
 static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
@@ -8213,7 +8213,7 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 	struct cgroup_subsys_state *css;
 
 	cgroup_taskset_for_each(task, css, tset)
-		sched_move_task(task);
+		sched_move_task(task, 0);
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a5bdbeb..5b34286 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2970,6 +2970,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	cfs_rq_util_change(cfs_rq);
 }
 
+/* Catch up with the cfs_rq and then remove our sched avgs from it */
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
@@ -8369,7 +8370,6 @@ static void detach_task_cfs_rq(struct task_struct *p)
 		se->vruntime -= cfs_rq->min_vruntime;
 	}
 
-	/* Catch up with the cfs_rq and remove our load when we leave */
 	detach_entity_load_avg(cfs_rq, se);
 }
 
@@ -8386,7 +8386,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	se->depth = se->parent ? se->parent->depth + 1 : 0;
 #endif
 
-	/* Synchronize task with its cfs_rq */
+	/* Synchronize and attach task to its cfs_rq */
 	attach_entity_load_avg(cfs_rq, se);
 
 	if (!vruntime_normalized(p))
@@ -8468,11 +8468,18 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static void task_move_group_fair(struct task_struct *p)
+static void task_move_group_fair(struct task_struct *p, int fork)
 {
-	detach_task_cfs_rq(p);
+	/*
+	 * New task does not need detach or attach (see below)
+	 */
+	if (!fork)
+		detach_task_cfs_rq(p);
+
 	set_task_rq(p, task_cpu(p));
-	attach_task_cfs_rq(p);
+
+	if (!fork)
+		attach_task_cfs_rq(p);
 }
 
 void free_fair_sched_group(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..58b1259 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -343,7 +343,7 @@ extern void sched_online_group(struct task_group *tg,
 extern void sched_destroy_group(struct task_group *tg);
 extern void sched_offline_group(struct task_group *tg);
 
-extern void sched_move_task(struct task_struct *tsk);
+extern void sched_move_task(struct task_struct *tsk, int fork);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
@@ -1247,7 +1247,7 @@ struct sched_class {
 	void (*update_curr) (struct rq *rq);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	void (*task_move_group) (struct task_struct *p);
+	void (*task_move_group) (struct task_struct *p, int fork);
 #endif
 };
 
-- 
1.7.9.5


* [PATCH v2 3/3] sched/fair: Add inline to detach_entity_load_avg()
  2016-05-30 22:32 [PATCH v2 0/3] sched/fair: Skip attach and detach for new group task in task_move_group_fair() Yuyang Du
  2016-05-30 22:32 ` [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
  2016-05-30 22:32 ` [PATCH v2 2/3] sched/fair: Skip detach and attach new group task Yuyang Du
@ 2016-05-30 22:32 ` Yuyang Du
  2 siblings, 0 replies; 13+ messages in thread
From: Yuyang Du @ 2016-05-30 22:32 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: bsegall, pjt, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

detach_entity_load_avg() is only called by detach_task_cfs_rq(), so
explicitly add the inline attribute to it.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b34286..524afcb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2971,7 +2971,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 }
 
 /* Catch up with the cfs_rq and then remove our sched avgs from it */
-static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static inline void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
-- 
1.7.9.5


* Re: [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg()
  2016-05-31  8:34   ` Peter Zijlstra
@ 2016-05-31  0:49     ` Yuyang Du
  2016-05-31  9:14       ` Peter Zijlstra
  2016-05-31 18:37     ` Yuyang Du
  1 sibling, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2016-05-31  0:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Tue, May 31, 2016 at 10:34:02AM +0200, Peter Zijlstra wrote:
> On Tue, May 31, 2016 at 06:32:54AM +0800, Yuyang Du wrote:
> > +++ b/kernel/sched/fair.c
> > @@ -2961,24 +2961,6 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> >  
> >  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > -	if (!sched_feat(ATTACH_AGE_LOAD))
> > -		goto skip_aging;
> > -
> > -	/*
> > -	 * If we got migrated (either between CPUs or between cgroups) we'll
> > -	 * have aged the average right before clearing @last_update_time.
> > -	 */
> > -	if (se->avg.last_update_time) {
> > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > -				  &se->avg, 0, 0, NULL);
> > -
> > -		/*
> > -		 * XXX: we could have just aged the entire load away if we've been
> > -		 * absent from the fair class for too long.
> > -		 */
> > -	}
> > -
> > -skip_aging:
> >  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> >  	cfs_rq->avg.load_avg += se->avg.load_avg;
> >  	cfs_rq->avg.load_sum += se->avg.load_sum;
> 
> So I'm not a big fan of this patch; the aging is a conceptual part of
> attaching the load, the fact that it only happens in one callsite is a
> mere 'accident'.

Strictly speaking, it is conceptually part of dealing with the load, maybe not of attaching it. :)


* Re: [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg()
  2016-05-30 22:32 ` [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
@ 2016-05-31  8:34   ` Peter Zijlstra
  2016-05-31  0:49     ` Yuyang Du
  2016-05-31 18:37     ` Yuyang Du
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-31  8:34 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Tue, May 31, 2016 at 06:32:54AM +0800, Yuyang Du wrote:
> +++ b/kernel/sched/fair.c
> @@ -2961,24 +2961,6 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>  
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	if (!sched_feat(ATTACH_AGE_LOAD))
> -		goto skip_aging;
> -
> -	/*
> -	 * If we got migrated (either between CPUs or between cgroups) we'll
> -	 * have aged the average right before clearing @last_update_time.
> -	 */
> -	if (se->avg.last_update_time) {
> -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> -				  &se->avg, 0, 0, NULL);
> -
> -		/*
> -		 * XXX: we could have just aged the entire load away if we've been
> -		 * absent from the fair class for too long.
> -		 */
> -	}
> -
> -skip_aging:
>  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
>  	cfs_rq->avg.load_avg += se->avg.load_avg;
>  	cfs_rq->avg.load_sum += se->avg.load_sum;

So I'm not a big fan of this patch; the aging is a conceptual part of
attaching the load, the fact that it only happens in one callsite is a
mere 'accident'.

The other two patches look reasonable; but I'll wait until Vince has
commented on them..


* Re: [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg()
  2016-05-31  0:49     ` Yuyang Du
@ 2016-05-31  9:14       ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-31  9:14 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Tue, May 31, 2016 at 08:49:54AM +0800, Yuyang Du wrote:
> On Tue, May 31, 2016 at 10:34:02AM +0200, Peter Zijlstra wrote:
> > On Tue, May 31, 2016 at 06:32:54AM +0800, Yuyang Du wrote:
> > > +++ b/kernel/sched/fair.c
> > > @@ -2961,24 +2961,6 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> > >  
> > >  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > >  {
> > > -	if (!sched_feat(ATTACH_AGE_LOAD))
> > > -		goto skip_aging;
> > > -
> > > -	/*
> > > -	 * If we got migrated (either between CPUs or between cgroups) we'll
> > > -	 * have aged the average right before clearing @last_update_time.
> > > -	 */
> > > -	if (se->avg.last_update_time) {
> > > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > > -				  &se->avg, 0, 0, NULL);
> > > -
> > > -		/*
> > > -		 * XXX: we could have just aged the entire load away if we've been
> > > -		 * absent from the fair class for too long.
> > > -		 */
> > > -	}
> > > -
> > > -skip_aging:
> > >  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > >  	cfs_rq->avg.load_avg += se->avg.load_avg;
> > >  	cfs_rq->avg.load_sum += se->avg.load_sum;
> > 
> > So I'm not a big fan of this patch; the aging is a conceptual part of
> > attaching the load, the fact that it only happens in one callsite is a
> > mere 'accident'.
> 
> Strictly speaking, it is conceptually part of dealing with the load, maybe not of attaching it. :)

Well it deals with the time between detach and attach. Attach is the
obvious point to do that. If you move it elsewhere you run the risk of
forgetting it when the code changes etc..


* Re: [PATCH v2 2/3] sched/fair: Skip detach and attach new group task
  2016-05-30 22:32 ` [PATCH v2 2/3] sched/fair: Skip detach and attach new group task Yuyang Du
@ 2016-05-31 11:55   ` Vincent Guittot
  2016-05-31 12:03     ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Vincent Guittot @ 2016-05-31 11:55 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

Hi Yuyang,

On 31 May 2016 at 00:32, Yuyang Du <yuyang.du@intel.com> wrote:
> Vincent reported that the first task to a new task group's cfs_rq will
> be attached in attach_task_cfs_rq() and once more when it is enqueued
> (see https://lkml.org/lkml/2016/5/25/388).
>
> Actually, it is worse: attach_task_cfs_rq() is invoked for a new task even
> before the new task's averages are initialized in
> init_entity_runnable_average().
>
> Solve this by avoiding both the detach and the attach of a new task in
> task_move_group_fair(). To do so, we need to know whether the task has just
> been forked, so we pass this information all the way from sched_move_task()
> down to task_move_group_fair().

I have tested your patch and I can't see the spurious detach/attach_task_cfs_rq()
anymore, so you can add my Tested-by.

See a few comments below.

>
> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/auto_group.c |    2 +-
>  kernel/sched/core.c       |    8 ++++----
>  kernel/sched/fair.c       |   17 ++++++++++++-----
>  kernel/sched/sched.h      |    4 ++--
>  4 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
> index a5d966c..e5f0be2 100644
> --- a/kernel/sched/auto_group.c
> +++ b/kernel/sched/auto_group.c
> @@ -143,7 +143,7 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
>                 goto out;
>
>         for_each_thread(p, t)
> -               sched_move_task(t);
> +               sched_move_task(t, 0);

false instead of 0, and use a bool instead of an int

>  out:
>         unlock_task_sighand(p, &flags);
>         autogroup_kref_put(prev);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..8585032 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7724,7 +7724,7 @@ void sched_offline_group(struct task_group *tg)
>   *     by now. This function just updates tsk->se.cfs_rq and tsk->se.parent to
>   *     reflect its new group.
>   */
> -void sched_move_task(struct task_struct *tsk)
> +void sched_move_task(struct task_struct *tsk, int fork)

bool instead of int

This comment applies to the other places where you have used an int instead
of a bool.
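
For example, the suggested prototype would be roughly (a sketch of this
suggestion, not code from the posted patch):

	extern void sched_move_task(struct task_struct *tsk, bool fork);

with sched_move_task(task, true) in cpu_cgroup_fork() and
sched_move_task(task, false) at the other call sites.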

>  {
>         struct task_group *tg;
>         int queued, running;
> @@ -7753,7 +7753,7 @@ void sched_move_task(struct task_struct *tsk)
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>         if (tsk->sched_class->task_move_group)
> -               tsk->sched_class->task_move_group(tsk);
> +               tsk->sched_class->task_move_group(tsk, fork);
>         else
>  #endif
>                 set_task_rq(tsk, task_cpu(tsk));
> @@ -8186,7 +8186,7 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
>
>  static void cpu_cgroup_fork(struct task_struct *task)
>  {
> -       sched_move_task(task);
> +       sched_move_task(task, 1);
>  }
>
>  static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
> @@ -8213,7 +8213,7 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
>         struct cgroup_subsys_state *css;
>
>         cgroup_taskset_for_each(task, css, tset)
> -               sched_move_task(task);
> +               sched_move_task(task, 0);
>  }
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a5bdbeb..5b34286 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2970,6 +2970,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>         cfs_rq_util_change(cfs_rq);
>  }
>
> +/* Catch up with the cfs_rq and then remove our sched avgs from it */
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>         __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> @@ -8369,7 +8370,6 @@ static void detach_task_cfs_rq(struct task_struct *p)
>                 se->vruntime -= cfs_rq->min_vruntime;
>         }
>
> -       /* Catch up with the cfs_rq and remove our load when we leave */
>         detach_entity_load_avg(cfs_rq, se);
>  }
>
> @@ -8386,7 +8386,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
>         se->depth = se->parent ? se->parent->depth + 1 : 0;
>  #endif
>
> -       /* Synchronize task with its cfs_rq */
> +       /* Synchronize and attach task to its cfs_rq */
>         attach_entity_load_avg(cfs_rq, se);
>
>         if (!vruntime_normalized(p))
> @@ -8468,11 +8468,18 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  }
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -static void task_move_group_fair(struct task_struct *p)
> +static void task_move_group_fair(struct task_struct *p, int fork)
>  {
> -       detach_task_cfs_rq(p);
> +       /*
> +        * New task does not need detach or attach (see below)
> +        */
> +       if (!fork)
> +               detach_task_cfs_rq(p);
> +
>         set_task_rq(p, task_cpu(p));
> -       attach_task_cfs_rq(p);
> +
> +       if (!fork)
> +               attach_task_cfs_rq(p);
>  }
>
>  void free_fair_sched_group(struct task_group *tg)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f30..58b1259 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -343,7 +343,7 @@ extern void sched_online_group(struct task_group *tg,
>  extern void sched_destroy_group(struct task_group *tg);
>  extern void sched_offline_group(struct task_group *tg);
>
> -extern void sched_move_task(struct task_struct *tsk);
> +extern void sched_move_task(struct task_struct *tsk, int fork);
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
> @@ -1247,7 +1247,7 @@ struct sched_class {
>         void (*update_curr) (struct rq *rq);
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -       void (*task_move_group) (struct task_struct *p);
> +       void (*task_move_group) (struct task_struct *p, int fork);
>  #endif
>  };
>
> --
> 1.7.9.5
>


* Re: [PATCH v2 2/3] sched/fair: Skip detach and attach new group task
  2016-05-31 11:55   ` Vincent Guittot
@ 2016-05-31 12:03     ` Peter Zijlstra
  2016-05-31 12:33       ` Vincent Guittot
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-31 12:03 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Yuyang Du, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

On Tue, May 31, 2016 at 01:55:49PM +0200, Vincent Guittot wrote:
> Hi Yuyang,
> 
> On 31 May 2016 at 00:32, Yuyang Du <yuyang.du@intel.com> wrote:
> > Vincent reported that the first task to a new task group's cfs_rq will
> > be attached in attach_task_cfs_rq() and once more when it is enqueued
> > (see https://lkml.org/lkml/2016/5/25/388).
> >
> > Actually, it is worse: attach_task_cfs_rq() is invoked for a new task even
> > before the new task's averages are initialized in
> > init_entity_runnable_average().
> >
> > Solve this by avoiding both the detach and the attach of a new task in
> > task_move_group_fair(). To do so, we need to know whether the task has just
> > been forked, so we pass this information all the way from sched_move_task()
> > down to task_move_group_fair().
> 
> I have tested your patch and I can't see the spurious detach/attach_task_cfs_rq()
> anymore, so you can add my Tested-by.
> 

Could someone update the Changelog to better explain the whole
callchain.

Because a quick look seems to suggest something like:

	copy_process()
	  sched_fork()
	  ...
	  cgroup_post_fork()
	    ss->fork() := cpu_cgroup_fork()

Which seems to suggest init_entity_runnable_average() is placed wrong
and should live in sched_fork() ?

Or am I not getting it.. ?


* Re: [PATCH v2 2/3] sched/fair: Skip detach and attach new group task
  2016-05-31 12:03     ` Peter Zijlstra
@ 2016-05-31 12:33       ` Vincent Guittot
  2016-05-31 12:39         ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Vincent Guittot @ 2016-05-31 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

On 31 May 2016 at 14:03, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 31, 2016 at 01:55:49PM +0200, Vincent Guittot wrote:
>> Hi Yuyang,
>>
>> On 31 May 2016 at 00:32, Yuyang Du <yuyang.du@intel.com> wrote:
>> > Vincent reported that the first task to a new task group's cfs_rq will
>> > be attached in attach_task_cfs_rq() and once more when it is enqueued
>> > (see https://lkml.org/lkml/2016/5/25/388).
>> >
>> > Actually, it is worse: attach_task_cfs_rq() is invoked for a new task even
>> > before the new task's averages are initialized in
>> > init_entity_runnable_average().
>> >
>> > Solve this by avoiding both the detach and the attach of a new task in
>> > task_move_group_fair(). To do so, we need to know whether the task has just
>> > been forked, so we pass this information all the way from sched_move_task()
>> > down to task_move_group_fair().
>>
>> I have tested your patch and I can't see the spurious detach/attach_task_cfs_rq()
>> anymore, so you can add my Tested-by.
>>
>
> Could someone update the Changelog to better explain the whole
> callchain.
>
> Because a quick look seems to suggest something like:
>
>         copy_process()
>           sched_fork()
>           ...
>           cgroup_post_fork()
>             ss->fork() := cpu_cgroup_fork()
>
> Which seems to suggest init_entity_runnable_average() is placed wrong
> and should live in sched_fork() ?
>
> Or am I not getting it.. ?

The sched_avg of a task is not used anymore in copy_process() with Yuyang's
patchset. To be fully correct, we still have p->se.avg.last_update_time = 0
in cpu_cgroup_fork(), but this is just a side effect of factoring the cgroup
fork and cgroup move paths into sched_move_task(), and it will be overwritten
by init_entity_runnable_average() before being used.
Now, cpu_cgroup_fork() only sets the task group and cfs_rq of the forked task:

copy_process
  sched_fork
    ...
    cgroup_post_fork
      ss->fork() := cpu_cgroup_fork() which only set task group and
cfs_rq of the sched_entity of the task

wake_up_new_task
  init_entity_runnable_average
  activate_task
    enqueue_task
      attach_task


* Re: [PATCH v2 2/3] sched/fair: Skip detach and attach new group task
  2016-05-31 12:33       ` Vincent Guittot
@ 2016-05-31 12:39         ` Peter Zijlstra
  2016-05-31 13:20           ` Vincent Guittot
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-31 12:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Yuyang Du, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

On Tue, May 31, 2016 at 02:33:45PM +0200, Vincent Guittot wrote:

> The sched_avg of a task is not used anymore in copy_process() with Yuyang's
> patchset. To be fully correct, we still have p->se.avg.last_update_time = 0
> in cpu_cgroup_fork(), but this is just a side effect of factoring the cgroup
> fork and cgroup move paths into sched_move_task(), and it will be overwritten
> by init_entity_runnable_average() before being used.

Yeah, but why? Why does init_entity_runnable_average() live in
wake_up_new_task()? That doesn't seem to make sense.

Also note that vruntime_normalized() (ab)uses !se->sum_exec_runtime to
detect the new task state.
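
For reference, the new-task check there looks roughly like this (a simplified
sketch of vruntime_normalized() at the time):

	static inline bool vruntime_normalized(struct task_struct *p)
	{
		struct sched_entity *se = &p->se;

		/* A queued/migrating task already had its vruntime normalized. */
		if (p->on_rq)
			return true;

		/*
		 * A freshly forked child (!se->sum_exec_runtime) or a task
		 * still in TASK_WAKING also has a normalized vruntime.
		 */
		if (!se->sum_exec_runtime || p->state == TASK_WAKING)
			return true;

		return false;
	}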

> Now, cpu_cgroup_fork() only sets task group and cfs_rq of the forked task
> 
> copy_process
>   sched_fork
>     ...
>     cgroup_post_fork
>       ss->fork() := cpu_cgroup_fork() which only set task group and
> cfs_rq of the sched_entity of the task
> 
> wake_up_new_task
>   init_entity_runnable_average
>   activate_task
>     enqueue_task
>       attach_task


* Re: [PATCH v2 2/3] sched/fair: Skip detach and attach new group task
  2016-05-31 12:39         ` Peter Zijlstra
@ 2016-05-31 13:20           ` Vincent Guittot
  0 siblings, 0 replies; 13+ messages in thread
From: Vincent Guittot @ 2016-05-31 13:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, Ingo Molnar, linux-kernel, Benjamin Segall,
	Paul Turner, Morten Rasmussen, Dietmar Eggemann

On 31 May 2016 at 14:39, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 31, 2016 at 02:33:45PM +0200, Vincent Guittot wrote:
>
>> The sched_avg of a task is not used anymore in copy_process with
>> yuyang's patchset. To be fully correct, we still have a
>> p->se.avg.last_update_time = 0 in cpu_cgroup_fork but this is just the
>> side effect of factoring cpu_group_fork and cpu_group_move in
>> sched_move_task and it will be overwritten by
>> init_entity_runnable_average before being used.
>
> Yeah, but why? Why does init_entity_runnable_average() live in
> wake_up_new_task()? That doesn't seem to make sense.

I don't really know.
I would say that it can make sense now that the init of sched_avg is split in
2 parts: post_init_entity_util_avg() has to be done after set_task_cpu() in
wake_up_new_task().

>
> Also note that vruntime_normalized() (ab)uses !se->sum_exec_runtime to
> detect the new task state.
>
>> Now, cpu_cgroup_fork() only sets task group and cfs_rq of the forked task
>>
>> copy_process
>>   sched_fork
>>     ...
>>     cgroup_post_fork
>>       ss->fork() := cpu_cgroup_fork() which only set task group and
>> cfs_rq of the sched_entity of the task
>>
>> wake_up_new_task
>>   init_entity_runnable_average
>>   activate_task
>>     enqueue_task
>>       attach_task
>
>


* Re: [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg()
  2016-05-31  8:34   ` Peter Zijlstra
  2016-05-31  0:49     ` Yuyang Du
@ 2016-05-31 18:37     ` Yuyang Du
  1 sibling, 0 replies; 13+ messages in thread
From: Yuyang Du @ 2016-05-31 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, bsegall, pjt, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Tue, May 31, 2016 at 10:34:02AM +0200, Peter Zijlstra wrote:
> On Tue, May 31, 2016 at 06:32:54AM +0800, Yuyang Du wrote:
> > +++ b/kernel/sched/fair.c
> > @@ -2961,24 +2961,6 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
> >  
> >  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > -	if (!sched_feat(ATTACH_AGE_LOAD))
> > -		goto skip_aging;
> > -
> > -	/*
> > -	 * If we got migrated (either between CPUs or between cgroups) we'll
> > -	 * have aged the average right before clearing @last_update_time.
> > -	 */
> > -	if (se->avg.last_update_time) {
> > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > -				  &se->avg, 0, 0, NULL);
> > -
> > -		/*
> > -		 * XXX: we could have just aged the entire load away if we've been
> > -		 * absent from the fair class for too long.
> > -		 */
> > -	}
> > -
> > -skip_aging:
> >  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> >  	cfs_rq->avg.load_avg += se->avg.load_avg;
> >  	cfs_rq->avg.load_sum += se->avg.load_sum;
> 
> So I'm not a big fan of this patch; the aging is a conceptual part of
> attaching the load, the fact that it only happens in one callsite is a
> mere 'accident'.

Yes, and we have to keep the following in place to make it an "accident".

static void task_move_group_fair(struct task_struct *p)
{
	detach_task_cfs_rq(p);
	set_task_rq(p, task_cpu(p));

#ifdef CONFIG_SMP
	/* Tell se's cfs_rq has been changed -- migrated */
	p->se.avg.last_update_time = 0;
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#endif
	attach_task_cfs_rq(p);
}

So let me give it another try to keep this cleanup. Will send another version.


Thread overview: 13+ messages
2016-05-30 22:32 [PATCH v2 0/3] sched/fair: Skip attach and detach for new group task in task_move_group_fair() Yuyang Du
2016-05-30 22:32 ` [PATCH v2 1/3] sched/fair: Clean up attach_entity_load_avg() Yuyang Du
2016-05-31  8:34   ` Peter Zijlstra
2016-05-31  0:49     ` Yuyang Du
2016-05-31  9:14       ` Peter Zijlstra
2016-05-31 18:37     ` Yuyang Du
2016-05-30 22:32 ` [PATCH v2 2/3] sched/fair: Skip detach and attach new group task Yuyang Du
2016-05-31 11:55   ` Vincent Guittot
2016-05-31 12:03     ` Peter Zijlstra
2016-05-31 12:33       ` Vincent Guittot
2016-05-31 12:39         ` Peter Zijlstra
2016-05-31 13:20           ` Vincent Guittot
2016-05-30 22:32 ` [PATCH v2 3/3] sched/fair: Add inline to detach_entity_load_avg() Yuyang Du
