linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  5:55 [PATCH] sched: sync with the cfs_rq when changing sched class byungchul.park
@ 2015-08-12 22:41 ` Yuyang Du
  2015-08-13  7:19   ` Byungchul Park
  2015-08-13  7:46 ` Peter Zijlstra
  2015-08-14 12:59 ` T. Zhou
  2 siblings, 1 reply; 17+ messages in thread
From: Yuyang Du @ 2015-08-12 22:41 UTC (permalink / raw)
  To: byungchul.park; +Cc: mingo, peterz, linux-kernel

On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> 
> currently, a task load is synced with its cfs_rq, only when it
> leaves from fair class. we also need to sync it with cfs_rq when
> it returns back to fair class, too.
 
Syncing it at the time it is switched to fair is not necessary: if anything
has been updated since last_update_time, the load has become random, IOW
useless. So we simply leave it unattended and let it merge back into the
system on its own.

>  
>  #ifdef CONFIG_SMP
>  	/* synchronize task with its prev cfs_rq */
> -	if (!queued)
> -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> -				cfs_rq->curr == se, NULL);
> -
> -	/* remove our load when we leave */
> -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> +	detach_entity_load_avg(cfs_rq, se);
>  #endif

You changed the logic.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  8:21   ` Byungchul Park
@ 2015-08-13  2:15     ` Yuyang Du
  2015-08-13 10:56       ` Byungchul Park
  2015-08-13 15:22       ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Yuyang Du @ 2015-08-13  2:15 UTC (permalink / raw)
  To: Byungchul Park; +Cc: Peter Zijlstra, mingo, linux-kernel

On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> 
> yuyang said that switched_to don't need to consider task's load because it
> can have meaningless value. but i think considering task's load is better
> than leaving it unattended at all. and we can also use switched_to if we 
> consider task's load in switched_to.

when did I say "don't need to consider..."?

Doing more does not necessarily mean better; it may just be trivial. BTW, a
task that is switched_to does not have to have been switched_from before.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  7:19   ` Byungchul Park
@ 2015-08-13  2:30     ` Yuyang Du
  0 siblings, 0 replies; 17+ messages in thread
From: Yuyang Du @ 2015-08-13  2:30 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, peterz, linux-kernel

On Thu, Aug 13, 2015 at 04:19:04PM +0900, Byungchul Park wrote:
> > >  #ifdef CONFIG_SMP
> > >  	/* synchronize task with its prev cfs_rq */
> > > -	if (!queued)
> > > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > > -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> > > -				cfs_rq->curr == se, NULL);
> > > -
> > > -	/* remove our load when we leave */
> > > -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > > -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > > -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > > -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > > +	detach_entity_load_avg(cfs_rq, se);
> > >  #endif
> > 
> > You changed the logic.
> 
> yes, i changed it. but i think that calling __update_load_avg() is not
> a problem even in case of "queued == 1". so i didn't think that change
> seriously.
> 
> wrong? :(
> 

It is not a problem, but does it do any good, or maybe any harm? And I would
suggest you add the comment I gave you.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] sched: sync with the cfs_rq when changing sched class
@ 2015-08-13  5:55 byungchul.park
  2015-08-12 22:41 ` Yuyang Du
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: byungchul.park @ 2015-08-13  5:55 UTC (permalink / raw)
  To: mingo, peterz; +Cc: linux-kernel, yuyang.du, Byungchul Park

From: Byungchul Park <byungchul.park@lge.com>

this work is based on the patch below, which may not have been reviewed yet:

https://lkml.org/lkml/2015/8/12/914, whose subject is
"sched: sync with the prev cfs when changing cgroup within a cpu".

----->8-----
>From e96bc0439a8d9504c9ea33d6253f946a92c3dbc5 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Thu, 13 Aug 2015 14:29:52 +0900
Subject: [PATCH] sched: sync with the cfs_rq when changing sched class

currently, a task's load is synced with its cfs_rq only when the task
leaves the fair class. we also need to sync it with the cfs_rq when it
returns to the fair class.

in addition, i created two functions for attaching/detaching a se's
load to/from its cfs_rq, made the existing code use those functions,
and placed the calls where a se is attached to/detached from a cfs_rq.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/fair.c |   78 +++++++++++++++++++++++++--------------------------
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 979ca2c..72d13af 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2709,6 +2709,31 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 		update_tg_load_avg(cfs_rq, 0);
 }
 
+static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	se->avg.last_update_time = cfs_rq->avg.last_update_time;
+	cfs_rq->avg.load_avg += se->avg.load_avg;
+	cfs_rq->avg.load_sum += se->avg.load_sum;
+	cfs_rq->avg.util_avg += se->avg.util_avg;
+	cfs_rq->avg.util_sum += se->avg.util_sum;
+}
+
+static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+			&se->avg, se->on_rq * scale_load_down(se->load.weight),
+			cfs_rq->curr == se, NULL);
+
+	cfs_rq->avg.load_avg =
+		max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
+	cfs_rq->avg.load_sum =
+		max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
+	cfs_rq->avg.util_avg =
+		max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
+	cfs_rq->avg.util_sum =
+		max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+}
+
 /* Add the load generated by se into cfs_rq's load average */
 static inline void
 enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	u64 now = cfs_rq_clock_task(cfs_rq);
 	int migrated = 0, decayed;
 
-	if (sa->last_update_time == 0) {
-		sa->last_update_time = now;
+	if (sa->last_update_time == 0)
 		migrated = 1;
-	}
-	else {
+	else
 		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
-			se->on_rq * scale_load_down(se->load.weight),
-			cfs_rq->curr == se, NULL);
-	}
+				se->on_rq * scale_load_down(se->load.weight),
+				cfs_rq->curr == se, NULL);
 
 	decayed = update_cfs_rq_load_avg(now, cfs_rq);
 
 	cfs_rq->runnable_load_avg += sa->load_avg;
 	cfs_rq->runnable_load_sum += sa->load_sum;
 
-	if (migrated) {
-		cfs_rq->avg.load_avg += sa->load_avg;
-		cfs_rq->avg.load_sum += sa->load_sum;
-		cfs_rq->avg.util_avg += sa->util_avg;
-		cfs_rq->avg.util_sum += sa->util_sum;
-	}
+	if (migrated)
+		attach_entity_load_avg(cfs_rq, se);
 
 	if (decayed || migrated)
 		update_tg_load_avg(cfs_rq, 0);
@@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 #ifdef CONFIG_SMP
 	/* Catch up with the cfs_rq and remove our load when we leave */
-	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), &se->avg,
-		se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
-
-	cfs_rq->avg.load_avg =
-		max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
-	cfs_rq->avg.load_sum =
-		max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg =
-		max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum =
-		max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+	detach_entity_load_avg(cfs_rq, se);
 #endif
 }
 
@@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
 	 */
 	se->depth = se->parent ? se->parent->depth + 1 : 0;
 #endif
+
+#ifdef CONFIG_SMP
+	/* synchronize task with its cfs_rq */
+	attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
+#endif
 	if (!task_on_rq_queued(p))
 		return;
 
@@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
 
 #ifdef CONFIG_SMP
 	/* synchronize task with its prev cfs_rq */
-	if (!queued)
-		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-				&se->avg, se->on_rq * scale_load_down(se->load.weight),
-				cfs_rq->curr == se, NULL);
-
-	/* remove our load when we leave */
-	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
-	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+	detach_entity_load_avg(cfs_rq, se);
 #endif
 	set_task_rq(p, task_cpu(p));
 	se->depth = se->parent ? se->parent->depth + 1 : 0;
@@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
 
 #ifdef CONFIG_SMP
 	/* Virtually synchronize task with its new cfs_rq */
-	p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
-	cfs_rq->avg.load_avg += p->se.avg.load_avg;
-	cfs_rq->avg.load_sum += p->se.avg.load_sum;
-	cfs_rq->avg.util_avg += p->se.avg.util_avg;
-	cfs_rq->avg.util_sum += p->se.avg.util_sum;
+	attach_entity_load_avg(cfs_rq, se);
 #endif
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-12 22:41 ` Yuyang Du
@ 2015-08-13  7:19   ` Byungchul Park
  2015-08-13  2:30     ` Yuyang Du
  0 siblings, 1 reply; 17+ messages in thread
From: Byungchul Park @ 2015-08-13  7:19 UTC (permalink / raw)
  To: Yuyang Du; +Cc: mingo, peterz, linux-kernel

On Thu, Aug 13, 2015 at 06:41:45AM +0800, Yuyang Du wrote:
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> > 
> > currently, a task load is synced with its cfs_rq, only when it
> > leaves from fair class. we also need to sync it with cfs_rq when
> > it returns back to fair class, too.
>  
> Syncing it at the time it is switched to fair is not necessary, because
> since last_update_time if it has ever been updated, the load has become
> random, IOW, useless. So we simply leave it unattended, and let itself
> merge in the system.

hello,

i agree with you for the most part. it would have a meaningless value
over time, while it still has a meaningful value right after it left
that cfs_rq.

however, IMHO, nobody knows when a task will be switched between sched
classes. i think it would be better to account for that load rather than
leave it unattended, even though, of course, in both cases
__update_load_avg() will decay and fix it over time.

shouldn't we consider it?

1. case: returning to the fair class very soon:
the original code cannot reflect the task's load in the cfs_rq, while
the patched code can.

2. case: returning to the fair class after a long time:
the original code adds 0 to the cfs_rq and lets __update_load_avg() fix
it, while the patched code adds a (by now meaningless) value to the
cfs_rq and lets __update_load_avg() fix it; after all, these become the
same (see the sketch below).
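
as a rough illustration (a standalone userspace sketch, not kernel code;
it only assumes PELT's convention that the per-period decay factor y
satisfies y^32 == 1/2, and the 1024 starting value is an arbitrary
example), this is how much of the old load_avg is left after being away
from the fair class for n ms:

/* rough userspace sketch of PELT decay, not kernel code */
#include <math.h>
#include <stdio.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);		/* y^32 == 1/2 */
	const double old_load_avg = 1024.0;		/* value at switched_from time */
	const int away_ms[] = { 1, 4, 32, 100, 1000 };
	unsigned int i;

	for (i = 0; i < sizeof(away_ms) / sizeof(away_ms[0]); i++) {
		double left = old_load_avg * pow(y, away_ms[i]);
		printf("away %4d ms: %7.2f left (%.2f%%)\n",
		       away_ms[i], left, 100.0 * left / old_load_avg);
	}
	return 0;
}

so in case 1 most of the old value is still accurate and attaching it
matters, while in case 2 it has decayed to almost nothing and the two
approaches converge, as argued above.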

> 
> >  
> >  #ifdef CONFIG_SMP
> >  	/* synchronize task with its prev cfs_rq */
> > -	if (!queued)
> > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> > -				cfs_rq->curr == se, NULL);
> > -
> > -	/* remove our load when we leave */
> > -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +	detach_entity_load_avg(cfs_rq, se);
> >  #endif
> 
> You changed the logic.

yes, i changed it. but i think that calling __update_load_avg() is not
a problem even in the "queued == 1" case, so i didn't regard that change
as a significant one.

wrong? :(

thanks,
byungchul


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  5:55 [PATCH] sched: sync with the cfs_rq when changing sched class byungchul.park
  2015-08-12 22:41 ` Yuyang Du
@ 2015-08-13  7:46 ` Peter Zijlstra
  2015-08-13  8:21   ` Byungchul Park
  2015-08-13  8:42   ` Byungchul Park
  2015-08-14 12:59 ` T. Zhou
  2 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2015-08-13  7:46 UTC (permalink / raw)
  To: byungchul.park; +Cc: mingo, linux-kernel, yuyang.du

On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
>  
>  #ifdef CONFIG_SMP
>  	/* synchronize task with its prev cfs_rq */
> -	if (!queued)
> -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> -				cfs_rq->curr == se, NULL);
> -
> -	/* remove our load when we leave */
> -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> +	detach_entity_load_avg(cfs_rq, se);
>  #endif
>  	set_task_rq(p, task_cpu(p));
>  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
>  
>  #ifdef CONFIG_SMP
>  	/* Virtually synchronize task with its new cfs_rq */
> -	p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> -	cfs_rq->avg.load_avg += p->se.avg.load_avg;
> -	cfs_rq->avg.load_sum += p->se.avg.load_sum;
> -	cfs_rq->avg.util_avg += p->se.avg.util_avg;
> -	cfs_rq->avg.util_sum += p->se.avg.util_sum;
> +	attach_entity_load_avg(cfs_rq, se);
>  #endif
>  }

Can't we go one further and do:

static void task_move_group_fair(struct task_struct *p)
{
	struct rq *rq = task_rq(p);

	switched_from_fair(rq, p);
	set_task_rq(p, task_cpu(p));
	switched_to_fair(rq, p);
}

switched_from already does the vruntime and load_avg thing,
switched_to should do the reverse, although it currently doesn't appear
to put the load_avg back.
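
For what it's worth, here is a minimal standalone userspace model (not
kernel code; the struct fields and helper names merely mirror the ones
in the patch, the numbers are made up, and decay plus the max_t clamping
are ignored) of why symmetric detach/attach helpers make the group-move
path that trivial. Detach from the old queue, switch queues, attach to
the new one, and the per-queue sums stay consistent:

/* minimal userspace model of detach/attach symmetry, not kernel code */
#include <stdio.h>

struct avg { long load_avg; long util_avg; };
struct cfs_rq { struct avg avg; };
struct sched_entity { struct avg avg; struct cfs_rq *my_q; };

static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	cfs_rq->avg.load_avg -= se->avg.load_avg;
	cfs_rq->avg.util_avg -= se->avg.util_avg;
}

static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	cfs_rq->avg.load_avg += se->avg.load_avg;
	cfs_rq->avg.util_avg += se->avg.util_avg;
}

int main(void)
{
	struct cfs_rq old_q = { { 2048, 700 } }, new_q = { { 512, 100 } };
	struct sched_entity se = { { 1024, 300 }, &old_q };

	detach_entity_load_avg(se.my_q, &se);	/* the switched_from_fair() part */
	se.my_q = &new_q;			/* the set_task_rq() part */
	attach_entity_load_avg(se.my_q, &se);	/* the switched_to_fair() part */

	printf("old_q: load %ld util %ld, new_q: load %ld util %ld\n",
	       old_q.avg.load_avg, old_q.avg.util_avg,
	       new_q.avg.load_avg, new_q.avg.util_avg);
	return 0;
}

The last call is exactly the step that switched_to_fair() does not do
today, which is the point above about it not putting the load_avg back.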


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  7:46 ` Peter Zijlstra
@ 2015-08-13  8:21   ` Byungchul Park
  2015-08-13  2:15     ` Yuyang Du
  2015-08-13  8:42   ` Byungchul Park
  1 sibling, 1 reply; 17+ messages in thread
From: Byungchul Park @ 2015-08-13  8:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, yuyang.du

On Thu, Aug 13, 2015 at 09:46:00AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> > @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* synchronize task with its prev cfs_rq */
> > -	if (!queued)
> > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> > -				cfs_rq->curr == se, NULL);
> > -
> > -	/* remove our load when we leave */
> > -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +	detach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  	set_task_rq(p, task_cpu(p));
> >  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> > @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* Virtually synchronize task with its new cfs_rq */
> > -	p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> > -	cfs_rq->avg.load_avg += p->se.avg.load_avg;
> > -	cfs_rq->avg.load_sum += p->se.avg.load_sum;
> > -	cfs_rq->avg.util_avg += p->se.avg.util_avg;
> > -	cfs_rq->avg.util_sum += p->se.avg.util_sum;
> > +	attach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> 
> Can't we go one further and do:
> 
> static void task_move_group_fair(struct task_struct *p)
> {
> 	struct rq *rq = task_rq(p);
> 
> 	switched_from_fair(rq, p);
> 	set_task_rq(p, task_cpu(p));
> 	switched_to_fair(rq, p);
> }

it looks nice, but i will think about it some more..

> 
> switched_from already does the vruntime and load_avg thing,
> switched_to should do the reverse, although it currently doesn't appear
> to put the load_avg back.

yuyang said that switched_to doesn't need to consider the task's load
because it can have a meaningless value. but i think considering the
task's load is better than leaving it unattended, and we can also use
switched_to if we handle the task's load in switched_to.

thanks,
byungchul

> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  7:46 ` Peter Zijlstra
  2015-08-13  8:21   ` Byungchul Park
@ 2015-08-13  8:42   ` Byungchul Park
  1 sibling, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2015-08-13  8:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, yuyang.du

On Thu, Aug 13, 2015 at 09:46:00AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> > @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* synchronize task with its prev cfs_rq */
> > -	if (!queued)
> > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> > -				cfs_rq->curr == se, NULL);
> > -
> > -	/* remove our load when we leave */
> > -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +	detach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  	set_task_rq(p, task_cpu(p));
> >  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> > @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* Virtually synchronize task with its new cfs_rq */
> > -	p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> > -	cfs_rq->avg.load_avg += p->se.avg.load_avg;
> > -	cfs_rq->avg.load_sum += p->se.avg.load_sum;
> > -	cfs_rq->avg.util_avg += p->se.avg.util_avg;
> > -	cfs_rq->avg.util_sum += p->se.avg.util_sum;
> > +	attach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> 
> Can't we go one further and do:
> 
> static void task_move_group_fair(struct task_struct *p)
> {
> 	struct rq *rq = task_rq(p);
> 
> 	switched_from_fair(rq, p);
> 	set_task_rq(p, task_cpu(p));
> 	switched_to_fair(rq, p);
> }

yeah, i think this is a very nice idea which makes the code simpler.
i will try it.

thanks,
byungchul

> 
> switched_from already does the vruntime and load_avg thing,
> switched_to should do the reverse, although it currently doesn't appear
> to put the load_avg back.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  2:15     ` Yuyang Du
@ 2015-08-13 10:56       ` Byungchul Park
  2015-08-13 15:22       ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2015-08-13 10:56 UTC (permalink / raw)
  To: Yuyang Du; +Cc: Peter Zijlstra, mingo, linux-kernel

On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > 
> > yuyang said that switched_to don't need to consider task's load because it
> > can have meaningless value. but i think considering task's load is better
> > than leaving it unattended at all. and we can also use switched_to if we 
> > consider task's load in switched_to.
> 
> when did I say "don't need to consider..."?
> 
> Doing more does not mean better, or just trivial. BTW, the task switched_to
> does not have to be switched_from before. 

i see.. i need to add an initialization routine for the fair class..

thanks,
byungchul


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  2:15     ` Yuyang Du
  2015-08-13 10:56       ` Byungchul Park
@ 2015-08-13 15:22       ` Peter Zijlstra
  2015-08-13 23:20         ` Yuyang Du
  2015-08-15  6:52         ` Byungchul Park
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2015-08-13 15:22 UTC (permalink / raw)
  To: Yuyang Du; +Cc: Byungchul Park, mingo, linux-kernel

On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > 
> > yuyang said that switched_to don't need to consider task's load because it
> > can have meaningless value. but i think considering task's load is better
> > than leaving it unattended at all. and we can also use switched_to if we 
> > consider task's load in switched_to.
> 
> when did I say "don't need to consider..."?
> 
> Doing more does not mean better, or just trivial. BTW, the task switched_to
> does not have to be switched_from before. 

Correct, there's a few corner cases we need to consider.

However, I think we unconditionally call init_entity_runnable_average()
on all tasks, regardless of their 'initial' sched class, so it should
have a valid state.

Another thing to consider is the state being very stale: suppose it
started life as FAIR, ran for a bit, got switched to !FAIR by means of
sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
time and for some reason gets switched back to FAIR; then we need to age
and/or re-init things.

I _think_ we can use last_update_time for that, but I've not looked too
hard.

That is, age based on last_update_time, if all 0, reinit, or somesuch.


The most common case of switched_from()/switched_to() is Priority
Inheritance, and that typically results in very short lived stints as
!FAIR and the avg data should be still accurate by the time we return.
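
A hedged sketch of that aging idea (a standalone userspace model, not
the kernel functions; the ">> 20" period conversion and the re-init
value are assumptions of this sketch, and the decay only relies on
PELT's y^32 == 1/2 convention):

/* userspace sketch of "age based on last_update_time, if all 0, reinit" */
#include <math.h>
#include <stdio.h>
#include <stdint.h>

struct avg_model {
	uint64_t last_update_time;	/* ns; 0 means no valid state yet */
	double load_avg;
};

static void age_or_reinit(struct avg_model *sa, uint64_t now_ns)
{
	if (!sa->last_update_time) {
		sa->load_avg = 0.0;	/* re-init; the real init policy is a separate question */
	} else {
		uint64_t periods = (now_ns - sa->last_update_time) >> 20; /* ~1 ms periods */
		sa->load_avg *= pow(0.5, (double)periods / 32.0);	/* y^32 == 1/2 */
	}
	sa->last_update_time = now_ns;
}

int main(void)
{
	struct avg_model pi    = { 1000000000ull, 800.0 };	/* away ~5 ms, PI-like */
	struct avg_model stale = { 1000000000ull, 800.0 };	/* away ~10 s, setscheduler-like */
	struct avg_model fresh = { 0, 0.0 };			/* never had fair state */

	age_or_reinit(&pi, 1005000000ull);
	age_or_reinit(&stale, 11000000000ull);
	age_or_reinit(&fresh, 11000000000ull);
	printf("PI-like: %.1f  long-stale: %.1f  never-fair: %.1f\n",
	       pi.load_avg, stale.load_avg, fresh.load_avg);
	return 0;
}

For the short PI-style absence the value is nearly untouched, which
matches the note above that the avg data should still be accurate by the
time we return; for the long absence it decays to roughly zero.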

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13 15:22       ` Peter Zijlstra
@ 2015-08-13 23:20         ` Yuyang Du
  2015-08-14  8:46           ` Peter Zijlstra
  2015-08-15  6:52         ` Byungchul Park
  1 sibling, 1 reply; 17+ messages in thread
From: Yuyang Du @ 2015-08-13 23:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Byungchul Park, mingo, linux-kernel

On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> > when did I say "don't need to consider..."?
> > 
> > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > does not have to be switched_from before. 
> 
> Correct, there's a few corner cases we need to consider.
> 
> However, I think we unconditionally call init_entity_runnable_average()
> on all tasks, regardless of their 'initial' sched class, so it should
> have a valid state.
> 
> Another thing to consider is the state being very stale, suppose it
> started live as FAIR, ran for a bit, got switched to !FAIR by means of
> sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> time and for some reason gets switched back to FAIR, we need to age and
> or re-init things.
> 
> I _think_ we can use last_update_time for that, but I've not looked too
> hard.
> 
> That is, age based on last_update_time, if all 0, reinit, or somesuch.
> 
> 
> The most common case of switched_from()/switched_to() is Priority
> Inheritance, and that typically results in very short lived stints as
> !FAIR and the avg data should be still accurate by the time we return.

Right, we have the following cases to consider:
queued or not * PI or sched_setscheduler * switched_from_fair or not

After enumerating them, it simply boils down to this basic question:
with either a short or a long lost record, what do we predict?

It can have to do with perspective, preference, etc., and it depends on
how frequently this happens, which is data we lack. Otherwise, we don't
have the context to see what difference it makes.

To address this, I think the following patch might be helpful first.

--
sched: Provide sched class and priority change statistics to task

Sched class and priority changes have a substantial impact on a task,
but we really do not have a quantitative understanding of how frequent
they are, which makes it difficult to work on many cases, especially
some corner cases. We provide these statistics at the task level.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 include/linux/sched.h |    3 +++
 kernel/sched/core.c   |    8 +++++++-
 kernel/sched/debug.c  |    2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..9111696 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1228,6 +1228,9 @@ struct sched_statistics {
 	u64			nr_wakeups_affine_attempts;
 	u64			nr_wakeups_passive;
 	u64			nr_wakeups_idle;
+
+	u64			class_change;
+	u64			priority_change;
 };
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 655557d..b65af01 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1009,12 +1009,18 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
 				       int oldprio)
 {
 	if (prev_class != p->sched_class) {
+		schedstat_inc(p, se.statistics.class_change);
+		schedstat_inc(p, se.statistics.priority_change);
+
 		if (prev_class->switched_from)
 			prev_class->switched_from(rq, p);
 
 		p->sched_class->switched_to(rq, p);
-	} else if (oldprio != p->prio || dl_task(p))
+	} else if (oldprio != p->prio || dl_task(p)) {
+		schedstat_inc(p, se.statistics.priority_change);
+
 		p->sched_class->prio_changed(rq, p, oldprio);
+	}
 }
 
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 6415117..fac8b89 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -597,6 +597,8 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 	P(se.statistics.nr_wakeups_affine_attempts);
 	P(se.statistics.nr_wakeups_passive);
 	P(se.statistics.nr_wakeups_idle);
+	P(se.statistics.class_change);
+	P(se.statistics.priority_change);
 
 	{
 		u64 avg_atom, avg_per_cpu;
-- 
1.7.9.5
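
A hedged usage sketch, not part of the patch: assuming a kernel with the
patch applied (and the usual CONFIG_SCHED_DEBUG/CONFIG_SCHEDSTATS options
so that proc_sched_show_task() prints the per-task statistics), the two
new counters should show up in /proc/<pid>/sched next to the other
se.statistics fields, and something like this minimal reader could watch
them:

/* tiny reader for the new counters; the field names assume the patch above */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/sched", argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "class_change") || strstr(line, "priority_change"))
			fputs(line, stdout);
	fclose(f);
	return 0;
}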


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13 23:20         ` Yuyang Du
@ 2015-08-14  8:46           ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2015-08-14  8:46 UTC (permalink / raw)
  To: Yuyang Du; +Cc: Byungchul Park, mingo, linux-kernel

On Fri, Aug 14, 2015 at 07:20:20AM +0800, Yuyang Du wrote:
> sched: Provide sched class and priority change statistics to task
> 
> The sched class and priority changes make substantial impact for
> a task, but we really have not a quantitative understanding of how
> frequent they are, which makes it diffcult to work on many cases
> especially some corner cases. We privide it at task level.
> 
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>

Sure we can do that. I would think it's fairly rare on regular kernels,
whilst fairly common on -RT kernels where PI is much more active.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13  5:55 [PATCH] sched: sync with the cfs_rq when changing sched class byungchul.park
  2015-08-12 22:41 ` Yuyang Du
  2015-08-13  7:46 ` Peter Zijlstra
@ 2015-08-14 12:59 ` T. Zhou
  2015-08-15  4:24   ` Byungchul Park
  2 siblings, 1 reply; 17+ messages in thread
From: T. Zhou @ 2015-08-14 12:59 UTC (permalink / raw)
  To: byungchul.park; +Cc: mingo, peterz, linux-kernel, yuyang.du

Hi,

On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
> +	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> +	cfs_rq->avg.load_avg += se->avg.load_avg;
> +	cfs_rq->avg.load_sum += se->avg.load_sum;
> +	cfs_rq->avg.util_avg += se->avg.util_avg;
> +	cfs_rq->avg.util_sum += se->avg.util_sum;
> +}

I see this function is used in enqueue_entity_load_avg() also.
In the tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
as se->last_update_time in the migration case.
Here, you use cfs_rq->avg.last_update_time as se->last_update_time.
If se->last_update_time is different, the next decay may be different too.
This is just from inspecting the code; maybe there is a reason for it?

Thanks
> +static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
> +	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> +			&se->avg, se->on_rq * scale_load_down(se->load.weight),
> +			cfs_rq->curr == se, NULL);
> +
> +	cfs_rq->avg.load_avg =
> +		max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> +	cfs_rq->avg.load_sum =
> +		max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> +	cfs_rq->avg.util_avg =
> +		max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> +	cfs_rq->avg.util_sum =
> +		max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> +}
> +
>  /* Add the load generated by se into cfs_rq's load average */
>  static inline void
>  enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> @@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  	u64 now = cfs_rq_clock_task(cfs_rq);
>  	int migrated = 0, decayed;
>  
> -	if (sa->last_update_time == 0) {
> -		sa->last_update_time = now;
> +	if (sa->last_update_time == 0)
>  		migrated = 1;
> -	}
> -	else {
> +	else
>  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> -			se->on_rq * scale_load_down(se->load.weight),
> -			cfs_rq->curr == se, NULL);
> -	}
> +				se->on_rq * scale_load_down(se->load.weight),
> +				cfs_rq->curr == se, NULL);
>  
>  	decayed = update_cfs_rq_load_avg(now, cfs_rq);
>  
>  	cfs_rq->runnable_load_avg += sa->load_avg;
>  	cfs_rq->runnable_load_sum += sa->load_sum;
>  
> -	if (migrated) {
> -		cfs_rq->avg.load_avg += sa->load_avg;
> -		cfs_rq->avg.load_sum += sa->load_sum;
> -		cfs_rq->avg.util_avg += sa->util_avg;
> -		cfs_rq->avg.util_sum += sa->util_sum;
> -	}
> +	if (migrated)
> +		attach_entity_load_avg(cfs_rq, se);
>  
>  	if (decayed || migrated)
>  		update_tg_load_avg(cfs_rq, 0);
> @@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  
>  #ifdef CONFIG_SMP
>  	/* Catch up with the cfs_rq and remove our load when we leave */
> -	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), &se->avg,
> -		se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
> -
> -	cfs_rq->avg.load_avg =
> -		max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> -	cfs_rq->avg.load_sum =
> -		max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> -	cfs_rq->avg.util_avg =
> -		max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> -	cfs_rq->avg.util_sum =
> -		max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> +	detach_entity_load_avg(cfs_rq, se);
>  #endif
>  }
>  
> @@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  	 */
>  	se->depth = se->parent ? se->parent->depth + 1 : 0;
>  #endif
> +
> +#ifdef CONFIG_SMP
> +	/* synchronize task with its cfs_rq */
> +	attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
> +#endif
>  	if (!task_on_rq_queued(p))
>  		return;
>  
> @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
>  
>  #ifdef CONFIG_SMP
>  	/* synchronize task with its prev cfs_rq */
> -	if (!queued)
> -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> -				cfs_rq->curr == se, NULL);
> -
> -	/* remove our load when we leave */
> -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> +	detach_entity_load_avg(cfs_rq, se);
>  #endif
>  	set_task_rq(p, task_cpu(p));
>  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
>  
>  #ifdef CONFIG_SMP
>  	/* Virtually synchronize task with its new cfs_rq */
> -	p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> -	cfs_rq->avg.load_avg += p->se.avg.load_avg;
> -	cfs_rq->avg.load_sum += p->se.avg.load_sum;
> -	cfs_rq->avg.util_avg += p->se.avg.util_avg;
> -	cfs_rq->avg.util_sum += p->se.avg.util_sum;
> +	attach_entity_load_avg(cfs_rq, se);
>  #endif
>  }

-- 
Tao

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-14 12:59 ` T. Zhou
@ 2015-08-15  4:24   ` Byungchul Park
  2015-08-15 12:34     ` T. Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Byungchul Park @ 2015-08-15  4:24 UTC (permalink / raw)
  To: T. Zhou; +Cc: mingo, peterz, linux-kernel, yuyang.du

On Fri, Aug 14, 2015 at 08:59:02PM +0800, T. Zhou wrote:
> Hi,
> 
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> > +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> > +	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > +	cfs_rq->avg.load_avg += se->avg.load_avg;
> > +	cfs_rq->avg.load_sum += se->avg.load_sum;
> > +	cfs_rq->avg.util_avg += se->avg.util_avg;
> > +	cfs_rq->avg.util_sum += se->avg.util_sum;
> > +}
> 
> I see this function is used in enqueue_entity_load_avg() also.
> In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
> as se->last_update_time in migration condition.
> Here, you use cfs_rq->avg.last_update_time as se->last_update_time.
> If se->last_update_time is different, next decay may be different too.
> Just from inspecting code, maybe some reasonable there?

hello zhou,

update_cfs_rq_load_avg() would update cfs_rq->avg.last_update_time to now,
which is returned from cfs_rq_clock_task(). :)

thanks,
byungchul

> 
> Thanks
> > +static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> > +	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > +			&se->avg, se->on_rq * scale_load_down(se->load.weight),
> > +			cfs_rq->curr == se, NULL);
> > +
> > +	cfs_rq->avg.load_avg =
> > +		max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > +	cfs_rq->avg.load_sum =
> > +		max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > +	cfs_rq->avg.util_avg =
> > +		max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > +	cfs_rq->avg.util_sum =
> > +		max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +}
> > +
> >  /* Add the load generated by se into cfs_rq's load average */
> >  static inline void
> >  enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > @@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  	u64 now = cfs_rq_clock_task(cfs_rq);
> >  	int migrated = 0, decayed;
> >  
> > -	if (sa->last_update_time == 0) {
> > -		sa->last_update_time = now;
> > +	if (sa->last_update_time == 0)
> >  		migrated = 1;
> > -	}
> > -	else {
> > +	else
> >  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> > -			se->on_rq * scale_load_down(se->load.weight),
> > -			cfs_rq->curr == se, NULL);
> > -	}
> > +				se->on_rq * scale_load_down(se->load.weight),
> > +				cfs_rq->curr == se, NULL);
> >  
> >  	decayed = update_cfs_rq_load_avg(now, cfs_rq);
> >  
> >  	cfs_rq->runnable_load_avg += sa->load_avg;
> >  	cfs_rq->runnable_load_sum += sa->load_sum;
> >  
> > -	if (migrated) {
> > -		cfs_rq->avg.load_avg += sa->load_avg;
> > -		cfs_rq->avg.load_sum += sa->load_sum;
> > -		cfs_rq->avg.util_avg += sa->util_avg;
> > -		cfs_rq->avg.util_sum += sa->util_sum;
> > -	}
> > +	if (migrated)
> > +		attach_entity_load_avg(cfs_rq, se);
> >  
> >  	if (decayed || migrated)
> >  		update_tg_load_avg(cfs_rq, 0);
> > @@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* Catch up with the cfs_rq and remove our load when we leave */
> > -	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), &se->avg,
> > -		se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == se, NULL);
> > -
> > -	cfs_rq->avg.load_avg =
> > -		max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > -	cfs_rq->avg.load_sum =
> > -		max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > -	cfs_rq->avg.util_avg =
> > -		max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > -	cfs_rq->avg.util_sum =
> > -		max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +	detach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> >  
> > @@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
> >  	 */
> >  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> >  #endif
> > +
> > +#ifdef CONFIG_SMP
> > +	/* synchronize task with its cfs_rq */
> > +	attach_entity_load_avg(cfs_rq_of(&p->se), &p->se);
> > +#endif
> >  	if (!task_on_rq_queued(p))
> >  		return;
> >  
> > @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* synchronize task with its prev cfs_rq */
> > -	if (!queued)
> > -		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > -				&se->avg, se->on_rq * scale_load_down(se->load.weight),
> > -				cfs_rq->curr == se, NULL);
> > -
> > -	/* remove our load when we leave */
> > -	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > -	cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > -	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > -	cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +	detach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  	set_task_rq(p, task_cpu(p));
> >  	se->depth = se->parent ? se->parent->depth + 1 : 0;
> > @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> >  	/* Virtually synchronize task with its new cfs_rq */
> > -	p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> > -	cfs_rq->avg.load_avg += p->se.avg.load_avg;
> > -	cfs_rq->avg.load_sum += p->se.avg.load_sum;
> > -	cfs_rq->avg.util_avg += p->se.avg.util_avg;
> > -	cfs_rq->avg.util_sum += p->se.avg.util_sum;
> > +	attach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> 
> -- 
> Tao

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-13 15:22       ` Peter Zijlstra
  2015-08-13 23:20         ` Yuyang Du
@ 2015-08-15  6:52         ` Byungchul Park
  2015-08-15  7:16           ` Byungchul Park
  1 sibling, 1 reply; 17+ messages in thread
From: Byungchul Park @ 2015-08-15  6:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Yuyang Du, mingo, linux-kernel

On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> > On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > > 
> > > yuyang said that switched_to don't need to consider task's load because it
> > > can have meaningless value. but i think considering task's load is better
> > > than leaving it unattended at all. and we can also use switched_to if we 
> > > consider task's load in switched_to.
> > 
> > when did I say "don't need to consider..."?
> > 
> > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > does not have to be switched_from before. 
> 
> Correct, there's a few corner cases we need to consider.
> 
> However, I think we unconditionally call init_entity_runnable_average()
> on all tasks, regardless of their 'initial' sched class, so it should
> have a valid state.
> 
> Another thing to consider is the state being very stale, suppose it
> started live as FAIR, ran for a bit, got switched to !FAIR by means of
> sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> time and for some reason gets switched back to FAIR, we need to age and
> or re-init things.

hello,

what do you think about this approach to solving this problem?
it makes the se's load decay for the period it was detached from that
rq. and i used the rq instead of the cfs_rq because the detached se no
longer has any dependency on a cfs_rq.

---

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..8f5e2de 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1191,6 +1191,8 @@ struct load_weight {
  */
 struct sched_avg {
 	u64 last_update_time, load_sum;
+	u64 last_detached_time;
+	int last_detached_cpu;
 	u32 util_sum, period_contrib;
 	unsigned long load_avg, util_avg;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 72d13af..b2d22c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
 	struct sched_avg *sa = &se->avg;
 
 	sa->last_update_time = 0;
+	sa->last_detached_time = 0;
+	sa->last_detached_cpu = -1;
 	/*
 	 * sched_avg's period_contrib should be strictly less then 1024, so
 	 * we give it 1023 to make sure it is almost a period (1024us), and
@@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	se->avg.last_update_time = cfs_rq->avg.last_update_time;
-	cfs_rq->avg.load_avg += se->avg.load_avg;
-	cfs_rq->avg.load_sum += se->avg.load_sum;
-	cfs_rq->avg.util_avg += se->avg.util_avg;
-	cfs_rq->avg.util_sum += se->avg.util_sum;
+	struct sched_avg *sa = &se->avg;
+	int cpu = sa->last_detached_cpu;
+	u64 delta;
+
+	if (cpu != -1) {
+		delta = rq_clock_task(cpu_rq(cpu)) - sa->last_detached_time;
+		/*
+		 * compute the number of period passed, where a period is 1 msec,
+		 * since the entity had detached from the rq, and ignore decaying
+		 * delta which is less than a period for fast calculation.
+		 */
+		delta >>= 20;
+		if (!delta)
+			goto do_attach;
+
+		sa->load_sum = decay_load(sa->load_sum, delta);
+		sa->util_sum = decay_load((u64)(sa->util_sum), delta);
+		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+		sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
+	}
+
+do_attach:
+	sa->last_detached_cpu = -1;
+	sa->last_detached_time = 0;
+	sa->period_contrib = 0;
+
+	sa->last_update_time = cfs_rq->avg.last_update_time;
+	cfs_rq->avg.load_avg += sa->load_avg;
+	cfs_rq->avg.load_sum += sa->load_sum;
+	cfs_rq->avg.util_avg += sa->util_avg;
+	cfs_rq->avg.util_sum += sa->util_sum;
 }
 
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+	int cpu = cpu_of(rq_of(cfs_rq));
+
+	se->avg.last_detached_cpu = cpu;
+	se->avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
+
+	__update_load_avg(cfs_rq->avg.last_update_time, cpu,
 			&se->avg, se->on_rq * scale_load_down(se->load.weight),
 			cfs_rq->curr == se, NULL);

> 
> I _think_ we can use last_update_time for that, but I've not looked too
> hard.
> 
> That is, age based on last_update_time, if all 0, reinit, or somesuch.
> 
> 
> The most common case of switched_from()/switched_to() is Priority
> Inheritance, and that typically results in very short lived stints as
> !FAIR and the avg data should be still accurate by the time we return.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-15  6:52         ` Byungchul Park
@ 2015-08-15  7:16           ` Byungchul Park
  0 siblings, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2015-08-15  7:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Yuyang Du, mingo, linux-kernel

On Sat, Aug 15, 2015 at 03:52:48PM +0900, Byungchul Park wrote:
> On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> > > On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > > > 
> > > > yuyang said that switched_to don't need to consider task's load because it
> > > > can have meaningless value. but i think considering task's load is better
> > > > than leaving it unattended at all. and we can also use switched_to if we 
> > > > consider task's load in switched_to.
> > > 
> > > when did I say "don't need to consider..."?
> > > 
> > > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > > does not have to be switched_from before. 
> > 
> > Correct, there's a few corner cases we need to consider.
> > 
> > However, I think we unconditionally call init_entity_runnable_average()
> > on all tasks, regardless of their 'initial' sched class, so it should
> > have a valid state.
> > 
> > Another thing to consider is the state being very stale, suppose it
> > started live as FAIR, ran for a bit, got switched to !FAIR by means of
> > sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> > time and for some reason gets switched back to FAIR, we need to age and
> > or re-init things.
> 
> hello,
> 
> what do you think about this approch for solving this problem ?
> it makes se's loads decay for detached periods for that rq. and i used
> rq instead of cfs_rq because it does not have dependency to cfs_rq
> any more.

to be honest with you, i am not sure which clock i have to use for
decaying the detached se in this approach..
do you think it's better to use sched_clock() instead of the rq task clock?

after checking whether this approach is feasible or not, i need to
choose an appropriate clock..

thanks,
byungchul

> 
> ---
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5b50082..8f5e2de 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1191,6 +1191,8 @@ struct load_weight {
>   */
>  struct sched_avg {
>  	u64 last_update_time, load_sum;
> +	u64 last_detached_time;
> +	int last_detached_cpu;
>  	u32 util_sum, period_contrib;
>  	unsigned long load_avg, util_avg;
>  };
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 72d13af..b2d22c8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
>  	struct sched_avg *sa = &se->avg;
>  
>  	sa->last_update_time = 0;
> +	sa->last_detached_time = 0;
> +	sa->last_detached_cpu = -1;
>  	/*
>  	 * sched_avg's period_contrib should be strictly less then 1024, so
>  	 * we give it 1023 to make sure it is almost a period (1024us), and
> @@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>  
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> -	cfs_rq->avg.load_avg += se->avg.load_avg;
> -	cfs_rq->avg.load_sum += se->avg.load_sum;
> -	cfs_rq->avg.util_avg += se->avg.util_avg;
> -	cfs_rq->avg.util_sum += se->avg.util_sum;
> +	struct sched_avg *sa = &se->avg;
> +	int cpu = sa->last_detached_cpu;
> +	u64 delta;
> +
> +	if (cpu != -1) {
> +		delta = rq_clock_task(cpu_rq(cpu)) - sa->last_detached_time;
> +		/*
> +		 * compute the number of period passed, where a period is 1 msec,
> +		 * since the entity had detached from the rq, and ignore decaying
> +		 * delta which is less than a period for fast calculation.
> +		 */
> +		delta >>= 20;
> +		if (!delta)
> +			goto do_attach;
> +
> +		sa->load_sum = decay_load(sa->load_sum, delta);
> +		sa->util_sum = decay_load((u64)(sa->util_sum), delta);
> +		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +		sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> +	}
> +
> +do_attach:
> +	sa->last_detached_cpu = -1;
> +	sa->last_detached_time = 0;
> +	sa->period_contrib = 0;
> +
> +	sa->last_update_time = cfs_rq->avg.last_update_time;
> +	cfs_rq->avg.load_avg += sa->load_avg;
> +	cfs_rq->avg.load_sum += sa->load_sum;
> +	cfs_rq->avg.util_avg += sa->util_avg;
> +	cfs_rq->avg.util_sum += sa->util_sum;
>  }
>  
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> +	int cpu = cpu_of(rq_of(cfs_rq));
> +
> +	se->avg.last_detached_cpu = cpu;
> +	se->avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
> +
> +	__update_load_avg(cfs_rq->avg.last_update_time, cpu,
>  			&se->avg, se->on_rq * scale_load_down(se->load.weight),
>  			cfs_rq->curr == se, NULL);
> 
> > 
> > I _think_ we can use last_update_time for that, but I've not looked too
> > hard.
> > 
> > That is, age based on last_update_time, if all 0, reinit, or somesuch.
> > 
> > 
> > The most common case of switched_from()/switched_to() is Priority
> > Inheritance, and that typically results in very short lived stints as
> > !FAIR and the avg data should be still accurate by the time we return.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] sched: sync with the cfs_rq when changing sched class
  2015-08-15  4:24   ` Byungchul Park
@ 2015-08-15 12:34     ` T. Zhou
  0 siblings, 0 replies; 17+ messages in thread
From: T. Zhou @ 2015-08-15 12:34 UTC (permalink / raw)
  To: Byungchul Park; +Cc: mingo, peterz, linux-kernel, yuyang.du

On Sat, Aug 15, 2015 at 01:24:12PM +0900, Byungchul Park wrote:
> On Fri, Aug 14, 2015 at 08:59:02PM +0800, T. Zhou wrote:
> > Hi,
> > 
> > On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.park@lge.com wrote:
> > > +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > +{
> > > +	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > > +	cfs_rq->avg.load_avg += se->avg.load_avg;
> > > +	cfs_rq->avg.load_sum += se->avg.load_sum;
> > > +	cfs_rq->avg.util_avg += se->avg.util_avg;
> > > +	cfs_rq->avg.util_sum += se->avg.util_sum;
> > > +}
> > 
> > I see this function is used in enqueue_entity_load_avg() also.
> > In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
> > as se->last_update_time in migration condition.
> > Here, you use cfs_rq->avg.last_update_time as se->last_update_time.
> > If se->last_update_time is different, next decay may be different too.
> > Just from inspecting code, maybe some reasonable there?
> 
> hello zhou,
> 
> update_cfs_rq_load_avg() would update cfs_rq->avg.last_update_time to now,
> which is returned from cfs_rq_clock_task(). :)
> 
> thanks,
> byungchul
> 

Hi Byungchul,

yes, you are right. the se and the cfs_rq update their load avg at the same time. :)

thanks
-- 
Tao

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-08-15 12:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-13  5:55 [PATCH] sched: sync with the cfs_rq when changing sched class byungchul.park
2015-08-12 22:41 ` Yuyang Du
2015-08-13  7:19   ` Byungchul Park
2015-08-13  2:30     ` Yuyang Du
2015-08-13  7:46 ` Peter Zijlstra
2015-08-13  8:21   ` Byungchul Park
2015-08-13  2:15     ` Yuyang Du
2015-08-13 10:56       ` Byungchul Park
2015-08-13 15:22       ` Peter Zijlstra
2015-08-13 23:20         ` Yuyang Du
2015-08-14  8:46           ` Peter Zijlstra
2015-08-15  6:52         ` Byungchul Park
2015-08-15  7:16           ` Byungchul Park
2015-08-13  8:42   ` Byungchul Park
2015-08-14 12:59 ` T. Zhou
2015-08-15  4:24   ` Byungchul Park
2015-08-15 12:34     ` T. Zhou
