From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751761AbbHOHQw (ORCPT <rfc822;w@1wt.eu>);
	Sat, 15 Aug 2015 03:16:52 -0400
Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:43132 "EHLO
	lgemrelse6q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751251AbbHOHQv (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Sat, 15 Aug 2015 03:16:51 -0400
X-Original-SENDERIP: 10.177.222.33
X-Original-MAILFROM: byungchul.park@lge.com
Date: Sat, 15 Aug 2015 16:16:16 +0900
From: Byungchul Park <byungchul.park@lge.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Yuyang Du <yuyang.du@intel.com>, mingo@kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: sync with the cfs_rq when changing sched class
Message-ID: <20150815071615.GB16992@byungchulpark-X58A-UD3R>
References: <1439445355-24137-1-git-send-email-byungchul.park@lge.com>
 <20150813074600.GB16853@twins.programming.kicks-ass.net>
 <20150813082127.GO3956@byungchulpark-X58A-UD3R>
 <20150813021527.GB2143@intel.com>
 <20150813152212.GE16853@twins.programming.kicks-ass.net>
 <20150815065248.GA16992@byungchulpark-X58A-UD3R>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150815065248.GA16992@byungchulpark-X58A-UD3R>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Aug 15, 2015 at 03:52:48PM +0900, Byungchul Park wrote:
> On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> > > On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > > > 
> > > > yuyang said that switched_to don't need to consider task's load because it
> > > > can have meaningless value. but i think considering task's load is better
> > > > than leaving it unattended at all. and we can also use switched_to if we 
> > > > consider task's load in switched_to.
> > > 
> > > when did I say "don't need to consider..."?
> > > 
> > > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > > does not have to be switched_from before. 
> > 
> > Correct, there's a few corner cases we need to consider.
> > 
> > However, I think we unconditionally call init_entity_runnable_average()
> > on all tasks, regardless of their 'initial' sched class, so it should
> > have a valid state.
> > 
> > Another thing to consider is the state being very stale, suppose it
> > started live as FAIR, ran for a bit, got switched to !FAIR by means of
> > sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> > time and for some reason gets switched back to FAIR, we need to age and
> > or re-init things.
> 
> hello,
> 
> what do you think about this approch for solving this problem ?
> it makes se's loads decay for detached periods for that rq. and i used
> rq instead of cfs_rq because it does not have dependency to cfs_rq
> any more.

to be honest with you, i am not sure what kind of clock do i have to
use for the detached se's decaying in this approach..
do you think it's better to use sched_clock() instead of rq task clock?

after checking if this appoach is feasible or not, and then i need to
choose a appropriate clock..

thanks,
byungchul

> 
> ---
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5b50082..8f5e2de 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1191,6 +1191,8 @@ struct load_weight {
>   */
>  struct sched_avg {
>  	u64 last_update_time, load_sum;
> +	u64 last_detached_time;
> +	int last_detached_cpu;
>  	u32 util_sum, period_contrib;
>  	unsigned long load_avg, util_avg;
>  };
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 72d13af..b2d22c8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
>  	struct sched_avg *sa = &se->avg;
>  
>  	sa->last_update_time = 0;
> +	sa->last_detached_time = 0;
> +	sa->last_detached_cpu = -1;
>  	/*
>  	 * sched_avg's period_contrib should be strictly less then 1024, so
>  	 * we give it 1023 to make sure it is almost a period (1024us), and
> @@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>  
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	se->avg.last_update_time = cfs_rq->avg.last_update_time;
> -	cfs_rq->avg.load_avg += se->avg.load_avg;
> -	cfs_rq->avg.load_sum += se->avg.load_sum;
> -	cfs_rq->avg.util_avg += se->avg.util_avg;
> -	cfs_rq->avg.util_sum += se->avg.util_sum;
> +	struct sched_avg *sa = &se->avg;
> +	int cpu = sa->last_detached_cpu;
> +	u64 delta;
> +
> +	if (cpu != -1) {
> +		delta = rq_clock_task(cpu_rq(cpu)) - sa->last_detached_time;
> +		/*
> +		 * compute the number of period passed, where a period is 1 msec,
> +		 * since the entity had detached from the rq, and ignore decaying
> +		 * delta which is less than a period for fast calculation.
> +		 */
> +		delta >>= 20;
> +		if (!delta)
> +			goto do_attach;
> +
> +		sa->load_sum = decay_load(sa->load_sum, delta);
> +		sa->util_sum = decay_load((u64)(sa->util_sum), delta);
> +		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +		sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> +	}
> +
> +do_attach:
> +	sa->last_detached_cpu = -1;
> +	sa->last_detached_time = 0;
> +	sa->period_contrib = 0;
> +
> +	sa->last_update_time = cfs_rq->avg.last_update_time;
> +	cfs_rq->avg.load_avg += sa->load_avg;
> +	cfs_rq->avg.load_sum += sa->load_sum;
> +	cfs_rq->avg.util_avg += sa->util_avg;
> +	cfs_rq->avg.util_sum += sa->util_sum;
>  }
>  
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> +	int cpu = cpu_of(rq_of(cfs_rq));
> +
> +	se->avg.last_detached_cpu = cpu;
> +	se->avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
> +
> +	__update_load_avg(cfs_rq->avg.last_update_time, cpu,
>  			&se->avg, se->on_rq * scale_load_down(se->load.weight),
>  			cfs_rq->curr == se, NULL);
> 
> > 
> > I _think_ we can use last_update_time for that, but I've not looked too
> > hard.
> > 
> > That is, age based on last_update_time, if all 0, reinit, or somesuch.
> > 
> > 
> > The most common case of switched_from()/switched_to() is Priority
> > Inheritance, and that typically results in very short lived stints as
> > !FAIR and the avg data should be still accurate by the time we return.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/