Date: Sat, 15 Aug 2015 15:52:48 +0900
From: Byungchul Park
To: Peter Zijlstra
Cc: Yuyang Du, mingo@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: sync with the cfs_rq when changing sched class
Message-ID: <20150815065248.GA16992@byungchulpark-X58A-UD3R>
In-Reply-To: <20150813152212.GE16853@twins.programming.kicks-ass.net>

On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> > On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > >
> > > yuyang said that switched_to don't need to consider task's load because it
> > > can have meaningless value. but i think considering task's load is better
> > > than leaving it unattended at all. and we can also use switched_to if we
> > > consider task's load in switched_to.
> >
> > when did I say "don't need to consider..."?
> >
> > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > does not have to be switched_from before.
>
> Correct, there's a few corner cases we need to consider.
>
> However, I think we unconditionally call init_entity_runnable_average()
> on all tasks, regardless of their 'initial' sched class, so it should
> have a valid state.
>
> Another thing to consider is the state being very stale, suppose it
> started live as FAIR, ran for a bit, got switched to !FAIR by means of
> sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> time and for some reason gets switched back to FAIR, we need to age and
> or re-init things.

Hello,

What do you think about this approach to solving the problem? It decays
the se's load over the period during which it was detached, measured with
the clock of the rq it was detached from. I track the rq (via its cpu)
instead of the cfs_rq because, once detached, the entity no longer depends
on its cfs_rq.

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..8f5e2de 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1191,6 +1191,8 @@ struct load_weight {
  */
 struct sched_avg {
 	u64 last_update_time, load_sum;
+	u64 last_detached_time;
+	int last_detached_cpu;
 	u32 util_sum, period_contrib;
 	unsigned long load_avg, util_avg;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 72d13af..b2d22c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
 	struct sched_avg *sa = &se->avg;
 
 	sa->last_update_time = 0;
+	sa->last_detached_time = 0;
+	sa->last_detached_cpu = -1;
 	/*
 	 * sched_avg's period_contrib should be strictly less then 1024, so
 	 * we give it 1023 to make sure it is almost a period (1024us), and
@@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	se->avg.last_update_time = cfs_rq->avg.last_update_time;
-	cfs_rq->avg.load_avg += se->avg.load_avg;
-	cfs_rq->avg.load_sum += se->avg.load_sum;
-	cfs_rq->avg.util_avg += se->avg.util_avg;
-	cfs_rq->avg.util_sum += se->avg.util_sum;
+	struct sched_avg *sa = &se->avg;
+	int cpu = sa->last_detached_cpu;
+	u64 delta;
+
+	if (cpu != -1) {
+		delta = rq_clock_task(cpu_rq(cpu)) - sa->last_detached_time;
+		/*
+		 * compute the number of periods passed, where a period is 1 msec,
+		 * since the entity was detached from the rq, and skip decaying a
+		 * delta shorter than one period for fast calculation.
+		 */
+		delta >>= 20;
+		if (!delta)
+			goto do_attach;
+
+		sa->load_sum = decay_load(sa->load_sum, delta);
+		sa->util_sum = decay_load((u64)(sa->util_sum), delta);
+		sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+		sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
+	}
+
+do_attach:
+	sa->last_detached_cpu = -1;
+	sa->last_detached_time = 0;
+	sa->period_contrib = 0;
+
+	sa->last_update_time = cfs_rq->avg.last_update_time;
+	cfs_rq->avg.load_avg += sa->load_avg;
+	cfs_rq->avg.load_sum += sa->load_sum;
+	cfs_rq->avg.util_avg += sa->util_avg;
+	cfs_rq->avg.util_sum += sa->util_sum;
 }
 
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+	int cpu = cpu_of(rq_of(cfs_rq));
+
+	se->avg.last_detached_cpu = cpu;
+	se->avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
+
+	__update_load_avg(cfs_rq->avg.last_update_time, cpu,
 			&se->avg, se->on_rq * scale_load_down(se->load.weight),
 			cfs_rq->curr == se, NULL);

>
> I _think_ we can use last_update_time for that, but I've not looked too
> hard.
>
> That is, age based on last_update_time, if all 0, reinit, or somesuch.
>
>
> The most common case of switched_from()/switched_to() is Priority
> Inheritance, and that typically results in very short lived stints as
> !FAIR and the avg data should be still accurate by the time we return.
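The detach/attach arithmetic in the patch can be checked outside the kernel. What follows is a minimal userspace sketch, not kernel code: `struct model_avg`, `periods_since()`, `decay_sketch()`, `model_detach()` and `model_attach()` are hypothetical names; `decay_sketch()` approximates `decay_load()` with a double for y (y^32 == 1/2) instead of the kernel's fixed-point table; and LOAD_AVG_MAX = 47742 is assumed from fair.c of that time. It illustrates that `delta >>= 20` counts whole 1024us periods, that a stint shorter than one period (the common Priority Inheritance case) leaves the averages untouched, and that 32 detached periods halve the load.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: values taken from kernel/sched/fair.c of that era. */
#define LOAD_AVG_PERIOD 32      /* y^32 == 1/2 */
#define LOAD_AVG_MAX    47742   /* maximum possible load_sum */

/* Hypothetical userspace stand-in for the patched struct sched_avg. */
struct model_avg {
	uint64_t load_sum;
	unsigned long load_avg;
	uint64_t last_detached_time;   /* ns, clock of the rq we left */
	int last_detached_cpu;         /* -1: not currently detached */
};

/*
 * Whole PELT periods in a nanosecond delta. A period is 1024us ==
 * 2^20 ns, so this is the patch's ">> 20"; any remainder is dropped.
 */
static uint64_t periods_since(uint64_t delta_ns)
{
	return delta_ns >> 20;
}

/*
 * Approximates val * y^n with y^32 == 1/2. Each full 32 periods is an
 * integer halving; the remaining n % 32 steps multiply by a double y.
 * This is NOT the kernel's decay_load(), which uses a fixed-point table.
 */
static uint64_t decay_sketch(uint64_t val, uint64_t n)
{
	const double y = 0.97857206;   /* approx 2^(-1/32) */
	double v;

	if (n >= 64 * LOAD_AVG_PERIOD)  /* shift would be >= 64: fully decayed */
		return 0;
	val >>= n / LOAD_AVG_PERIOD;
	v = (double)val;
	for (uint64_t i = 0; i < n % LOAD_AVG_PERIOD; i++)
		v *= y;
	return (uint64_t)v;
}

/* Mirrors detach_entity_load_avg(): remember where and when we left. */
static void model_detach(struct model_avg *sa, int cpu, uint64_t now_ns)
{
	assert(cpu >= 0);
	sa->last_detached_cpu = cpu;
	sa->last_detached_time = now_ns;
}

/* Mirrors attach_entity_load_avg(); now_ns is read from the same rq's clock. */
static void model_attach(struct model_avg *sa, uint64_t now_ns)
{
	if (sa->last_detached_cpu != -1) {
		uint64_t n = periods_since(now_ns - sa->last_detached_time);

		if (n) {   /* same shortcut as "if (!delta) goto do_attach" */
			sa->load_sum = decay_sketch(sa->load_sum, n);
			sa->load_avg = (unsigned long)(sa->load_sum / LOAD_AVG_MAX);
		}
	}
	sa->last_detached_cpu = -1;
	sa->last_detached_time = 0;
}
```

For example, an entity re-attached 32 periods (about 33.5 ms) after detaching has its load_sum halved, while a 0.5 ms stint as !FAIR is skipped by the shortcut and keeps its averages as they were.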