From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752613AbdIANhh (ORCPT );
	Fri, 1 Sep 2017 09:37:37 -0400
Received: from merlin.infradead.org ([205.233.59.134]:54958 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752531AbdIANfX (ORCPT );
	Fri, 1 Sep 2017 09:35:23 -0400
Message-Id: <20170901132748.738108335@infradead.org>
User-Agent: quilt/0.63-1
Date: Fri, 01 Sep 2017 15:21:14 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
	josef@toxicpanda.com
Cc: torvalds@linux-foundation.org, vincent.guittot@linaro.org, efault@gmx.de,
	pjt@google.com, clm@fb.com, dietmar.eggemann@arm.com,
	morten.rasmussen@arm.com, bsegall@google.com, yuyang.du@intel.com,
	peterz@infradead.org
Subject: [PATCH -v2 15/18] sched/fair: Align PELT windows between cfs_rq and its se
References: <20170901132059.342024223@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline; filename=peterz-sched-fair-align-windows.patch
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The PELT _sum values are a saw-tooth function, dropping on the decay
edge and then growing back up again during the window.

When these window-edges are not aligned between cfs_rq and se, we can
have the situation where, for example, on dequeue, the se decays first.
Its _sum values will be small(er), while the cfs_rq _sum values will
still be on their way up. Because of this, the subtraction:

	cfs_rq->avg._sum -= se->avg._sum

will result in a positive value. This will then, once the cfs_rq
reaches an edge, translate into its _avg value jumping up.

This is especially visible with the runnable_load bits, since they get
added/subtracted a lot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
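
To illustrate the mis-alignment described above, a stand-alone toy
model (plain C with floating point, not the kernel's fixed-point PELT
code; the 512us window offset and the 100-window run-in are arbitrary
choices for the example):

/* gcc -o pelt-toy pelt-toy.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);	/* PELT decay: y^32 == 0.5 */
	double se_sum = 0.0, cfs_sum = 0.0;
	int t;

	/*
	 * Both sums track the same always-running signal; only the decay
	 * edges differ: the cfs_rq decays at t % 1024 == 0, the se at
	 * t % 1024 == 512.
	 */
	for (t = 1; t <= 100 * 1024 + 512; t++) {
		se_sum += 1.0;
		cfs_sum += 1.0;
		if (t % 1024 == 512)
			se_sum *= y;
		if (t % 1024 == 0)
			cfs_sum *= y;
	}

	/*
	 * Dequeue here: the se just decayed (saw-tooth minimum) while the
	 * cfs_rq is still on its way up, so the subtraction above leaves a
	 * positive residue (~450 here) in the cfs_rq even though it should
	 * now be empty.
	 */
	printf("se_sum = %.0f, cfs_sum = %.0f, residue = %.0f\n",
	       se_sum, cfs_sum, cfs_sum - se_sum);

	return 0;
}

With the windows aligned, both sums sit at the same point of the
saw-tooth and the residue vanishes; that is what the patch below does.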
 kernel/sched/fair.c |   45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -729,13 +729,8 @@ void init_entity_runnable_average(struct
 {
 	struct sched_avg *sa = &se->avg;
 
-	sa->last_update_time = 0;
-	/*
-	 * sched_avg's period_contrib should be strictly less then 1024, so
-	 * we give it 1023 to make sure it is almost a period (1024us), and
-	 * will definitely be update (after enqueue).
-	 */
-	sa->period_contrib = 1023;
+	memset(sa, 0, sizeof(*sa));
+
 	/*
 	 * Tasks are intialized with full load to be seen as heavy tasks until
 	 * they get a chance to stabilize to their real load level.
@@ -744,13 +739,9 @@ void init_entity_runnable_average(struct
 	 */
 	if (entity_is_task(se))
 		sa->runnable_load_avg = sa->load_avg = scale_load_down(se->load.weight);
-	sa->runnable_load_sum = sa->load_sum = LOAD_AVG_MAX;
-
-	/*
-	 * At this point, util_avg won't be used in select_task_rq_fair anyway
-	 */
-	sa->util_avg = 0;
-	sa->util_sum = 0;
+
+	se->runnable_weight = se->load.weight;
+
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
@@ -798,7 +789,6 @@ void post_init_entity_util_avg(struct sc
 		} else {
 			sa->util_avg = cap;
 		}
-		sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
 	}
 
 	if (entity_is_task(se)) {
@@ -3644,7 +3634,34 @@ update_cfs_rq_load_avg(u64 now, struct c
  */
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
+
+	/*
+	 * When we attach the @se to the @cfs_rq, we must align the decay
+	 * window because without that, really weird and wonderful things can
+	 * happen.
+	 *
+	 * XXX illustrate
+	 */
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
+	se->avg.period_contrib = cfs_rq->avg.period_contrib;
+
+	/*
+	 * Hell(o) Nasty stuff.. we need to recompute _sum based on the new
+	 * period_contrib. This isn't strictly correct, but since we're
+	 * entirely outside of the PELT hierarchy, nobody cares if we truncate
+	 * _sum a little.
+	 */
+	se->avg.util_sum = se->avg.util_avg * divider;
+
+	se->avg.load_sum = divider;
+	if (se_weight(se)) {
+		se->avg.load_sum =
+			div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se));
+	}
+
+	se->avg.runnable_load_sum = se->avg.load_sum;
+
 	enqueue_load_avg(cfs_rq, se);
 	cfs_rq->avg.util_avg += se->avg.util_avg;
 	cfs_rq->avg.util_sum += se->avg.util_sum;
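
For reference on the _sum reconstruction above, a sketch of the
algebra, assuming ___update_load_avg() computes (truncation ignored)

	util_avg = util_sum / divider
	load_avg = se_weight * load_sum / divider

with divider = LOAD_AVG_MAX - 1024 + period_contrib; inverting gives

	util_sum = util_avg * divider
	load_sum = load_avg * divider / se_weight

which is what attach_entity_load_avg() now does. The div_u64() rounds
down, hence the "truncate _sum a little" in the comment.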