Subject: Re: [PATCH v2] sched/cfs: make util/load_avg more stable
From: Dietmar Eggemann
To: Vincent Guittot, peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org
Cc: Morten.Rasmussen@arm.com, yuyang.du@intel.com, pjt@google.com, bsegall@google.com
Date: Tue, 25 Apr 2017 12:05:16 +0100
In-Reply-To: <1492620844-30979-1-git-send-email-vincent.guittot@linaro.org>

On 19/04/17 17:54, Vincent Guittot wrote:
> In the current implementation of load/util_avg, we assume that the ongoing
> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
> even if part of the time segment still remains to run. As a consequence, this
> remaining part is considered as idle time and generates unexpected variations
> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should

Why do you use the square brackets the other way around? Just curious.

1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right? Might be
worth mentioning.

> stay at 1023.
>
> In order to keep the metric stable, we should not consider the ongoing time
> segment when computing load/util_avg but only the segments that have already
> fully elapsed. But not considering the current time segment adds unwanted
> latency in the load/util_avg responsiveness, especially when the time is
> scaled instead of the contribution. Instead of waiting for the current time
> segment to have fully elapsed before accounting it in load/util_avg, we can
> already account the elapsed part but change the range used to compute
> load/util_avg accordingly.
>
> At the very beginning of a new time segment, the past segments have been
> decayed and the max value is LOAD_AVG_MAX*y. At the very end of the current
> time segment, the max value becomes 1024(us) + LOAD_AVG_MAX*y which is equal
> to LOAD_AVG_MAX. In fact, the max value is
> sa->period_contrib + LOAD_AVG_MAX*y at any time in the time segment.
>
> Taking advantage of the fact that LOAD_AVG_MAX*y == LOAD_AVG_MAX-1024, the
> range becomes [0..LOAD_AVG_MAX-1024+sa->period_contrib].
>
> As the elapsed part is already accounted in load/util_sum, we update the max
> value according to the current position in the time segment instead of
> removing its contribution.

Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024' which
was added in patch 1/2?
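FWIW, both identities are easy to check numerically. A minimal standalone
sketch (plain C, not kernel code; LOAD_AVG_MAX = 47742 as in
kernel/sched/fair.c, and y computed as 0.5^(1/32) rather than via the
kernel's fixed-point tables):

/* Check: 1024*y ~= 1002 and LOAD_AVG_MAX*y ~= LOAD_AVG_MAX - 1024,
 * assuming y^32 = 0.5 (PELT's half-life of 32 segments). */
#include <math.h>
#include <stdio.h>

#define LOAD_AVG_MAX 47742	/* maximum possible load/util sum */

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);

	/* value a fully-busy util_avg decays to right after a new
	 * time segment starts */
	printf("1024*y              = %.2f\n", 1024.0 * y);

	/* LOAD_AVG_MAX is (up to integer truncation) the limit of
	 * 1024*(1 + y + y^2 + ...), so LOAD_AVG_MAX = 1024 + y*LOAD_AVG_MAX,
	 * hence LOAD_AVG_MAX*y == LOAD_AVG_MAX - 1024. */
	printf("LOAD_AVG_MAX*y      = %.2f\n", LOAD_AVG_MAX * y);
	printf("LOAD_AVG_MAX - 1024 = %d\n", LOAD_AVG_MAX - 1024);

	return 0;
}

This prints 1024*y ~= 1002.06 and shows LOAD_AVG_MAX*y within one unit of
LOAD_AVG_MAX - 1024.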
>
> Suggested-by: Peter Zijlstra
> Signed-off-by: Vincent Guittot
> ---
>
> Fold both patches in one
>
>  kernel/sched/fair.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3f83a35..c3b8f0f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  	/*
>  	 * Step 2: update *_avg.
>  	 */
> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	if (cfs_rq) {
>  		cfs_rq->runnable_load_avg =
> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	}
> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>
>  	return 1;
> }
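FWIW, a rough userspace model (not the kernel's fixed-point accumulation)
of a fully-busy CPU across one 1024us segment shows what the new divisor
buys. Here util_sum is modeled as 1024 * (LOAD_AVG_MAX - 1024 + c), i.e.
the decayed past segments plus the elapsed part c of the current one:

#include <stdio.h>

#define LOAD_AVG_MAX 47742

int main(void)
{
	unsigned int c;	/* sa->period_contrib, in us */

	for (c = 0; c <= 1024; c += 256) {
		/* decayed past segments + elapsed part of current one */
		unsigned long long util_sum =
			1024ULL * (LOAD_AVG_MAX - 1024 + c);

		printf("contrib=%4u old=%4llu new=%4llu\n", c,
		       util_sum / LOAD_AVG_MAX,			/* old divisor */
		       util_sum / (LOAD_AVG_MAX - 1024 + c));	/* new divisor */
	}
	return 0;
}

With the old divisor, util_avg climbs from 1002 back to 1024 over every
segment (the ]1002..1024[ range from the changelog); with the patched
divisor it stays constant. (The real implementation's integer truncation
settles at 1023 rather than 1024.)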