Date: Tue, 5 Jun 2018 16:18:09 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: Ingo Molnar, linux-kernel, "Rafael J. Wysocki", Juri Lelli,
	Dietmar Eggemann, Morten Rasmussen, Viresh Kumar,
	Valentin Schneider, Quentin Perret
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180605141809.GV12180@hirez.programming.kicks-ass.net>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
 <20180604165047.GU12180@hirez.programming.kicks-ass.net>

On Mon, Jun 04, 2018 at 08:08:58PM +0200, Vincent Guittot wrote:
> On 4 June 2018 at 18:50, Peter Zijlstra wrote:
> > So this patch-set tracks the !cfs occupation using the same function,
> > which is all good. But what if, instead of using that to compensate the
> > OPP selection, we employ that to renormalize the util signal?
> >
> > If we normalize util against the dynamic (rt_avg affected) cpu_capacity,
> > then I think your initial problem goes away. Because while the RT task
> > will push the util to .5, it will at the same time push the CPU capacity
> > to .5, and renormalized that gives 1.
> >
> > NOTE: the renorm would then become something like:
> > scale_cpu = arch_scale_cpu_capacity() / rt_frac();

Should probably be:

  scale_cpu = arch_scale_cpu_capacity() / (1 - rt_frac())

> >
> > On IRC I mentioned stopping the CFS clock when preempted, and while that
> > would result in fixed numbers, Vincent was right in pointing out the
> > numbers will be difficult to interpret, since the meaning will be purely
> > CPU local and I'm not sure you can actually fix it again with
> > normalization.
> >
> > Imagine running a .3 RT task; that would push the (always running) CFS
> > down to .7, but because we discard all !cfs time, it actually has 1. If
> > we try and normalize that we'll end up with ~1.43, which is of course
> > completely broken.
> >
> > _However_, all that happens for util also happens for load. So the above
> > scenario will also make the CPU appear less loaded than it actually is.
>
> The load will continue to increase because, for load, we track the
> runnable state and not just running time.

Duh yes. So renormalizing it once, like proposed for util, would actually
do the right thing there too. Would that not allow us to get rid of much
of the capacity magic in the load balance code?

/me thinks more..

Bah, no.. because you don't want this dynamic renormalization to be part
of the sums. So you want to keep it after the fact. :/

> As you mentioned, scale_rt_capacity gives the remaining capacity for
> cfs and it will behave like cfs util_avg now that it uses PELT. So as
> long as cfs util_avg < scale_rt_capacity (we probably need a margin)
> we keep using dl bandwidth + cfs util_avg + rt util_avg for selecting
> the OPP, because we have remaining spare capacity; but if cfs util_avg ==
> scale_rt_capacity, we make sure to use the max OPP.
Good point: when cfs-util < cfs-cap there is idle time and the util
number is 'right'; when cfs-util == cfs-cap we're overcommitted and
should go max. Since the util and cap values are aligned, that should
track nicely.
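
To make that selection rule concrete, below is a minimal stand-alone
sketch in plain C (not kernel code): freq_util(), its parameters and
the margin value are illustrative names and numbers only, with all
signals expressed in SCHED_CAPACITY_SCALE-like (1024) units.

/*
 * Stand-alone sketch of the OPP selection rule above; freq_util() and
 * its parameters are illustrative names, not the kernel's schedutil
 * interface.  Everything is in SCHED_CAPACITY_SCALE-like units.
 */
#include <stdio.h>

#define SCALE	1024	/* stands in for SCHED_CAPACITY_SCALE */

static unsigned long freq_util(unsigned long cfs_util, unsigned long cfs_cap,
			       unsigned long rt_util, unsigned long dl_bw,
			       unsigned long margin)
{
	/*
	 * cfs_cap is what scale_rt_capacity() leaves over for CFS.  If
	 * CFS utilization (plus some margin) has eaten all of it, there
	 * is no idle time left and the util number can no longer be
	 * trusted: ask for the max OPP.
	 */
	if (cfs_util + margin >= cfs_cap)
		return SCALE;

	/* otherwise there is spare capacity and the sum is meaningful */
	return dl_bw + cfs_util + rt_util;
}

int main(void)
{
	/* spare capacity: the summed class contributions drive the OPP */
	printf("%lu\n", freq_util(300, 700, 200, 50, 64));	/* 550 */

	/* overcommitted: cfs util has reached cfs cap, go max */
	printf("%lu\n", freq_util(700, 700, 200, 50, 64));	/* 1024 */

	return 0;
}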
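
Similarly, a toy fixed-point model of the renormalization discussed
earlier in the thread (scale_cpu = arch_scale_cpu_capacity() /
(1 - rt_frac())); rt_frac() does not exist in the kernel, and the
numbers simply replay the .5 and .3 examples from above.

/*
 * Toy model of the renormalization above; rt_frac below is just the
 * "fraction of the CPU eaten by !CFS classes" in SCALE units.
 */
#include <stdio.h>

#define SCALE	1024	/* stands in for SCHED_CAPACITY_SCALE */

/* util scaled against the capacity left for CFS: util / (1 - rt_frac) */
static unsigned long renorm(unsigned long util, unsigned long rt_frac)
{
	unsigned long cfs_cap = SCALE - rt_frac;

	return util * SCALE / cfs_cap;
}

int main(void)
{
	/*
	 * A .5 RT task pushes an always-running CFS task down to .5
	 * util, but also pushes the CFS capacity down to .5; the
	 * renormalized value is 1 again.
	 */
	printf("%lu\n", renorm(512, 512));	/* 1024, i.e. 1 */

	/*
	 * The "stop the CFS clock" variant with a .3 RT task: !CFS time
	 * is discarded so the task already reads as 1; renormalizing
	 * that gives ~1.43, beyond the scale -- the broken case.
	 */
	printf("%lu\n", renorm(1024, 307));	/* ~1462 */

	return 0;
}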