From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751824AbbLNOq1 (ORCPT );
	Mon, 14 Dec 2015 09:46:27 -0500
Received: from foss.arm.com ([217.140.101.70]:43398 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751249AbbLNOq0 (ORCPT );
	Mon, 14 Dec 2015 09:46:26 -0500
Date: Mon, 14 Dec 2015 14:46:46 +0000
From: Morten Rasmussen
To: Peter Zijlstra
Cc: Yuyang Du , Andrey Ryabinin , mingo@redhat.com,
	linux-kernel@vger.kernel.org, Paul Turner , Ben Segall
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems
Message-ID: <20151214144645.GA23930@e105550-lin.cambridge.arm.com>
References: <1449838518-26543-1-git-send-email-aryabinin@virtuozzo.com>
	<20151211132551.GO6356@twins.programming.kicks-ass.net>
	<20151211133612.GG6373@twins.programming.kicks-ass.net>
	<566AD6E1.2070005@virtuozzo.com>
	<20151211175751.GA27552@e105550-lin.cambridge.arm.com>
	<20151213224224.GC28098@intel.com>
	<20151214115453.GN6357@twins.programming.kicks-ass.net>
	<20151214130723.GB9870@e105550-lin.cambridge.arm.com>
	<20151214142021.GO6357@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151214142021.GO6357@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 14, 2015 at 03:20:21PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 14, 2015 at 01:07:26PM +0000, Morten Rasmussen wrote:
>
> > Agreed, >100% is a transient state (which can be rather long) which only
> > means over-utilized, nothing more. Would you like the metric itself to
> > be changed to saturate at 100% or just cap it to 100% when used?
>
> We already cap it when using it IIRC. But no, I was thinking of the
> measure itself.

Yes, okay.

>
> > It is not straight forward to provide a bound on the sum.
>
> Agreed..
>
> > There isn't one for load_avg either.
>
> But that one is fundamentally unbound, whereas the util thing is
> fundamentally bound, except our implementation isn't.

Agreed.

>
> > If we want to guarantee an upper bound for
> > cfs_rq->avg.util_sum we have to somehow cap the se->avg.util_avg
> > contributions for each sched_entity. This cap depends on the cpu and how
> > many other tasks are associated with that cpu. The cap may have to
> > change when tasks migrate.
>
> Yep, blows :-)
>
> > > However, I think that makes sense, but would propose doing it
> > > differently. That condition is generally a maximum (assuming proper
> > > functioning of the weight based scheduling etc..) for any one task, so
> > > on migrate we can hard clip to this value.
>
> > Why use load.weight to scale util_avg? It is affected by priority. Isn't
> > just the ratio 1/nr_running that you are after?
>
> Remember, the util thing is based on running, so assuming each task
> always wants to run, each task gets to run w_i/\Sum_j w_j due to CFS
> being a weighted fair queueing thingy.

Of course, yes.

>
> > IIUC, you propose to clip the sum itself. In which case you are running
> > into trouble when removing tasks. You don't know how much to remove from
> > the clipped sum.
>
> Right, then we'll have to slowly gain it again.

If you have a seriously over-utilized cpu and migrate some of the tasks
to a different cpu, the old cpu may temporarily look lightly utilized
even if we leave some big tasks behind. That might lead us into trouble
if we start using util_avg as the basis for cpufreq decisions. If we care
about performance, the safe choice is to consider an over-utilized cpu
still over-utilized even after we have migrated tasks away. We can only
trust that the cpu is no longer over-utilized when cfs_rq->avg.util_avg
'naturally' goes below 100%. So from that point of view, it might be
better to let it stay at 100% and let it sort itself out.

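To make the concern about removing tasks from a clipped sum concrete,
here is a rough sketch in Python. It is not kernel code: treating 1024
as "100%", the per-task numbers, and both function names are assumptions
made up for illustration only.

```python
# Illustrative sketch only, not kernel code. CAPACITY = 1024 standing
# for "100%" is an assumption; the task utilizations are made-up numbers.
CAPACITY = 1024


def remove_from_clipped(task_utils, migrated):
    """Clip the cpu sum at CAPACITY, then subtract the migrated tasks."""
    clipped = min(sum(task_utils), CAPACITY)
    return max(clipped - sum(migrated), 0)


def remove_from_unclipped(task_utils, migrated):
    """Keep the raw sum, subtract the migrated tasks, cap only on read."""
    remaining = sum(task_utils) - sum(migrated)
    return min(remaining, CAPACITY)


# Four busy tasks, each tracked at 600; two of them are migrated away.
tasks = [600, 600, 600, 600]
gone = [600, 600]

print(remove_from_clipped(tasks, gone))    # 0: the cpu looks idle
print(remove_from_unclipped(tasks, gone))  # 1024: still over-utilized
```

With the clipped sum, the old cpu reads as idle although two busy tasks
were left behind, and it has to slowly gain the utilization back. With
the unclipped sum, it keeps reading as fully utilized.
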
> > Another problem is that load.weight is just a snapshot while
> > avg.util_avg includes tasks that are not currently on the rq so the
> > scaling factor is probably bigger than what you want.
>
> Our weight guestimates also include non running (aka blocked) tasks,
> right?

The rq/cfs_rq load.weight doesn't. It is updated through
update_load_{add,sub}() in account_entity_{enqueue,dequeue}(). So only
runnable+running tasks I think.

>
> > If we leave the sum as it is (unclipped) add/remove shouldn't give us
> > any problems. The only problem is the overflow, which is solved by using
> > a 64bit type for load_avg. That is not an acceptable solution?
>
> It might be. After all, any time any of this is needed we're CPU bound
> and the utilization measure is pointless anyway. That measure only
> matters if its small and the sum is 'small'. After that its back to the
> normal load based thingy.

Yes, agreed.
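
For reference, the kind of overflow discussed here can be sketched as
follows. This is only an illustration in Python, where integers never
overflow, so a mask stands in for a 32-bit unsigned multiply. The operand
values and helper names are assumptions and are not taken from the patch.

```python
# Illustrative sketch only: a mask models the wrap-around of a 32-bit
# unsigned multiply. The operand values below are assumptions.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_u32(a, b):
    """Multiply as a 32-bit unsigned type would: keep the low 32 bits."""
    return (a * b) & MASK32


def mul_u64(a, b):
    """Multiply in a 64-bit unsigned type."""
    return (a * b) & MASK64


avg = 100_000    # hypothetical accumulated average
scale = 47_742   # hypothetical scaling constant

print(mul_u32(avg, scale))  # 479232704, wrapped past 2**32
print(mul_u64(avg, scale))  # 4774200000, the intended product
```

The 32-bit result is silently wrong rather than merely saturated, which
is why widening the type is enough when the sum is left unclipped.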