Date: Tue, 7 Jan 2014 21:49:51 +0100
From: Peter Zijlstra
To: Morten Rasmussen
Cc: Vincent Guittot, Dietmar Eggemann, linux-kernel@vger.kernel.org,
	mingo@kernel.org, pjt@google.com, cmetcalf@tilera.com,
	tony.luck@intel.com, alex.shi@linaro.org, preeti@linux.vnet.ibm.com,
	linaro-kernel@lists.linaro.org, paulmck@linux.vnet.ibm.com,
	corbet@lwn.net, tglx@linutronix.de, len.brown@intel.com,
	arjan@linux.intel.com, amit.kucheria@linaro.org,
	james.hogan@imgtec.com, schwidefsky@de.ibm.com,
	heiko.carstens@de.ibm.com
Subject: Re: [RFC] sched: CPU topology try
Message-ID: <20140107204951.GD2480@laptop.programming.kicks-ass.net>
In-Reply-To: <20140107154154.GH2936@e103034-lin>

On Tue, Jan 07, 2014 at 03:41:54PM +0000, Morten Rasmussen wrote:
> I think that could work if we sort out the priority scaling issue
> that I mentioned before.

We talked a bit about this on IRC a month or so ago, right? My
recollection from that is that your main complaint was that we don't
detect the overload scenario correctly.

That is, the point at which we should start caring about SMP-nice is
when all our CPUs are fully occupied, because up to that point we're
under-utilized and work conservation mandates we utilize idle time.

Currently we detect overload by sg.nr_running >= sg.capacity, which can
be very misleading, because while a CPU might have a task running 'now'
it might be 99% idle. At which point I argued we should change the
capacity thing anyhow; ever since the runnable_avg patch set I've been
arguing to change that into an actual utilization test.

So I think that if we measure overload by something like >95%
utilization on the entire group, the load scaling again makes perfect
sense. Given the 3-task {A,B,C} workload where A and B are niced, we'd
expect it to land on a symmetric dual-CPU system like: {A,B}+{C},
assuming they're all while(1) loops :-).

The harder case is where all 3 tasks are of equal weight; in which case
fairness would mandate we (slowly) rotate the tasks such that they all
get 2/3 of the CPU time -- we also horribly fail at this :-)
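
To make the difference between the two overload tests concrete, here is
a minimal user-space sketch, not the actual kernel code; the struct,
field names and the 95% threshold are made up for illustration only:

	#include <stdbool.h>
	#include <stdio.h>

	/* Hypothetical per-group stats, loosely modelled on a sched_group. */
	struct sg_stats {
		unsigned int  nr_running;  /* runnable tasks in the group */
		unsigned int  capacity;    /* nr of CPUs (task slots) in the group */
		unsigned long util;        /* summed runnable_avg utilization */
		unsigned long util_max;    /* utilization if every CPU were 100% busy */
	};

	/* Current test: counts tasks; fires even if those tasks are mostly idle. */
	static bool overloaded_by_count(const struct sg_stats *sg)
	{
		return sg->nr_running >= sg->capacity;
	}

	/* Proposed test: only report overload once the group is ~95% utilized. */
	static bool overloaded_by_util(const struct sg_stats *sg)
	{
		return sg->util * 100 >= sg->util_max * 95;
	}

	int main(void)
	{
		/* Two mostly-idle tasks on a 2-CPU group: ~10% total utilization. */
		struct sg_stats sg = {
			.nr_running = 2,
			.capacity   = 2,
			.util       = 200,	/* out of 2 * 1024 */
			.util_max   = 2048,
		};

		printf("overloaded by count: %d, by utilization: %d\n",
		       overloaded_by_count(&sg), overloaded_by_util(&sg));
		return 0;
	}

With those numbers the count-based test already reports overload, while
the utilization-based test correctly says the group still has plenty of
idle time to hand out before SMP-nice needs to matter.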