From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933048AbcECOcb (ORCPT ); Tue, 3 May 2016 10:32:31 -0400
Received: from merlin.infradead.org ([205.233.59.134]:52536 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932828AbcECOc3 (ORCPT );
        Tue, 3 May 2016 10:32:29 -0400
Date: Tue, 3 May 2016 16:32:25 +0200
From: Peter Zijlstra
To: Chris Mason, Mike Galbraith, Ingo Molnar, Matt Fleming,
        linux-kernel@vger.kernel.org
Subject: Re: sched: tweak select_idle_sibling to look for idle threads
Message-ID: <20160503143225.GG3448@twins.programming.kicks-ass.net>
References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>
        <20160409190554.honue3gtian2p6vr@floor.thefacebook.com>
        <20160430124731.GE2975@worktop.cust.blueprintrf.com>
        <1462086753.9717.29.camel@suse.de>
        <20160502084615.GB3430@twins.programming.kicks-ass.net>
        <1462200604.3736.42.camel@suse.de>
        <20160502145817.GW3408@twins.programming.kicks-ass.net>
        <20160502154725.ckiewczbdubudyc7@floor.masoncoding.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160502154725.ckiewczbdubudyc7@floor.masoncoding.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 02, 2016 at 11:47:25AM -0400, Chris Mason wrote:
> On Mon, May 02, 2016 at 04:58:17PM +0200, Peter Zijlstra wrote:
> > On Mon, May 02, 2016 at 04:50:04PM +0200, Mike Galbraith wrote:
> > > Oh btw, did you know single socket boxen have no sd_busy?  That doesn't
> > > look right.
> >
> > I suspected; didn't bother looking at yet.  The 'problem' is that the LLC
> > domain is the top-most, so it doesn't have a parent domain. I'm sure we
> > can come up with something if we can get this all working right.
> >
> > And yes, I can get gains on various workloads with various options, I
> > can even break all workloads, but I've so far completely failed on
> > getting a win for everyone :/
>
> Adding in the task_hot() check to decide if scanning idle was a good
> idea ended up being really important

So I'm conflicted on this patch:

+static int bounce_to_target(struct task_struct *p, int cpu)
+{
+	s64 delta;
+
+	/*
+	 * as the run queue gets bigger, its more and more likely that
+	 * balance will have distributed things for us, and less likely
+	 * that scanning all our CPUs for an idle one will find one.
+	 * So, if nr_running > 1, just call this CPU good enough
+	 */
+	if (cpu_rq(cpu)->cfs.nr_running > 1)
+		return 1;
+
+	/* taken from task_hot() */
+	delta = rq_clock_task(task_rq(p)) - p->se.exec_start;
+	return delta < (s64)sysctl_sched_migration_cost;
+}

This will work for your schbench workload because it sleeps for 30ms
while the migration_cost thingy is 500us, therefore you'll trigger the
full LLC scan.

_However_, the migration_cost is supposed to model the cost of leaving
the LLC, so testing against that here seems wrong.

Let me go play with something that measures the cost of doing that LLC
scan and compares that against the sleepy time -- of course, I now need
to go figure out how to do this clock thing without rq-lock pain.

+	if (package_sd && !bounce_to_target(p, target)) {
+		for_each_cpu_and(i, sched_domain_span(package_sd), tsk_cpus_allowed(p)) {
+			if (idle_cpu(i)) {
+				target = i;
+				break;
+			}
+
+		}
+	}

Also note your s/sd/package_sd/ rename is, strictly speaking, wrong.
Sure, on your current Intel system the LLC is the entire package, but
this is not true in general. Take for instance the Intel Core2Quad and
AMD Bulldozer thingies, they had two dies in one package, and
correspondingly two LLC domains in one package.
(also, the Intel cluster-on-die thing can split the thing in two)

There were also the old P6 era SMP boards which had external LLC, where
you could have an LLC shared across multiple packages -- although I'm
thinking we'll never see that again, due to off package being far
toooooo slooooooow these days.
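
To make the 30ms vs 500us arithmetic above concrete, here is a rough
user-space sketch of the decision bounce_to_target() makes. It is not
kernel code: the model_* names and MODEL_MIGRATION_COST_NS are
hypothetical, the clock values are plain nanosecond counts passed in by
the caller, and the 500us constant is simply the migration_cost figure
quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 500us migration_cost quoted above, in nanoseconds. */
#define MODEL_MIGRATION_COST_NS 500000LL

/*
 * Hypothetical user-space model of bounce_to_target() from the patch.
 * Returns true when the target CPU is called "good enough", i.e. when
 * the scan for an idle CPU is skipped.
 */
bool model_bounce_to_target(unsigned int nr_running,
			    int64_t rq_clock_ns, int64_t exec_start_ns)
{
	int64_t delta;

	/* More than one runnable task: skip the scan, as in the patch. */
	if (nr_running > 1)
		return true;

	/* The model assumes the clock never runs behind exec_start. */
	assert(rq_clock_ns >= exec_start_ns);

	/* Same comparison as task_hot(): time since exec_start was set. */
	delta = rq_clock_ns - exec_start_ns;
	return delta < MODEL_MIGRATION_COST_NS;
}

/* The idle scan runs only when the model does not bounce to the target. */
bool model_scans_llc(unsigned int nr_running,
		     int64_t rq_clock_ns, int64_t exec_start_ns)
{
	return !model_bounce_to_target(nr_running, rq_clock_ns, exec_start_ns);
}
```

With a task that last ran 30ms ago on an otherwise empty run queue,
model_scans_llc(1, 30000000, 0) is true, so the full scan happens; with
a 100us gap, or with nr_running > 1, it is false and the target CPU is
taken as-is. That is why the check suits a workload that sleeps far
longer than migration_cost, independent of what the scan itself costs.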