From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754036AbcEBOuT (ORCPT );
	Mon, 2 May 2016 10:50:19 -0400
Received: from mx2.suse.de ([195.135.220.15]:44493 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753598AbcEBOuI (ORCPT );
	Mon, 2 May 2016 10:50:08 -0400
Message-ID: <1462200604.3736.42.camel@suse.de>
Subject: Re: sched: tweak select_idle_sibling to look for idle threads
From: Mike Galbraith
To: Peter Zijlstra
Cc: Chris Mason, Ingo Molnar, Matt Fleming, linux-kernel@vger.kernel.org
Date: Mon, 02 May 2016 16:50:04 +0200
In-Reply-To: <20160502084615.GB3430@twins.programming.kicks-ass.net>
References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>
	 <20160409190554.honue3gtian2p6vr@floor.thefacebook.com>
	 <20160430124731.GE2975@worktop.cust.blueprintrf.com>
	 <1462086753.9717.29.camel@suse.de>
	 <20160502084615.GB3430@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2016-05-02 at 10:46 +0200, Peter Zijlstra wrote:
> On Sun, May 01, 2016 at 09:12:33AM +0200, Mike Galbraith wrote:
> 
> > Nah, tbench is just variance prone.  It got dinged up at clients=cores
> > on my desktop box, on 4 sockets the high end got seriously dinged up.
> 
> Ha!, check this:
> 
> root@ivb-ep:~# echo OLD_IDLE > /debug/sched_features ; echo
> NO_ORDER_IDLE > /debug/sched_features ; echo IDLE_CORE >
> /debug/sched_features ; echo NO_FORCE_CORE > /debug/sched_features ;
> tbench 20 -t 10
> 
> Throughput 5956.32 MB/sec  20 clients  20 procs  max_latency=0.126 ms
> 
> root@ivb-ep:~# echo OLD_IDLE > /debug/sched_features ; echo ORDER_IDLE
> > /debug/sched_features ; echo IDLE_CORE > /debug/sched_features ; echo
> NO_FORCE_CORE > /debug/sched_features ; tbench 20 -t 10
> 
> Throughput 5011.86 MB/sec  20 clients  20 procs  max_latency=0.116 ms
> 
> That little ORDER_IDLE thing hurts silly.  That's a little patch I had
> lying about because some people complained that tasks hop around the
> cache domain, instead of being stuck to a CPU.
> 
> I suspect what happens is that by all CPUs starting to look for idle at
> the same place (the first cpu in the domain) they all find the same idle
> cpu and things pile up.
> 
> The old behaviour, where they all start iterating from where they were,
> avoids some of that, at the cost of making tasks hop around.
> 
> Let's see if I can get the same behaviour out of the cpumask iteration
> code..

Order is one thing, but what the old behavior does first and foremost is
that when the box starts getting really busy, only looking at target's
sibling shuts select_idle_sibling() down instead of letting it wreck
things.  Once cores are moving, there are no large piles of anything left
to collect other than pain.

We really need a good way to know we're not gonna turn the box into a
shredder.  The wake_wide() thing might help some (it likely wants some
twiddling), and in_interrupt() might be another time to try hard.

Anyway, the has_idle_cores business seems to shut select_idle_sibling()
down rather nicely when the box gets busy.  Forcing either core, target's
sibling, or go fish turned in a top-end win at 48 rq/socket.

Oh btw, did you know single socket boxen have no sd_busy?  That doesn't
look right.
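
To picture the pile-up Peter describes above, here is a tiny standalone C
toy (not the scheduler code; the CPU count, idleness pattern, and helper
names are all made up for illustration).  If every waker scans the domain
from the first CPU, concurrent wakers converge on the same idle CPU; if
each scan starts from the waker's previous CPU, the picks spread out:

/* pileup.c -- toy illustration only, not kernel code. */
#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS   20	/* pretend LLC domain size */
#define NR_WAKERS  8	/* wakeups happening at roughly the same time */

static bool cpu_is_idle[NR_CPUS];

/* Walk the domain starting at @start, wrapping; return the first idle CPU. */
static int find_idle_cpu(int start)
{
	for (int off = 0; off < NR_CPUS; off++) {
		int cpu = (start + off) % NR_CPUS;
		if (cpu_is_idle[cpu])
			return cpu;
	}
	return -1;
}

static void run(const char *name, bool fixed_origin)
{
	/* Toy state: every even-numbered CPU is idle. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_is_idle[cpu] = !(cpu & 1);

	printf("%-20s:", name);
	/*
	 * All wakers scan "concurrently", i.e. none of them sees the
	 * others' choice before making its own.
	 */
	for (int w = 0; w < NR_WAKERS; w++) {
		int prev_cpu = (2 * w + 1) % NR_CPUS;	/* where the task last ran */
		int start = fixed_origin ? 0 : prev_cpu;
		printf(" %2d", find_idle_cpu(start));
	}
	printf("\n");
}

int main(void)
{
	run("scan from first cpu", true);	/* everyone picks CPU 0 */
	run("scan from prev cpu", false);	/* picks spread over the domain */
	return 0;
}

Built with any C compiler, the first run() prints the same CPU for every
waker and the second spreads the picks across the domain, which is the
pile-up vs. hop-around trade-off being discussed.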
fromm:~/:[0]# for i in 1 2 4 8 16 32 64 128 256; do tbench.sh $i 30 2>&1| grep Throughput; done
Throughput 511.016 MB/sec  1 clients  1 procs  max_latency=0.113 ms
Throughput 1042.03 MB/sec  2 clients  2 procs  max_latency=0.098 ms
Throughput 1953.12 MB/sec  4 clients  4 procs  max_latency=0.236 ms
Throughput 3694.99 MB/sec  8 clients  8 procs  max_latency=0.308 ms
Throughput 7080.95 MB/sec  16 clients  16 procs  max_latency=0.442 ms
Throughput 13444.7 MB/sec  32 clients  32 procs  max_latency=1.417 ms
Throughput 20191.3 MB/sec  64 clients  64 procs  max_latency=4.554 ms
Throughput 41115.4 MB/sec  128 clients  128 procs  max_latency=13.414 ms
Throughput 66844.4 MB/sec  256 clients  256 procs  max_latency=50.069 ms

	/*
	 * If there are idle cores to be had, go find one.
	 */
	if (sched_feat(IDLE_CORE) && test_idle_cores(target)) {
		i = select_idle_core(p, target);
		if ((unsigned)i < nr_cpumask_bits)
			return i;

		/*
		 * Failed to find an idle core; stop looking for one.
		 */
		clear_idle_cores(target);
	}
#if 1
	for_each_cpu(i, cpu_smt_mask(target)) {
		if (idle_cpu(i))
			return i;
	}

	return target;
#endif

	if (sched_feat(FORCE_CORE)) {
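
For completeness, here is a minimal userspace sketch of the has_idle_cores
hint the snippet above leans on (all names, the 8-CPU/2-way-SMT layout, and
the idle model are invented for illustration; this is not the kernel
implementation): the idle-core scan only runs while a per-LLC hint is set,
a failed scan clears the hint so a busy box stops paying for the scan, and
a core going fully idle re-arms it:

/* has_idle_cores.c -- toy illustration only, not the kernel implementation. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS   8			/* pretend LLC with 4 cores ... */
#define SMT_WIDTH 2			/* ... of 2 SMT siblings each   */

static bool cpu_is_idle[NR_CPUS];		/* toy per-CPU idle state */
static atomic_bool has_idle_cores = true;	/* the per-LLC hint       */

/* A core counts as idle only if every SMT sibling on it is idle. */
static bool core_is_idle(int core)
{
	for (int t = 0; t < SMT_WIDTH; t++)
		if (!cpu_is_idle[core * SMT_WIDTH + t])
			return false;
	return true;
}

/* Return the first CPU of an idle core, or -1; clear the hint on failure. */
static int select_idle_core(void)
{
	if (!atomic_load(&has_idle_cores))
		return -1;		/* busy box: skip the scan entirely */

	for (int core = 0; core < NR_CPUS / SMT_WIDTH; core++)
		if (core_is_idle(core))
			return core * SMT_WIDTH;

	atomic_store(&has_idle_cores, false);	/* scan failed: stop looking */
	return -1;
}

/* A CPU went idle; re-arm the hint if its whole core is now idle. */
static void cpu_goes_idle(int cpu)
{
	cpu_is_idle[cpu] = true;
	if (core_is_idle(cpu / SMT_WIDTH))
		atomic_store(&has_idle_cores, true);
}

int main(void)
{
	printf("scan on a busy box  -> %d\n", select_idle_core()); /* -1, clears hint */
	printf("second scan         -> %d\n", select_idle_core()); /* -1, no scan done */
	cpu_goes_idle(4);
	cpu_goes_idle(5);					   /* core 2 now fully idle */
	printf("after a core idles  -> %d\n", select_idle_core()); /* 4 */
	return 0;
}

Keeping it down to one hint per LLC rather than exact per-core state is
what makes the saturated case degrade to a single flag check before giving
up, which matches the "shuts select_idle_sibling() down when the box gets
busy" behavior described above.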