Subject: Re: sched: tweak select_idle_sibling to look for idle threads
From: Mike Galbraith
To: Yuyang Du
Cc: Peter Zijlstra, Chris Mason, Ingo Molnar, Matt Fleming,
    linux-kernel@vger.kernel.org
Date: Mon, 09 May 2016 09:44:13 +0200
Message-ID: <1462779853.3803.128.camel@suse.de>
In-Reply-To: <20160508202201.GM16093@intel.com>

On Mon, 2016-05-09 at 04:22 +0800, Yuyang Du wrote:
> On Mon, May 09, 2016 at 05:45:40AM +0200, Mike Galbraith wrote:
> > On Mon, 2016-05-09 at 02:57 +0800, Yuyang Du wrote:
> > > On Sun, May 08, 2016 at 10:08:55AM +0200, Mike Galbraith wrote:
> > > > > Maybe give the criteria a bit of margin, not just wakees
> > > > > tending to equal llc_size; the numbers are wild enough to
> > > > > easily break that fragile condition, like:
> > > >
> > > > Seems lockless traversal and averages just let multiple CPUs
> > > > select the same spot.
> > > > An atomic reservation (feature) when looking for an idle spot
> > > > (also for fork) might fix it up.  Run the thing as RT, and
> > > > push/pull ensures that it reaches box saturation regardless of
> > > > the number of messaging threads, whereas with the fair class,
> > > > any number > 1 will certainly stack tasks before the box is
> > > > saturated.
> > >
> > > Yes, good idea, bringing order to the race to grab an idle CPU is
> > > absolutely helpful.
> >
> > Well, good ideas work; as yet this one helps jack diddly spit.
>
> Then a valid question is whether it is this selection that is screwed
> up in a case like this; that question should always be asked.

That's a given, it's just a question of how to do a bit better cheaply.

> > > Regarding wake_wide(), it seems the M:N is 1:24, not 6:6*24; if
> > > so, the slave will be 0 forever (as last_wakee is never flipped).
> >
> > Yeah, it's irrelevant here, this load is all about instantaneous
> > state.  I could use a bit more of that; reserving on the wakeup
> > side won't help this benchmark until everything else cares.  One
> > stack, and it's game over.  It could help generic utilization and
> > latency some, but it seems kinda unlikely it'll be worth the cycle
> > expenditure.
>
> Yes and no.  It depends on how efficient work-stealing is compared to
> selection, but remember, at the end of the day the wakee CPU measures
> the latency; that CPU does not care whether it was selected or it
> stole.

In a perfect world, running only Chris' benchmark on an otherwise idle
box, there would never _be_ any work to steal.  In the real world, we
smooth utilization, optimistically peek at this/that, and intentionally
throttle idle balancing (etc etc), which adds up to an imperfect world
for this (based on real world load) benchmark.

> Hmm... should we try removing the recording of last_wakee?

The more the merrier, go for it! :)

	-Mike