From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753861AbcDLEoM (ORCPT );
	Tue, 12 Apr 2016 00:44:12 -0400
Received: from mx2.suse.de ([195.135.220.15]:54712 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750836AbcDLEoL (ORCPT );
	Tue, 12 Apr 2016 00:44:11 -0400
Message-ID: <1460436248.3839.80.camel@suse.de>
Subject: Re: sched: tweak select_idle_sibling to look for idle threads
From: Mike Galbraith
To: Chris Mason
Cc: Peter Zijlstra , Ingo Molnar , Matt Fleming ,
	linux-kernel@vger.kernel.org
Date: Tue, 12 Apr 2016 06:44:08 +0200
In-Reply-To: <20160412003044.smr24xzuom3locvo@floor.thefacebook.com>
References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>
	<20160409190554.honue3gtian2p6vr@floor.thefacebook.com>
	<1460282661.4251.44.camel@suse.de>
	<20160410195543.fp2tpixaafsts5x3@floor.thefacebook.com>
	<1460350461.3870.36.camel@suse.de>
	<20160412003044.smr24xzuom3locvo@floor.thefacebook.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2016-04-11 at 20:30 -0400, Chris Mason wrote:
> On Mon, Apr 11, 2016 at 06:54:21AM +0200, Mike Galbraith wrote:

> > > Ok, I was able to reproduce this by stuffing tbench_srv and tbench onto
> > > just socket 0. Version 2 below fixes things for me, but I'm hoping
> > > someone can suggest a way to get task_hot() buddy checks without the rq
> > > lock.
> > >
> > > I haven't run this on production loads yet, but our 4.0 patch for this
> > > uses task_hot(), so I'd expect it to be on par. If this doesn't fix it
> > > for you, I'll dig up a similar machine on Monday.
> >
> > My box stopped caring. I personally would be reluctant to apply it
> > without a "you asked for it" button or a large pile of benchmark
> > results. Lock banging or not, full scan existing makes me nervous.
>
>
> We can use a bitmap at the socket level to keep track of which cpus are
> idle. I'm sure there are better places for the array and better ways to
> allocate, this is just a rough cut to make sure the idle tracking works.

See e0a79f529d5b:

      pre   15.22 MB/sec 1 procs
      post 252.01 MB/sec 1 procs

You can make traverse cycles go away, but those cycles, while precious,
are not the most costly cycles. The above was 1 tbench pair in an
otherwise idle box.. ie it wasn't traverse cycles that demolished it.

	-Mike

(p.s. SCHED_IDLE is dinky bandwidth fair class)
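
For readers following the quoted "bitmap at the socket level" idea, here is a
minimal userspace sketch of what such idle tracking could look like. It is not
the patch under discussion: struct socket_idle_map, the idle_map_*() helpers
and SOCKET_MAX_CPUS are hypothetical names chosen for illustration, and it
assumes a socket's CPUs fit in one unsigned long. It only shows the
bookkeeping (set on idle entry, clear on idle exit, lockless lookup) and says
nothing about the costs Mike points at above.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical limit: one machine word of CPUs per socket. */
#define SOCKET_MAX_CPUS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical per-socket map: bit n set means CPU n of the socket is idle. */
struct socket_idle_map {
	_Atomic unsigned long bits;
};

/* Mark a CPU idle (would be called on idle entry). */
static void idle_map_set(struct socket_idle_map *m, int cpu)
{
	assert(cpu >= 0 && cpu < SOCKET_MAX_CPUS);
	atomic_fetch_or_explicit(&m->bits, 1UL << cpu, memory_order_release);
}

/* Mark a CPU busy (would be called on idle exit). */
static void idle_map_clear(struct socket_idle_map *m, int cpu)
{
	assert(cpu >= 0 && cpu < SOCKET_MAX_CPUS);
	atomic_fetch_and_explicit(&m->bits, ~(1UL << cpu), memory_order_release);
}

/*
 * Return the lowest-numbered idle CPU, or -1 if none is marked idle.
 * The lookup works on a snapshot and takes no lock, so the answer can
 * be stale by the time the caller acts on it.
 */
static int idle_map_first(struct socket_idle_map *m)
{
	unsigned long snap = atomic_load_explicit(&m->bits, memory_order_acquire);
	int cpu;

	for (cpu = 0; cpu < SOCKET_MAX_CPUS; cpu++)
		if (snap & (1UL << cpu))
			return cpu;
	return -1;
}
```

A lookup like idle_map_first() replaces the per-CPU scan with a single word
load, which is the "traverse cycles" saving; the cache-line traffic from every
CPU updating the shared word on idle entry and exit is a separate cost that
this sketch does not measure.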