From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752194AbcDFNgu (ORCPT ); Wed, 6 Apr 2016 09:36:50 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:55984 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751355AbcDFNgs (ORCPT ); Wed, 6 Apr 2016 09:36:48 -0400 Date: Wed, 6 Apr 2016 09:36:14 -0400 From: Chris Mason To: Mike Galbraith CC: Peter Zijlstra , Ingo Molnar , Matt Fleming , Subject: Re: [PATCH RFC] select_idle_sibling experiments Message-ID: <20160406133614.7majuigbr7wmduks@floor.thefacebook.com> Mail-Followup-To: Chris Mason , Mike Galbraith , Peter Zijlstra , Ingo Molnar , Matt Fleming , linux-kernel@vger.kernel.org References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com> <1459927644.5612.41.camel@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <1459927644.5612.41.camel@suse.de> User-Agent: Mutt/1.5.23.1 (2014-03-12) X-Originating-IP: [192.168.52.123] X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-04-06_07:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 06, 2016 at 09:27:24AM +0200, Mike Galbraith wrote: > > On Tue, 2016-04-05 at 14:08 -0400, Chris Mason wrote: > > > Now, on to the patch. I pushed some code around and narrowed the > > problem down to select_idle_sibling() We have cores going into and out > > of idle fast enough that even this cut our latencies in half: > > Are you using NO_HZ? If so, you may want to try the attached. I'll definitely give it a shot. When I tried using the nohz idle bitmap (Peter's idea) instead of the for_each_cpu() walks, it came out slower. It feels like the cpus aren't getting all the way down into the idle loop before more work comes, but I'll have to check. > > > static int select_idle_sibling(struct task_struct *p, int target) > > goto next; > > > > for_each_cpu(i, sched_group_cpus(sg)) { > > - if (i == target || !idle_cpu(i)) > > + if (!idle_cpu(i)) > > goto next; > > } > > > > IOW, by the time we get down to for_each_cpu(), the idle_cpu() check > > done at the top of the function is no longer valid. > > Ok, that's only an optimization, could go if it's causing trouble. It's more an indication of how long we're spending in the current scan. Long enough for the tests we're currently doing to be inaccurate. [ my beautiful patch ] > Ew. That may improve your latency is everything load, but worst case > package walk will hurt like hell on CPUs with insane number of threads. > > That full search also turns the evil face of two-faced little > select_idle_sibling() into it's only face, the one that bounces tasks > about much more than they appreciate. > > Looking for an idle core first delivers the most throughput boost, and > only looking at target's threads if you don't find one keeps the bounce > and traverse pain down to a dull roar, while at least trying to get > that latency win. To me, your patch looks like it trades harm to many, > for good for a few. Yes, I'm tossing an important optimization. The goal wasn't to get rid of that at all, but instead to find a way to get both. I just ran out of ideas ;) > > A behavior switch would be better. It can't get any dumber, but trying > to make it smarter makes it too damn fat. As it sits, it's aiming in > the general direction of the bullseye.. and occasionally hits the wall. > > -Mike > > sched: ratelimit nohz > > Entering nohz code on every micro-idle is too expensive to bear. This I really like. I'll setup a benchmark in production with it and come back with results. -chris