Date: Wed, 3 Jun 2015 10:49:53 -0400
From: Josef Bacik
To: Peter Zijlstra, Rik van Riel
CC: linux-kernel@vger.kernel.org, kernel-team
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

On 06/03/2015 10:24 AM, Peter Zijlstra wrote:
> On Wed, 2015-06-03 at 10:12 -0400, Rik van Riel wrote:
>
>> There is a policy vs mechanism thing here. Ingo and Peter
>> are worried about the overhead in the mechanism of finding
>> an idle CPU. Your measurements show that the policy of
>> finding an idle CPU is the correct one.
>
> For his workload; I'm sure I can find a workload where it hurts.
>
> In fact, I'm fairly sure Mike knows one from the top of his head, seeing
> how he's the one playing about trying to shrink that idle search :-)
>

So the perf bench sched microbenchmarks are a pretty good analog for our
workload. I run

  perf bench sched messaging -g 100 -l 10000
  perf bench sched pipe

5 times each and average the results; the messaging benchmark is the
closest to our workload and the one I actually look at. I get around 56
seconds of runtime on plain 4.0 and 47 seconds patched. That's how I
check my little experiments before doing the full real workload. I don't
want to tune the scheduler just for our workload, but the
microbenchmarks we have are showing the same performance improvements.

I would be super interested in workloads where this patch doesn't help,
so we could integrate them into perf bench sched and be more confident
about making policy changes in the scheduler. So Mike, if you have
something specific in mind, please elaborate, and I'm happy to do the
legwork to get it into perf bench and to test things until we're happy.

In the meantime I really want to get this fixed for us. I do not want to
carry some weird old patch around until we rebase again next year and
then do this whole dance again. What would be the way forward for
getting this fixed now? Do I need to hide it behind a sysctl or config
option?

Thanks,

Josef
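
P.S. For concreteness, this is roughly the run-and-average harness behind
the numbers above; a minimal sketch that assumes perf bench's usual
"Total time: N [sec]" summary line and that bc is installed:

#!/bin/sh
# Run each sched microbenchmark 5 times and print the mean runtime.
RUNS=5

bench() {
	total=0
	for i in $(seq $RUNS); do
		# Grab the seconds value from perf bench's "Total time" line
		# (assumes that summary format; adjust the awk if it differs).
		t=$(perf bench sched "$@" 2>&1 | awk '/Total time/ {print $3}')
		total=$(echo "$total + $t" | bc)
	done
	echo "sched $*: $(echo "scale=3; $total / $RUNS" | bc) sec average"
}

bench messaging -g 100 -l 10000
bench pipe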
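
P.P.S. On the sysctl-vs-config question: if a debug knob is acceptable,
the existing sched_features debugfs interface seems like the natural fit,
since it can be flipped at runtime. Sketch below; the feature name
IDLE_CPU_WAKE is made up purely for illustration:

# debugfs is usually already mounted; this is a no-op if so.
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# Writing FOO enables a scheduler feature bit, NO_FOO disables it.
echo IDLE_CPU_WAKE > /sys/kernel/debug/sched_features
echo NO_IDLE_CPU_WAKE > /sys/kernel/debug/sched_features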