Date: Wed, 3 Jun 2015 10:49:53 -0400
From: Josef Bacik
To: Peter Zijlstra, Rik van Riel
CC: linux-kernel@vger.kernel.org, kernel-team
Subject: Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

On 06/03/2015 10:24 AM, Peter Zijlstra wrote:
> On Wed, 2015-06-03 at 10:12 -0400, Rik van Riel wrote:
>
>> There is a policy vs mechanism thing here. Ingo and Peter
>> are worried about the overhead in the mechanism of finding
>> an idle CPU. Your measurements show that the policy of
>> finding an idle CPU is the correct one.
>
> For his workload; I'm sure I can find a workload where it hurts.
>
> In fact, I'm fairly sure Mike knows one from the top of his head, seeing
> how he's the one playing about trying to shrink that idle search :-)
>

So the perf bench sched microbenchmarks are a pretty good analog for our
workload. I run

  perf bench sched messaging -g 100 -l 10000
  perf bench sched pipe

5 times each and average the results; the messaging benchmark is the
closest to our workload and the one I actually look at. I get around 56
seconds of runtime on plain 4.0 and 47 seconds patched. That's how I
check my little experiments before doing the full real workload. I don't
want to tune the scheduler just for our workload, but the
microbenchmarks we have are showing the same performance improvements.

I would be super interested in workloads where this patch doesn't help,
so we could integrate them into perf bench sched and be more confident
about making policy changes in the scheduler. So Mike, if you have
something specific in mind, please elaborate, and I'm happy to do the
legwork to get it into perf bench and to test things until we're happy.

In the meantime I really want to get this fixed for us. I do not want to
carry some weird old patch around until we rebase again next year and
then do this whole dance again. What would be the way forward for
getting this fixed now? Do I need to hide it behind a sysctl or config
option?

Thanks,

Josef
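
P.S. For concreteness, this is roughly the run-and-average harness behind
the numbers above; a minimal sketch that assumes perf bench's usual
"Total time: N [sec]" summary line and that bc is installed:

#!/bin/sh
# Run each sched microbenchmark 5 times and print the mean runtime.
RUNS=5

bench() {
	total=0
	for i in $(seq $RUNS); do
		# Grab the seconds value from perf bench's "Total time" line
		# (assumes that summary format; adjust the awk if it differs).
		t=$(perf bench sched "$@" 2>&1 | awk '/Total time/ {print $3}')
		total=$(echo "$total + $t" | bc)
	done
	echo "sched $*: $(echo "scale=3; $total / $RUNS" | bc) sec average"
}

bench messaging -g 100 -l 10000
bench pipe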
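
P.P.S. On the sysctl-vs-config question: if a debug knob is acceptable,
the existing sched_features debugfs interface seems like the natural fit,
since it can be flipped at runtime. Sketch below; the feature name
IDLE_CPU_WAKE is made up purely for illustration:

# debugfs is usually already mounted; this is a no-op if so.
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# Writing FOO enables a scheduler feature bit, NO_FOO disables it.
echo IDLE_CPU_WAKE > /sys/kernel/debug/sched_features
echo NO_IDLE_CPU_WAKE > /sys/kernel/debug/sched_features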