From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752194AbcDFNgu (ORCPT <rfc822;w@1wt.eu>);
	Wed, 6 Apr 2016 09:36:50 -0400
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:55984 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751355AbcDFNgs (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 6 Apr 2016 09:36:48 -0400
Date: Wed, 6 Apr 2016 09:36:14 -0400
From: Chris Mason <clm@fb.com>
To: Mike Galbraith <mgalbraith@suse.de>
CC: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        Matt Fleming <matt@codeblueprint.co.uk>,
        <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] select_idle_sibling experiments
Message-ID: <20160406133614.7majuigbr7wmduks@floor.thefacebook.com>
Mail-Followup-To: Chris Mason <clm@fb.com>,
	Mike Galbraith <mgalbraith@suse.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	linux-kernel@vger.kernel.org
References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>
 <1459927644.5612.41.camel@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <1459927644.5612.41.camel@suse.de>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
X-Originating-IP: [192.168.52.123]
X-Proofpoint-Spam-Reason: safe
X-FB-Internal: Safe
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-04-06_07:,,
 signatures=0
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 06, 2016 at 09:27:24AM +0200, Mike Galbraith wrote:
> > On Tue, 2016-04-05 at 14:08 -0400, Chris Mason wrote:
> 
> > Now, on to the patch.  I pushed some code around and narrowed the
> > problem down to select_idle_sibling()   We have cores going into and out
> > of idle fast enough that even this cut our latencies in half:
> 
> Are you using NO_HZ?  If so, you may want to try the attached.

I'll definitely give it a shot.  When I tried using the nohz idle bitmap
(Peter's idea) instead of the for_each_cpu() walks, it came out slower.
It feels like the cpus aren't getting all the way down into the idle
loop before more work comes, but I'll have to check.

> 
> > static int select_idle_sibling(struct task_struct *p, int target)
> >                                 goto next;
> >  
> >                         for_each_cpu(i, sched_group_cpus(sg)) {
> > -                               if (i == target || !idle_cpu(i))
> > +                               if (!idle_cpu(i))
> >                                         goto next;
> >                         }
> >  
> > IOW, by the time we get down to for_each_cpu(), the idle_cpu() check
> > done at the top of the function is no longer valid.
> 
> Ok, that's only an optimization, could go if it's causing trouble.

It's more an indication of how long we're spending in the current scan.
Long enough for the tests we're currently doing to be inaccurate.

[ my beautiful patch ]

> Ew.  That may improve your latency is everything load, but worst case
> package walk will hurt like hell on CPUs with insane number of threads.
>
>  That full search also turns the evil face of two-faced little
> select_idle_sibling() into it's only face, the one that bounces tasks
> about much more than they appreciate.
> 
> Looking for an idle core first delivers the most throughput boost, and
> only looking at target's threads if you don't find one keeps the bounce
> and traverse pain down to a dull roar, while at least trying to get
> that latency win.  To me, your patch looks like it trades harm to many,
> for good for a few.

Yes, I'm tossing an important optimization.  The goal wasn't to get rid
of that at all, but instead to find a way to get both.  I just ran out
of ideas ;)

> 
> A behavior switch would be better.  It can't get any dumber, but trying
> to make it smarter makes it too damn fat.  As it sits, it's aiming in
> the general direction of the bullseye.. and occasionally hits the wall.
> 
> 	-Mike
>
> sched: ratelimit nohz
> 
> Entering nohz code on every micro-idle is too expensive to bear.

This I really like.  I'll setup a benchmark in production with it and
come back with results.

-chris