Subject: Re: [patch v3.18+ regression fix] sched: Further improve spurious CPU_IDLE active migrations
From: Mike Galbraith
To: Peter Zijlstra
Cc: LKML, Rik van Riel, Vincent Guittot
Date: Wed, 31 Aug 2016 12:18:19 +0200
Message-ID: <1472638699.3942.14.camel@suse.de>
In-Reply-To: <20160831100117.GV10121@twins.programming.kicks-ass.net>
References: <1472535775.3960.3.camel@suse.de>
 <20160831100117.GV10121@twins.programming.kicks-ass.net>

On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
> On Tue, Aug 30, 2016 at 07:42:55AM +0200, Mike Galbraith wrote:
> >
> > 43f4d666 partially cured spurious migrations, but when there are
> > completely idle groups on a lightly loaded processor, and there is
> > a buddy pair occupying the busiest group, we will not attempt to
> > migrate due to select_idle_sibling() buddy placement, leaving the
> > busiest queue with one task.  We skip balancing, but increment
> > nr_balance_failed until we kick active balancing, and bounce a
> > buddy pair endlessly, demolishing throughput.
>
> Have you run this patch through other benchmarks? It looks like
> something that might make something else go funny.

No, but it will be going through SUSE's performance test grid.

> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7249,11 +7249,12 @@ static struct sched_group *find_busiest_
> >  		 * This cpu is idle. If the busiest group is not overloaded
> >  		 * and there is no imbalance between this and busiest group
> >  		 * wrt idle cpus, it is balanced. The imbalance becomes
> > -		 * significant if the diff is greater than 1 otherwise we
> > -		 * might end up to just move the imbalance on another group
> > +		 * significant if the diff is greater than 2 otherwise we
> > +		 * may end up merely moving the imbalance to another group,
> > +		 * or bouncing a buddy pair needlessly.
> >  		 */
> >  		if ((busiest->group_type != group_overloaded) &&
> > -				(local->idle_cpus <= (busiest->idle_cpus + 1)))
> > +				(local->idle_cpus <= (busiest->idle_cpus + 2)))
> >  			goto out_balanced;
>
> So 43f4d66637bc ("sched: Improve sysbench performance by fixing spurious
> active migration")'s +1 made sense in that it's a tie breaker. If you
> have 3 tasks on 2 groups, one group will have to have 2 tasks, and
> bouncing the one task around just isn't going to help _anything_.

Yeah, but frequently tasks don't come in ones, so you end up with an
endless tug of war between load balancing ripping communicating buddies
apart and select_idle_sibling() pulling them back together... bouncing
cow syndrome.

> Incrementing that to +2 has the effect that if you have two tasks on two
> groups, 0,2 is a valid distribution. Which I understand is exactly what
> you want for this workload. But if the two tasks are unrelated, 1,1
> really is a better spread.

True.  Better ideas welcome.

	-Mike