From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754925AbcIAIKA (ORCPT <rfc822;w@1wt.eu>);
        Thu, 1 Sep 2016 04:10:00 -0400
Received: from mail-lf0-f50.google.com ([209.85.215.50]:36674 "EHLO
        mail-lf0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753008AbcIAIJp (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 1 Sep 2016 04:09:45 -0400
MIME-Version: 1.0
In-Reply-To: <1472703062.3979.60.camel@gmail.com>
References: <1472535775.3960.3.camel@suse.de> <20160831100117.GV10121@twins.programming.kicks-ass.net>
 <1472638699.3942.14.camel@suse.de> <1472639782.3942.27.camel@gmail.com>
 <CAKfTPtDjrXnpVGat+avYrER7gKFurQLF_PB1Vt2Nu1dKy2o8aQ@mail.gmail.com> <1472703062.3979.60.camel@gmail.com>
From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Thu, 1 Sep 2016 10:09:22 +0200
Message-ID: <CAKfTPtDxTH62HGrze+rSrw9+kZc6xHSfJemhWqxhyhLZzM0qDg@mail.gmail.com>
Subject: Re: [patch v3.18+ regression fix] sched: Further improve spurious
 CPU_IDLE active migrations
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
        LKML <linux-kernel@vger.kernel.org>, Rik van Riel <riel@redhat.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1 September 2016 at 06:11, Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
> On Wed, 2016-08-31 at 17:52 +0200, Vincent Guittot wrote:
>> On 31 August 2016 at 12:36, Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
>> > On Wed, 2016-08-31 at 12:18 +0200, Mike Galbraith wrote:
>> > > On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
>> >
>> > > > So 43f4d66637bc ("sched: Improve sysbench performance by fixing spurious
>> > > > active migration") 's +1 made sense in that its a tie breaker. If you
>> > > > have 3 tasks on 2 groups, one group will have to have 2 tasks, and
>> > > > bouncing the one task around just isn't going to help _anything_.
>> > >
>> > > Yeah, but frequently tasks don't come in ones, so, you end up with an
>> > > endless tug of war between LB ripping communicating buddies apart, and
>> > > select_idle_sibling() pulling them back together.. bouncing cow
>> > > syndrome.
>> >
>>
>> replacing +1 by +2 fixes this use case that involves 2 threads but
>> similar behavior can happen with 3 tasks on system with 4 cores per MC
>> as an example
>>
>> IIUC, you have on
>> - one side, periodic load balance that spreads the 2 tasks in the system
>> - on the other side, wake up path that moves the task back in the same MC.
>
> Yup.
>
>> Isn't your regression more linked to spurious migration than where the
>> task is scheduled ? I don't see any direct relation between the client
>> and the server in this netperf test, isn't it ?
>
>          netperf  4360 [004]  1207.865265:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>          netperf  4360 [004]  1207.865274:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>          netperf  4360 [004]  1207.865280:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>        netserver  4361 [002]  1207.865313:       sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
>          netperf  4360 [004]  1207.865340:       sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
>          netperf  4360 [004]  1207.865345:       sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
>          netperf  4360 [004]  1207.865355:       sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
>          netperf  4360 [004]  1207.865357:       sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
>          netperf  4360 [004]  1207.865369:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>        netserver  4361 [002]  1207.865377:       sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
>          netperf  4360 [004]  1207.865476:       sched:sched_wakeup: perf:4359 [120] success=1 CPU:003

I would have expected a net_rx softirq in the middle.
Never mind, I agree that we can find a lot of use cases with
communicating tasks.
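
To make the +1 vs +2 point from my previous mail concrete, here is a
minimal sketch of the tie-breaker as I understand it. It is a simplified
model, not the kernel code: idle_balanced(), idle_cpus() and the "slack"
parameter are hypothetical names, and I am assuming the check compares
the number of idle CPUs in the local and the busiest group.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the CPU_IDLE tie-breaker; this is
 * not the kernel's code. The group with fewer idle CPUs is the busier
 * one. "slack" stands for the +1 / +2 constant under discussion.
 */
static bool idle_balanced(int local_idle, int busiest_idle, int slack)
{
	assert(local_idle >= 0 && busiest_idle >= 0 && slack >= 0);
	/* Balanced: the local side has at most "slack" more idle CPUs. */
	return local_idle <= busiest_idle + slack;
}

/* Idle CPUs left in one MC of "cores" CPUs that runs "tasks" tasks. */
static int idle_cpus(int cores, int tasks)
{
	assert(tasks >= 0 && tasks <= cores);
	return cores - tasks;
}
```

With 4 cores per MC and one MC fully idle, 2 tasks in the other MC are
not considered balanced with slack 1 (4 <= 2 + 1 is false, so the idle
side pulls), but they are with slack 2. With 3 tasks in the busy MC,
slack 2 is not enough (4 <= 1 + 2 is false), so the pull happens again.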

>
> It's not limited to this load, anything at all that is communicating
> will do the same on these or similar processors.
>
> This trying to be perfect looks like a booboo to me, as we are now
> specifically asking our left hand undo what our right hand did to crank
> up throughput.  For the diagnosed processor at least, one of those
> hands definitely wants to be slapped.
>
> This doesn't seem to be an issue for L3 equipped CPUs, but perhaps is
> for some even modern processors, dunno (the boxen where regression was
> detected are far from new).
>
>> we could either remove the condition which tries to keep an even
>> number of tasks in each group until busiest group becomes overloaded
>> but it means that unrelated tasks may have to share same resources
>> or we could try to prevent the migration at wake up. I was looking at
>> wake_affine which seems to choose local cpu  when both prev and local
>> cpu are idle. I wonder if local cpu is  really a better choice when
>> both are idle
>
> I don't see a great alternative to turning it off off the top of my
> head, at least for processors with multiple LLCs.  Yeah, unrelated
> tasks could end up sharing a cache needlessly, but will that hurt as
> badly as tasks not munching tasty hot data definitely does?

Memory-intensive tasks will probably be hurt.

>
>         -Mike