From: Vincent Guittot
Date: Thu, 1 Sep 2016 10:09:22 +0200
Subject: Re: [patch v3.18+ regression fix] sched: Further improve spurious CPU_IDLE active migrations
To: Mike Galbraith
Cc: Peter Zijlstra, LKML, Rik van Riel
In-Reply-To: <1472703062.3979.60.camel@gmail.com>
References: <1472535775.3960.3.camel@suse.de>
 <20160831100117.GV10121@twins.programming.kicks-ass.net>
 <1472638699.3942.14.camel@suse.de>
 <1472639782.3942.27.camel@gmail.com>
 <1472703062.3979.60.camel@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1 September 2016 at 06:11, Mike Galbraith wrote:
> On Wed, 2016-08-31 at 17:52 +0200, Vincent Guittot wrote:
>> On 31 August 2016 at 12:36, Mike Galbraith wrote:
>> > On Wed, 2016-08-31 at 12:18 +0200, Mike Galbraith wrote:
>> > > On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
>> >
>> > > > So 43f4d66637bc ("sched: Improve sysbench performance by fixing spurious
>> > > > active migration") 's +1 made sense in that its a tie breaker. If you
>> > > > have 3 tasks on 2 groups, one group will have to have 2 tasks, and
>> > > > bouncing the one task around just isn't going to help _anything_.
>> > >
>> > > Yeah, but frequently tasks don't come in ones, so, you end up with an
>> > > endless tug of war between LB ripping communicating buddies apart, and
>> > > select_idle_sibling() pulling them back together.. bouncing cow
>> > > syndrome.
>> >
>>
>> replacing +1 by +2 fixes this use case that involves 2 threads but
>> similar behavior can happen with 3 tasks on system with 4 cores per MC
>> as an example
>>
>> IIUC, you have on
>> - one side, periodic load balance that spreads the 2 tasks in the system
>> - on the other side, wake up path that moves the task back in the same MC.
>
> Yup.
>
>> Isn't your regression more linked to spurious migration than where the
>> task is scheduled ? I don't see any direct relation between the client
>> and the server in this netperf test, isn't it ?
>
> netperf 4360 [004] 1207.865265: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netperf 4360 [004] 1207.865274: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netperf 4360 [004] 1207.865280: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netserver 4361 [002] 1207.865313: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
> netperf 4360 [004] 1207.865340: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
> netperf 4360 [004] 1207.865345: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
> netperf 4360 [004] 1207.865355: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
> netperf 4360 [004] 1207.865357: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
> netperf 4360 [004] 1207.865369: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netserver 4361 [002] 1207.865377: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
> netperf 4360 [004] 1207.865476: sched:sched_wakeup: perf:4359 [120] success=1 CPU:003

I would have expected a net_rx softirq in the middle.
Never mind, I agree that we can find a lot of use cases with communicating tasks.

>
> It's not limited to this load, anything at all that is communicating
> will do the same on these or similar processors.
>
> This trying to be perfect looks like a booboo to me, as we are now
> specifically asking our left hand undo what our right hand did to crank
> up throughput. For the diagnosed processor at least, one of those
> hands definitely wants to be slapped.
>
> This doesn't seem to be an issue for L3 equipped CPUs, but perhaps is
> for some even modern processors, dunno (the boxen where regression was
> detected are far from new).
>
>> we could either remove the condition which tries to keep an even
>> number of tasks in each group until busiest group becomes overloaded
>> but it means that unrelated tasks may have to share same resources
>> or we could try to prevent the migration at wake up. I was looking at
>> wake_affine which seems to choose local cpu when both prev and local
>> cpu are idle. I wonder if local cpu is really a better choice when
>> both are idle
>
> I don't see a great alternative to turning it off off the top of my
> head, at least for processors with multiple LLCs. Yeah, unrelated
> tasks could end up sharing a cache needlessly, but will that hurt as
> badly as tasks not munching tasty hot data definitely does?

Memory-intensive tasks will probably be hurt.

>
> -Mike
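
To make the +1 versus +2 discussion above concrete, here is a small
stand-alone model of the idle-balance tie breaker. It is only a sketch
and not the kernel source: the struct, the function names and the
`slack` parameter are made up for illustration, and it assumes at most
one task per CPU until a group is overloaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one scheduling group (not a kernel struct). */
struct group_stats {
    int idle_cpus;    /* number of idle CPUs in the group */
    bool overloaded;  /* more runnable tasks than CPUs */
};

/* Build stats for a group of `cpus` CPUs running `tasks` tasks, one per CPU. */
static struct group_stats make_group(int cpus, int tasks)
{
    struct group_stats g;

    assert(cpus > 0 && tasks >= 0);
    g.idle_cpus = tasks < cpus ? cpus - tasks : 0;
    g.overloaded = tasks > cpus;
    return g;
}

/*
 * Returns true when an idle CPU in `local` should leave `busiest` alone.
 * `slack` is the tie breaker: 1 models the "+1", 2 models the "+2".
 * An overloaded busiest group is never left alone in this model.
 */
static bool idle_pull_suppressed(const struct group_stats *local,
                                 const struct group_stats *busiest,
                                 int slack)
{
    if (busiest->overloaded)
        return false;
    return local->idle_cpus <= busiest->idle_cpus + slack;
}
```

In this model, with two 4-CPU groups and 2 tasks in one of them, a slack
of 1 still allows the pull while a slack of 2 suppresses it. With 3
tasks in the group, a slack of 2 allows the pull again, which is the
"3 tasks on system with 4 cores per MC" case mentioned above.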