From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750942AbcEICj1 (ORCPT ); Sun, 8 May 2016 22:39:27 -0400
Received: from mga02.intel.com ([134.134.136.20]:38337 "EHLO mga02.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750879AbcEICj0 (ORCPT ); Sun, 8 May 2016 22:39:26 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,599,1455004800"; d="scan'208";a="99569923"
Date: Mon, 9 May 2016 02:57:47 +0800
From: Yuyang Du
To: Mike Galbraith
Cc: Peter Zijlstra, Chris Mason, Ingo Molnar, Matt Fleming,
	linux-kernel@vger.kernel.org
Subject: Re: sched: tweak select_idle_sibling to look for idle threads
Message-ID: <20160508185747.GL16093@intel.com>
References: <20160405180822.tjtyyc3qh4leflfj@floor.thefacebook.com>
 <20160409190554.honue3gtian2p6vr@floor.thefacebook.com>
 <20160430124731.GE2975@worktop.cust.blueprintrf.com>
 <1462086753.9717.29.camel@suse.de>
 <20160501085303.GF2975@worktop.cust.blueprintrf.com>
 <1462094425.9717.45.camel@suse.de>
 <20160507012417.GK16093@intel.com>
 <1462694935.4155.83.camel@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1462694935.4155.83.camel@suse.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, May 08, 2016 at 10:08:55AM +0200, Mike Galbraith wrote:
> > Maybe give the criteria a bit margin, not just wakees tend to equal llc_size,
> > but the numbers are so wild to easily break the fragile condition, like:
>
> Seems lockless traversal and averages just lets multiple CPUs select
> the same spot.  An atomic reservation (feature) when looking for an
> idle spot (also for fork) might fix it up.  Run the thing as RT,
> push/pull ensures that it reaches box saturation regardless of the
> number of messaging threads, whereas with fair class, any number > 1
> will certainly stack tasks before the box is saturated.

Yes, good idea; bringing order to the race to grab an idle CPU should
absolutely help.

In addition, I would argue that beefing up idle balancing may be a more
productive way to spread load, since work-stealing does exactly what
needs to be done. It also seems to have been (subconsciously) neglected
in this case. :)

Regarding wake_wide(), the M:N here seems to be 1:24, not 6:6*24. If
so, the slave's wakee_flips will stay 0 forever, because its last_wakee
never flips.

Basically, whenever a waker has more than one wakee, its wakee_flips
will comfortably grow very large (as last_wakee keeps alternating),
whereas a waker with zero or one wakee will have wakee_flips stuck at
0. So recording only the last_wakee does not seem right, unless there
is some other good reason for it. Failing that, counting how many times
a waker wakes up its wakees should work better, and would let the
statistics play out properly.
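
To make the reservation idea concrete, here is a minimal sketch of what
an atomic claim could look like. Everything below is hypothetical:
sis_claimed_mask, sis_claim_cpu(), sis_release_cpu() and
sis_find_and_claim() are invented names for illustration, not existing
kernel interfaces.

	/* All names below are invented for illustration, not kernel APIs. */
	static struct cpumask sis_claimed_mask;	/* bit set => CPU already spoken for */

	static bool sis_claim_cpu(int cpu)
	{
		/* test_and_set_bit() is atomic: exactly one waker wins the race */
		return !test_and_set_bit(cpu, cpumask_bits(&sis_claimed_mask));
	}

	static void sis_release_cpu(int cpu)
	{
		/* the wakee has been enqueued; drop the reservation */
		clear_bit(cpu, cpumask_bits(&sis_claimed_mask));
	}

	/* A select_idle_sibling()-style scan that reserves what it finds: */
	static int sis_find_and_claim(struct sched_domain *sd)
	{
		int cpu;

		for_each_cpu(cpu, sched_domain_span(sd)) {
			if (idle_cpu(cpu) && sis_claim_cpu(cpu))
				return cpu;	/* caller releases after enqueue */
		}
		return -1;
	}

The same claim could naturally cover the fork/exec balance path Mike
mentions, since it races against wakeups for the same idle CPUs.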
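
For reference, the record_wakee()/wake_wide() pair under discussion
looks roughly like this (paraphrased from kernel/sched/fair.c of this
era; check the actual tree for the authoritative version):

	static void record_wakee(struct task_struct *p)
	{
		/* decay the flip count roughly once per second */
		if (time_after(jiffies, current->wakee_flip_decay_ts + HZ)) {
			current->wakee_flips >>= 1;
			current->wakee_flip_decay_ts = jiffies;
		}

		/* only a *different* wakee than last time counts as a flip */
		if (current->last_wakee != p) {
			current->last_wakee = p;
			current->wakee_flips++;
		}
	}

	static int wake_wide(struct task_struct *p)
	{
		unsigned int master = current->wakee_flips;
		unsigned int slave = p->wakee_flips;
		int factor = this_cpu_read(sd_llc_size);

		if (master < slave)
			swap(master, slave);
		/* in the 1:24 case above, slave stays 0, so this always hits */
		if (slave < factor || master < slave * factor)
			return 0;
		return 1;
	}

With one master waking 24 slaves that never wake anyone back, slave
stays 0 and 0 < factor on every wakeup, so wake_wide() never fires and
the affine path is always taken.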
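
And a sketch of the "count wakeups" alternative suggested above
(hypothetical; it reuses the existing wakee_flips field and decay
timestamp purely for brevity, and the wake_wide() thresholds would need
retuning to match):

	/*
	 * Hypothetical variant of record_wakee(): count every wakeup instead
	 * of only last_wakee flips, so a 1:N waker still builds up a "waker
	 * of many" signal even though it always wakes the same set of tasks.
	 */
	static void record_wakee(struct task_struct *p)
	{
		if (time_after(jiffies, current->wakee_flip_decay_ts + HZ)) {
			current->wakee_flips >>= 1;	/* now a decayed wakeup count */
			current->wakee_flip_decay_ts = jiffies;
		}
		current->wakee_flips++;		/* every wakeup counts, flip or not */
	}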