From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752883AbbDBA5L (ORCPT <rfc822;w@1wt.eu>);
	Wed, 1 Apr 2015 20:57:11 -0400
Received: from g9t5009.houston.hp.com ([15.240.92.67]:45537 "EHLO
	g9t5009.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752107AbbDBA5I (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 1 Apr 2015 20:57:08 -0400
Message-ID: <1427936126.2556.10.camel@j-VirtualBox>
Subject: Re: [PATCH V2] sched: Improve load balancing in the presence of
 idle CPUs
From: Jason Low <jason.low2@hp.com>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>,
        Peter Zijlstra <peterz@infradead.org>,
        "mingo@kernel.org" <mingo@kernel.org>,
        "riel@redhat.com" <riel@redhat.com>,
        "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
        "vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
        "srikar@linux.vnet.ibm.com" <srikar@linux.vnet.ibm.com>,
        "pjt@google.com" <pjt@google.com>,
        "benh@kernel.crashing.org" <benh@kernel.crashing.org>,
        "efault@gmx.de" <efault@gmx.de>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "iamjoonsoo.kim@lge.com" <iamjoonsoo.kim@lge.com>,
        "svaidy@linux.vnet.ibm.com" <svaidy@linux.vnet.ibm.com>,
        "tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
        jason.low2@hp.com
Date: Wed, 01 Apr 2015 17:55:26 -0700
In-Reply-To: <20150401130355.GW18994@e105550-lin.cambridge.arm.com>
References: <20150326130014.21532.17158.stgit@preeti.in.ibm.com>
	 <20150327143839.GO18994@e105550-lin.cambridge.arm.com>
	 <55158966.4050300@linux.vnet.ibm.com>
	 <20150327175651.GR18994@e105550-lin.cambridge.arm.com>
	 <20150330110632.GT23123@twins.programming.kicks-ass.net>
	 <20150330120302.GT18994@e105550-lin.cambridge.arm.com>
	 <551A61A9.6020009@linux.vnet.ibm.com>
	 <1427823008.2492.19.camel@j-VirtualBox> <551B8FF3.70608@linux.vnet.ibm.com>
	 <20150401130355.GW18994@e105550-lin.cambridge.arm.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6 
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2015-04-01 at 14:03 +0100, Morten Rasmussen wrote:

Hi Morten,

> > Alright I see. But it is one additional wake up. And the wake up will be
> > within the cluster. We will not wake up any CPU in the neighboring
> > cluster unless there are tasks to be pulled. So, we can wake up a core
> > out of a deep idle state and never a cluster in the problem described.
> > In terms of energy efficiency, this is not so bad a scenario, is it?
> 
> After Peter pointed out that it shouldn't happen across clusters due to
> group_classify()/sg_capacity_factor() it isn't as bad as I initially
> thought. It is still not an ideal solution I think. Wake-ups aren't nice
> for battery-powered devices. Waking up a cpu in an already active
> cluster may still imply powering up the core and bringing the L1 cache
> into a usable state, but it isn't as bad as waking up a cluster. I would
> prefer to avoid it if we can.

Right. I still think that the patch is justified if it addresses the
10-second latency issue, but if we could find a better solution, that
would be great :)

> Thinking more about it, don't we also risk doing a lot of iterations in
> nohz_idle_balance() leading to nothing (pure overhead) in certain corner
> cases? If find_new_ilb() is the last cpu in the cluster and we have one
> task for each cpu in the cluster but one cpu is currently having two.
> Don't we end up trying all nohz-idle cpus before giving up and balancing
> the balancer cpu itself. On big machines, going through everyone could
> take a while I think. No?

Iterating through many CPUs could take a while, but we only do
nohz_idle_balance() when the balancer CPU is idle, and we exit the loop
as soon as need_resched() is set. So we're only doing this work when
there is nothing else that needs to run.

Also, we only attempt balancing on behalf of a CPU when
time_after_eq(jiffies, rq->next_balance), so much of the time we don't
actually do the balancing work for all the CPUs.

So this may not be too big of an issue.
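
To make the two points above concrete, here is a rough userspace sketch
of the loop shape. It is not the kernel code: struct sim_rq,
sim_nohz_idle_balance(), tick_after_eq(), the resched flag and the fixed
interval are all made-up names for illustration, and the actual
balancing work is reduced to pushing next_balance forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_CPUS 64	/* arbitrary cap, assumption for this sketch */

/* Hypothetical stand-in for the per-CPU runqueue fields we care about. */
struct sim_rq {
	bool nohz_idle;			/* CPU is tickless-idle */
	unsigned long next_balance;	/* tick at which balancing is next due */
};

/* Wraparound-safe "a >= b" on a tick counter, same idea as time_after_eq(). */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Walk the nohz-idle CPUs on behalf of this_cpu. Returns how many CPUs
 * actually had balancing done for them.
 */
static size_t sim_nohz_idle_balance(struct sim_rq *rqs, size_t nr_cpus,
				    size_t this_cpu, unsigned long now,
				    unsigned long interval,
				    const bool *resched)
{
	size_t balanced = 0;
	size_t cpu;

	assert(nr_cpus <= SIM_MAX_CPUS);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == this_cpu || !rqs[cpu].nohz_idle)
			continue;

		/* The balancer has real work to run: stop right away. */
		if (*resched)
			break;

		/* Only do the (expensive) balance if it is actually due. */
		if (tick_after_eq(now, rqs[cpu].next_balance)) {
			rqs[cpu].next_balance = now + interval;
			balanced++;
		}
	}

	return balanced;
}
```

In this sketch the walk itself is cheap; the cost is bounded by the
number of CPUs whose next_balance has expired, and the whole thing is
abandoned once the resched flag is set.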