Message-ID: <5787C03B.8010906@nvidia.com>
Date: Thu, 14 Jul 2016 09:39:23 -0700
From: Sai Gurrappadi
To: Morten Rasmussen
CC: Peter Boonstoppel
Subject: Re: [PATCH v2 11/13] sched/fair: Avoid pulling tasks from non-overloaded higher capacity groups
References: <1466615004-3503-1-git-send-email-morten.rasmussen@arm.com>
 <1466615004-3503-12-git-send-email-morten.rasmussen@arm.com>
 <576C52B0.5080504@nvidia.com>
 <20160630074958.GA12540@e105550-lin.cambridge.arm.com>
In-Reply-To: <20160630074958.GA12540@e105550-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/30/2016 12:49 AM, Morten Rasmussen wrote:
> On Thu, Jun 23, 2016 at 02:20:48PM -0700, Sai Gurrappadi wrote:
>> Hi Morten,
>>
>> On 06/22/2016 10:03 AM, Morten Rasmussen wrote:
>>
>> [...]
>>
>>>
>>> +/*
>>> + * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
>>> + * per-cpu capacity than sched_group ref.
>>> + */
>>> +static inline bool
>>> +group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
>>> +{
>>> +	return sg->sgc->max_capacity * capacity_margin <
>>> +				ref->sgc->max_capacity * 1024;
>>> +}
>>> +
>>>  static inline enum
>>>  group_type group_classify(struct sched_group *group,
>>>  			  struct sg_lb_stats *sgs)
>>> @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>  	if (sgs->avg_load <= busiest->avg_load)
>>>  		return false;
>>>
>>> +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
>>> +		goto asym_packing;
>>> +
>>> +	/* Candidate sg has no more than one task per cpu and has
>>> +	 * higher per-cpu capacity. Migrating tasks to less capable
>>> +	 * cpus may harm throughput. Maximize throughput,
>>> +	 * power/energy consequences are not considered.
>>> +	 */
>>> +	if (sgs->sum_nr_running <= sgs->group_weight &&
>>> +	    group_smaller_cpu_capacity(sds->local, sg))
>>> +		return false;
>>> +
>>> +asym_packing:
>>
>> What about the case where IRQ/RT work reduces the capacity of some of
>> these bigger CPUs? sgc->max_capacity might not necessarily capture
>> that case.
>
> Right, we could possibly improve this by using min_capacity instead, but
> we could end up allowing tasks to be pulled to lower capacity cpus just
> because one big cpu has reduced capacity due to RT/IRQ pressure and
> therefore has lowered the group's min_capacity.
>
> Ideally we should check all the capacities, but that complicates things
> a lot.
>
> Would you prefer min_capacity instead, or attempts to consider all the
> cpu capacities available in both groups?

min_capacity as a start works, I think, given that we are only trying to
make the existing LB better, not necessarily optimizing for every case.
We might have to revisit this anyway for thermals etc.

Thanks,
-Sai