Message-ID: <550368F4.5050905@nvidia.com>
Date: Fri, 13 Mar 2015 15:47:16 -0700
From: Sai Gurrappadi
To: Morten Rasmussen <morten.rasmussen@arm.com>, peterz@infradead.org,
 mingo@redhat.com
CC: vincent.guittot@linaro.org, Dietmar Eggemann, yuyang.du@intel.com,
 preeti@linux.vnet.ibm.com, mturquette@linaro.org, nico@linaro.org,
 rjw@rjwysocki.net, Juri Lelli, linux-kernel@vger.kernel.org
Subject: Re: [RFCv3 PATCH 33/48] sched: Energy-aware wake-up task placement
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com>
 <1423074685-6336-34-git-send-email-morten.rasmussen@arm.com>
In-Reply-To: <1423074685-6336-34-git-send-email-morten.rasmussen@arm.com>

On 02/04/2015 10:31 AM, Morten Rasmussen wrote:
> Let available compute capacity and estimated energy impact select
> wake-up target cpu when energy-aware scheduling is enabled.
> energy_aware_wake_cpu() attempts to find group of cpus with sufficient
> compute capacity to accommodate the task and find a cpu with enough spare
> capacity to handle the task within that group. Preference is given to
> cpus with enough spare capacity at the current OPP. Finally, the energy
> impact of the new target and the previous task cpu is compared to select
> the wake-up target cpu.
>
> cc: Ingo Molnar <mingo@redhat.com>
> cc: Peter Zijlstra <peterz@infradead.org>
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> ---
>  kernel/sched/fair.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 90 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b371f32..8713310 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5091,6 +5091,92 @@ static int select_idle_sibling(struct task_struct *p, int target)
>  done:
>  	return target;
>  }
> +
> +static unsigned long group_max_capacity(struct sched_group *sg)
> +{
> +	int max_idx;
> +
> +	if (!sg->sge)
> +		return 0;
> +
> +	max_idx = sg->sge->nr_cap_states-1;
> +
> +	return sg->sge->cap_states[max_idx].cap;
> +}
> +
> +static inline unsigned long task_utilization(struct task_struct *p)
> +{
> +	return p->se.avg.utilization_avg_contrib;
> +}
> +
> +static int cpu_overutilized(int cpu, struct sched_domain *sd)
> +{
> +	return (capacity_orig_of(cpu) * 100) <
> +			(get_cpu_usage(cpu) * sd->imbalance_pct);
> +}
> +
> +static int energy_aware_wake_cpu(struct task_struct *p)
> +{
> +	struct sched_domain *sd;
> +	struct sched_group *sg, *sg_target;
> +	int target_max_cap = SCHED_CAPACITY_SCALE;
> +	int target_cpu = task_cpu(p);
> +	int i;
> +
> +	sd = rcu_dereference(per_cpu(sd_ea, task_cpu(p)));
> +
> +	if (!sd)
> +		return -1;
> +
> +	sg = sd->groups;
> +	sg_target = sg;
> +	/* Find group with sufficient capacity */
> +	do {
> +		int sg_max_capacity = group_max_capacity(sg);
> +
> +		if (sg_max_capacity >= task_utilization(p) &&
> +		    sg_max_capacity <= target_max_cap) {
> +			sg_target = sg;
> +			target_max_cap = sg_max_capacity;
> +		}
> +	} while (sg = sg->next, sg != sd->groups);

If a 'small' task suddenly becomes 'big', i.e. close to 100% util, the
above loop would still pick the little/small cluster, because
task_utilization(p) is upper-bounded by the arch-invariant capacity of
the little CPU/group, right?

Also, this heuristic for determining sg_target is a big.LITTLE
assumption; I don't think it necessarily holds for all platforms. The
heuristic should be derived from the platform's energy model instead.
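Totally untested sketch to make the first point concrete
(group_fits_task() and the margin value are made up here, not part of
this series): require some headroom above the task's utilization before
a group is considered to fit it, so a task saturating the little cpus'
capacity ceiling no longer always passes the check for the little
group:

static int group_fits_task(struct sched_group *sg, struct task_struct *p)
{
	/* ~25% headroom: 1280/1024 ~= 1.25 (illustrative value) */
	unsigned long margin = 1280;

	return group_max_capacity(sg) * SCHED_CAPACITY_SCALE >=
			task_utilization(p) * margin;
}

The do/while above would then test group_fits_task(sg, p) instead of
comparing sg_max_capacity against task_utilization(p) directly. That
still doesn't derive the choice from the energy model, but it at least
removes the saturation problem.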
> +
> +	/* Find cpu with sufficient capacity */
> +	for_each_cpu_and(i, tsk_cpus_allowed(p), sched_group_cpus(sg_target)) {
> +		int new_usage = get_cpu_usage(i) + task_utilization(p);

Isn't this double-accounting the task's usage in case task_cpu(p)
belongs to sg_target?
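If that's the case, I would have expected something along these lines
(untested; assumes p's utilization is still accounted to its previous
cpu by get_cpu_usage() at this point):

		int new_usage = get_cpu_usage(i);

		/* p's utilization is already included on its previous cpu */
		if (i != task_cpu(p))
			new_usage += task_utilization(p);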
> +
> +		if (new_usage > capacity_orig_of(i))
> +			continue;
> +
> +		if (new_usage < capacity_curr_of(i)) {
> +			target_cpu = i;
> +			if (!cpu_rq(i)->nr_running)
> +				break;
> +		}
> +
> +		/* cpu has capacity at higher OPP, keep it as fallback */
> +		if (target_cpu == task_cpu(p))
> +			target_cpu = i;
> +	}
> +
> +	if (target_cpu != task_cpu(p)) {
> +		struct energy_env eenv = {
> +			.usage_delta	= task_utilization(p),
> +			.src_cpu	= task_cpu(p),
> +			.dst_cpu	= target_cpu,
> +		};
> +
> +		/* Not enough spare capacity on previous cpu */
> +		if (cpu_overutilized(task_cpu(p), sd))
> +			return target_cpu;
> +
> +		if (energy_diff(&eenv) >= 0)
> +			return task_cpu(p);
> +	}
> +
> +	return target_cpu;
> +}
> +
>  /*
>   * select_task_rq_fair: Select target runqueue for the waking task in domains
>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> @@ -5138,6 +5224,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>  	prev_cpu = cpu;
>
>  	if (sd_flag & SD_BALANCE_WAKE) {
> +		if (energy_aware()) {
> +			new_cpu = energy_aware_wake_cpu(p);
> +			goto unlock;
> +		}
>  		new_cpu = select_idle_sibling(p, prev_cpu);
>  		goto unlock;
>  	}

-Sai