From: bsegall@google.com
To: Yuyang Du
Cc: Peter Zijlstra, Morten Rasmussen, Vincent Guittot, Dietmar Eggemann,
    Steve Muckle, mingo@redhat.com, daniel.lezcano@linaro.org,
    mturquette@baylibre.com, rjw@rjwysocki.net, Juri Lelli,
    sgurrappadi@nvidia.com, pang.xunlei@zte.com.cn,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig
Date: Mon, 21 Sep 2015 10:30:04 -0700
In-Reply-To: <20150921011652.GD11102@intel.com> (Yuyang Du's message of
    "Mon, 21 Sep 2015 09:16:52 +0800")

Yuyang Du writes:

> On Thu, Sep 17, 2015 at 12:38:25PM +0200, Peter Zijlstra wrote:
>> On Fri, Sep 11, 2015 at 06:22:47PM +0100, Morten Rasmussen wrote:
>>
>> > While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
>> > mess?
>>
>> I suspect his patch will fail to compile on ARM, which uses
>> SCHED_CAPACITY_* outside of kernel/sched/*.
>>
>> But if you all (Ben, Yuyang, you) can agree on a patch simplifying these
>> things I'm not opposed to it.
>
> Yes, indeed. So SCHED_RESOLUTION_SHIFT has to be defined in include/linux/sched.h.
>
> With this, I think the code still needs some cleanup and, importantly,
> documentation.
>
> But first, I think load_sum and load_avg can accommodate NICE_0_LOAD at either
> high or low resolution. So we have no reason to keep a low-resolution (10-bit)
> load_avg when NICE_0_LOAD has high resolution (20 bits), because
> load_avg = runnable% * load, as opposed to the current
> load_avg = runnable% * scale_load_down(load).
>
> Shall we get rid of all scale_load_down() for the runnable load average?

Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, i.e. barely within a
32-bit unsigned long, but LOAD_AVG_MAX * MAX_SHARES already overflows on
32-bit (even with the old code, in fact). This should probably be fixed...
somehow. Dividing load_sum by 4 on 32-bit would work, though it would be
ugly. Reducing MAX_SHARES by 2 bits on 32-bit might have made sense, but it
would be a weird difference between 32-bit and 64-bit, and could break
userspace anyway, so it's presumably too late for that. 64-bit has ~30 bits
free, so this would be fine as long as SLR is 0 on 32-bit.
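To spell the arithmetic out, here is a quick userspace sanity check. The
constants are written out by hand (LOAD_AVG_MAX = 47742,
prio_to_weight[0] = 88761, MAX_SHARES = 1 << 18), so double-check them
against your tree before trusting the exact numbers:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Hand-copied stand-ins for the kernel constants discussed above. */
	const uint64_t load_avg_max  = 47742;		/* LOAD_AVG_MAX */
	const uint64_t nice_minus_20 = 88761;		/* prio_to_weight[0] */
	const uint64_t max_shares    = 1ULL << 18;	/* MAX_SHARES */

	uint64_t worst_weight = load_avg_max * nice_minus_20;
	uint64_t worst_shares = load_avg_max * max_shares;

	/* 4237627662: fits (barely) in a 32-bit unsigned long. */
	printf("LOAD_AVG_MAX * prio_to_weight[0] = %llu (fits in u32: %s)\n",
	       (unsigned long long)worst_weight,
	       worst_weight <= UINT32_MAX ? "yes" : "no");

	/* 12515278848: already past 2^32, so 32-bit overflows here. */
	printf("LOAD_AVG_MAX * MAX_SHARES        = %llu (fits in u32: %s)\n",
	       (unsigned long long)worst_shares,
	       worst_shares <= UINT32_MAX ? "yes" : "no");
	return 0;
}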
>
> --
> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>  definition
>
> The metrics need a certain resolution to determine how much detail we
> can look into (i.e. to avoid losing detail to integer rounding), which
> also determines the range of the metrics.
>
> For instance, to increase the resolution of [0, 1] (two levels), one
> can multiply by 1024 and get [0, 1024] (1025 levels).
>
> In sched/fair, a few metrics depend on the resolution: load/load_avg,
> util_avg, and capacity (frequency adjustment). In order to reduce the
> risk of mistakes relating to resolution/range, we therefore generalize
> the resolution by defining a basic resolution constant, and then express
> all metrics in terms of that basic resolution. The basic resolution is
> 1024, or (1 << 10). Further, one can recursively apply the basic
> resolution to increase the final resolution.
>
> As pointed out by Ben Segall, NICE_0's weight (visible to user) and load
> have independent resolution, but they must be well calibrated.
>
> Signed-off-by: Yuyang Du
> ---
>  include/linux/sched.h |  9 ++++++---
>  kernel/sched/fair.c   |  4 ----
>  kernel/sched/sched.h  | 15 ++++++++++-----
>  3 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index bd38b3e..9b86f79 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -909,10 +909,13 @@ enum cpu_idle_type {
>  	CPU_MAX_IDLE_TYPES
>  };
>
> +# define SCHED_RESOLUTION_SHIFT	10
> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
> +
>  /*
>   * Increase resolution of cpu_capacity calculations
>   */
> -#define SCHED_CAPACITY_SHIFT	10
> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>
>  /*
> @@ -1180,8 +1183,8 @@ struct load_weight {
>   * 1) load_avg factors frequency scaling into the amount of time that a
>   * sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
>   * aggregated such weights of all runnable and blocked sched_entities.
> - * 2) util_avg factors frequency and cpu scaling into the amount of time
> - * that a sched_entity is running on a CPU, in the range [0..SCHED_LOAD_SCALE].
> + * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
> + * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
>   * For cfs_rq, it is the aggregated such times of all runnable and
>   * blocked sched_entities.
>   * The 64 bit load_sum can:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..c61fd8e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2522,10 +2522,6 @@ static u32 __compute_runnable_contrib(u64 n)
>  	return contrib + runnable_avg_yN_sum[n];
>  }
>
> -#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> -#error "load tracking assumes 2^10 as unit"
> -#endif
> -
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
>  /*
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3845a71..31b4022 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -53,18 +53,23 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>   * increased costs.
>   */
>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
> -# define SCHED_LOAD_RESOLUTION	10
> -# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> +# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
>  #else
> -# define SCHED_LOAD_RESOLUTION	0
> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
>  # define scale_load(w)		(w)
>  # define scale_load_down(w)	(w)
>  #endif
>
> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>
> +/*
> + * NICE_0's weight (visible to user) and its load (invisible to user) have
> + * independent resolution, but they should be well calibrated. We use scale_load()
> + * and scale_load_down(w) to convert between them; the following must be true:
> + * scale_load(prio_to_weight[20]) == NICE_0_LOAD
> + */
>  #define NICE_0_LOAD		SCHED_LOAD_SCALE
>  #define NICE_0_SHIFT		SCHED_LOAD_SHIFT

I still think tying the scale_load shift to the SCHED_CAPACITY/etc. shift
is silly, and tying NICE_0_LOAD/SHIFT in as well is worse. Honestly, if I
were going to change anything, it would be to define NICE_0_LOAD/SHIFT
entirely separately from SCHED_LOAD_SCALE/SHIFT.

However, I'm not sure whether calculate_imbalance's use of SCHED_LOAD_SCALE
is actually a separate use of 1024*SLR-as-percentage, or whether it is
basically assuming most tasks are nice-0. It sure /looks/ like it's
comparing values with different units: it computes (nr_running * CONST -
group_capacity) and compares that to load, so it looks like both (i.e.
increasing load.weight of everything on your system by X% would change
load balancer behavior here). Given that, it might make sense to make it
clear that capacity units and nice-0-task units have to be the same thing
due to load balancer approximations (though they are still entirely
separate from the SCHED_LOAD_RESOLUTION multiplier).
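To illustrate the unit mixing, here is a standalone toy rendering of the
shape of that calculate_imbalance() expression. This is not the kernel
code; the struct, field names, and numbers are made up for illustration
only:

#include <stdio.h>

#define SCHED_LOAD_SCALE	1024UL	/* one nice-0 task's worth of load */
#define min(a, b)		((a) < (b) ? (a) : (b))

/* Toy stand-in for the per-group statistics used by the load balancer. */
struct sg_stats {
	unsigned long sum_nr_running;	/* count of runnable tasks */
	unsigned long group_capacity;	/* SCHED_CAPACITY_SCALE units */
	unsigned long avg_load;		/* load.weight-derived units */
};

int main(void)
{
	struct sg_stats busiest = {
		.sum_nr_running = 6,
		.group_capacity = 4 * 1024,
		.avg_load       = 5000,
	};
	unsigned long sds_avg_load = 3000;
	unsigned long load_above_capacity, max_pull;

	/* nr_running in nice-0-task units, minus capacity units ... */
	load_above_capacity = busiest.sum_nr_running * SCHED_LOAD_SCALE;
	if (load_above_capacity > busiest.group_capacity)
		load_above_capacity -= busiest.group_capacity;
	else
		load_above_capacity = ~0UL;

	/* ... min()'d against a delta in load.weight units: scale everyone's
	 * load.weight by a few percent and this min() can flip sides. */
	max_pull = min(busiest.avg_load - sds_avg_load, load_above_capacity);

	printf("load_above_capacity = %lu, max_pull = %lu\n",
	       load_above_capacity, max_pull);
	return 0;
}

With these made-up numbers max_pull comes from the avg_load delta (2000),
but inflating avg_load by ~5% makes load_above_capacity (2048) win instead,
which is the "load.weight of everything changes the balance point" effect.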