From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S964934AbbLOKVb (ORCPT );
        Tue, 15 Dec 2015 05:21:31 -0500
Received: from mail-lf0-f50.google.com ([209.85.215.50]:36237 "EHLO
        mail-lf0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S964803AbbLOKV3 (ORCPT );
        Tue, 15 Dec 2015 05:21:29 -0500
MIME-Version: 1.0
In-Reply-To: <20151214002658.GD28098@intel.com>
References: <1448372970-8764-1-git-send-email-vincent.guittot@linaro.org>
        <20151214002658.GD28098@intel.com>
From: Vincent Guittot 
Date: Tue, 15 Dec 2015 11:21:08 +0100
Message-ID: 
Subject: Re: [PATCH] sched/fair: update scale invariance of pelt
To: Yuyang Du 
Cc: Peter Zijlstra , Ingo Molnar , linux-kernel ,
        Morten Rasmussen , Linaro Kernel Mailman List ,
        Dietmar Eggemann , Paul Turner , Benjamin Segall 
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 14 December 2015 at 01:26, Yuyang Du wrote:
> Hi Vincent,
>
> I don't quite catch what this is doing; maybe I need more time to ramp
> up to gory details as difficult as this.
>
> Do you scale or not scale? You seem to have removed the scaling, but
> added it after "Remainder of delta accrued against u_0"..

I'm scaling the time before feeding it into the PELT algorithm. My reply
to Morten's comment tries to explain more deeply what I'm trying to
achieve. I have also put a small sketch of the idea and a rough check of
the numbers at the end of this mail.

Thanks,
Vincent

>
> Thanks,
> Yuyang
>
> On Tue, Nov 24, 2015 at 02:49:30PM +0100, Vincent Guittot wrote:
>> The current implementation of load tracking invariance scales the load
>> tracking value with the current frequency and uarch performance (only
>> for utilization) of the CPU.
>>
>> One main result of the current formula is that the figures are capped
>> by the current capacity of the CPU. This limitation is the main reason
>> for not including the uarch invariance (arch_scale_cpu_capacity) in the
>> calculation of load_avg, because capping the load can generate erroneous
>> system load statistics, as described in this example [1].
>>
>> Instead of scaling the complete value of the PELT algorithm, we should
>> only scale the running time by the current capacity of the CPU. It seems
>> more correct to only scale the running time, because the non-running
>> time of a task (sleeping or waiting for a runqueue) is the same whatever
>> the current freq and compute capacity of the CPU.
>>
>> One main advantage of this change is that the load of a task can reach
>> its max value whatever the current freq and uarch of the CPU on which it
>> runs. It will just take more time at a lower freq than at max freq, or
>> on a "little" CPU compared to a "big" one. The load and the utilization
>> stay invariant across the system, so we can still compare them between
>> CPUs, but with a wider range of values.
>>
>> With this change, we don't have to test whether a CPU is overloaded or
>> not in order to use one metric (util) or another (load), as all metrics
>> are always valid.
>>
>> I have put below some examples of the duration needed to reach some
>> typical load values according to the capacity of the CPU, with the
>> current implementation and with this patch.
>>
>> Util (%)      max capacity   half capacity (mainline)   half capacity (w/ patch)
>> 972 (95%)     138ms          not reachable              276ms
>> 486 (47.5%)   30ms           138ms                      60ms
>> 256 (25%)     13ms           32ms                       26ms
>>
>> We can see that at half capacity, with this patch, we need twice the
>> duration needed at max capacity, whereas the duration increases
>> non-linearly with the current implementation.
>>
>> [1] https://lkml.org/lkml/2014/12/18/128
>>
>> Signed-off-by: Vincent Guittot 
>> ---
>>  kernel/sched/fair.c | 28 +++++++++++++---------------
>>  1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 824aa9f..f2a18e1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2560,10 +2560,9 @@ static __always_inline int
>>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>                    unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>  {
>> -        u64 delta, scaled_delta, periods;
>> +        u64 delta, periods;
>>          u32 contrib;
>> -        unsigned int delta_w, scaled_delta_w, decayed = 0;
>> -        unsigned long scale_freq, scale_cpu;
>> +        unsigned int delta_w, decayed = 0;
>>
>>          delta = now - sa->last_update_time;
>>          /*
>> @@ -2584,8 +2583,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>                  return 0;
>>          sa->last_update_time = now;
>>
>> -        scale_freq = arch_scale_freq_capacity(NULL, cpu);
>> -        scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
>> +        if (running) {
>> +                delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
>> +                delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
>> +        }
>>
>>          /* delta_w is the amount already accumulated against our next period */
>>          delta_w = sa->period_contrib;
>> @@ -2601,16 +2602,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>                   * period and accrue it.
>>                   */
>>                  delta_w = 1024 - delta_w;
>> -                scaled_delta_w = cap_scale(delta_w, scale_freq);
>>                  if (weight) {
>> -                        sa->load_sum += weight * scaled_delta_w;
>> +                        sa->load_sum += weight * delta_w;
>>                          if (cfs_rq) {
>>                                  cfs_rq->runnable_load_sum +=
>> -                                                weight * scaled_delta_w;
>> +                                                weight * delta_w;
>>                          }
>>                  }
>>                  if (running)
>> -                        sa->util_sum += scaled_delta_w * scale_cpu;
>> +                        sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
>>
>>                  delta -= delta_w;
>>
>> @@ -2627,25 +2627,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>
>>                  /* Efficiently calculate \sum (1..n_period) 1024*y^i */
>>                  contrib = __compute_runnable_contrib(periods);
>> -                contrib = cap_scale(contrib, scale_freq);
>>                  if (weight) {
>>                          sa->load_sum += weight * contrib;
>>                          if (cfs_rq)
>>                                  cfs_rq->runnable_load_sum += weight * contrib;
>>                  }
>>                  if (running)
>> -                        sa->util_sum += contrib * scale_cpu;
>> +                        sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
>>          }
>>
>>          /* Remainder of delta accrued against u_0` */
>> -        scaled_delta = cap_scale(delta, scale_freq);
>>          if (weight) {
>> -                sa->load_sum += weight * scaled_delta;
>> +                sa->load_sum += weight * delta;
>>                  if (cfs_rq)
>> -                        cfs_rq->runnable_load_sum += weight * scaled_delta;
>> +                        cfs_rq->runnable_load_sum += weight * delta;
>>          }
>>          if (running)
>> -                sa->util_sum += scaled_delta * scale_cpu;
>> +                sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
>>
>>          sa->period_contrib += delta;
>>
>> --
>> 1.9.1
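
To make the above a bit more concrete, here is a minimal userspace sketch of
the idea (not the kernel code; apart from cap_scale() and
SCHED_CAPACITY_SHIFT, which mirror the kernel definitions, all names are
local to the example). Only the running time gets scaled, once and up front,
and is then accumulated exactly like time spent at full capacity:

#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT    10
#define SCHED_CAPACITY_SCALE    (1UL << SCHED_CAPACITY_SHIFT)

/* same shape as the kernel helper: val * capacity / 1024 */
#define cap_scale(v, cap)       (((v) * (cap)) >> SCHED_CAPACITY_SHIFT)

/*
 * Amount of "full speed" time that a wall-clock delta contributes to the
 * PELT sums: only running time is scaled by the current freq and uarch
 * capacity, sleeping/waiting time is taken as is.
 */
static uint64_t pelt_time(uint64_t delta_us, unsigned long scale_freq,
                          unsigned long scale_cpu, int running)
{
        if (running) {
                delta_us = cap_scale(delta_us, scale_freq);
                delta_us = cap_scale(delta_us, scale_cpu);
        }
        return delta_us;
}

int main(void)
{
        /* 1000us running at half freq on a big CPU counts as 500us */
        printf("%llu\n",
               (unsigned long long)pelt_time(1000, 512, SCHED_CAPACITY_SCALE, 1));
        /* 1000us running at half freq on a half-capacity CPU counts as 250us */
        printf("%llu\n", (unsigned long long)pelt_time(1000, 512, 512, 1));
        /* non-running time is never scaled */
        printf("%llu\n", (unsigned long long)pelt_time(1000, 512, 512, 0));
        return 0;
}

Because it is the scaled time that feeds the geometric series, rather than
the per-period contribution being scaled, the signal can still converge to
the same max value on a slow CPU; it just takes proportionally longer.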
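
The durations in the table above fall out of the PELT geometric series
itself (1024 scale, ~32ms half-life). A rough continuous-time approximation,
again just an illustration and not part of the patch, is
util(t) ~= 1024 * (1 - 2^(-t/32ms)) for an always-running task; with the
patch, a CPU at capacity cap simply advances that time at cap/1024 of the
wall-clock rate:

#include <math.h>
#include <stdio.h>

/* wall-clock ms needed to reach 'util' (out of 1024) at capacity 'cap' */
static double ms_to_reach(double util, double cap)
{
        return -32.0 * log2(1.0 - util / 1024.0) * 1024.0 / cap;
}

int main(void)
{
        printf("972 @ max capacity : %.0f ms\n", ms_to_reach(972, 1024)); /* ~138 */
        printf("972 @ half capacity: %.0f ms\n", ms_to_reach(972, 512));  /* ~275 */
        printf("486 @ max capacity : %.0f ms\n", ms_to_reach(486, 1024)); /* ~30  */
        printf("256 @ max capacity : %.0f ms\n", ms_to_reach(256, 1024)); /* ~13  */
        return 0;
}

(Build with -lm; the half-capacity figure prints 275ms here only because the
table doubles the already-rounded 138ms.)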