From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755332AbcHSBnG (ORCPT <rfc822;w@1wt.eu>);
        Thu, 18 Aug 2016 21:43:06 -0400
Received: from mail-wm0-f54.google.com ([74.125.82.54]:35771 "EHLO
        mail-wm0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1754528AbcHSBnD (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 18 Aug 2016 21:43:03 -0400
MIME-Version: 1.0
In-Reply-To: <20160818134517.GC27873@e105550-lin.cambridge.arm.com>
References: <1469453670-2660-1-git-send-email-morten.rasmussen@arm.com>
 <1469453670-2660-11-git-send-email-morten.rasmussen@arm.com>
 <20160815142342.GV6879@twins.programming.kicks-ass.net> <20160815154237.GE3391@e105550-lin.cambridge.arm.com>
 <20160818084053.GG3391@e105550-lin.cambridge.arm.com> <20160818102438.GA27873@e105550-lin.cambridge.arm.com>
 <CANRm+Czjkn+gnzEhAtf0muRhBPs0FoXYFgnEYdzvXcjqBgXUyw@mail.gmail.com> <20160818134517.GC27873@e105550-lin.cambridge.arm.com>
From: Wanpeng Li <kernellwp@gmail.com>
Date: Fri, 19 Aug 2016 09:43:00 +0800
Message-ID: <CANRm+Cx7YScEtxhSag_uqzdOei+kEjSYsTMHeoMdYq-ijLGGAQ@mail.gmail.com>
Subject: Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at
 wake-up more correctly
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Yuyang Du <yuyang.du@intel.com>,
        Vincent Guittot <vincent.guittot@linaro.org>,
        Mike Galbraith <mgalbraith@suse.de>, sgurrappadi@nvidia.com,
        Koan-Sin Tan <freedom.tan@mediatek.com>,
        =?UTF-8?B?5bCP5p6X5pWs5aSq?= <keita.kobayashi.ym@renesas.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

2016-08-18 21:45 GMT+08:00 Morten Rasmussen <morten.rasmussen@arm.com>:
> On Thu, Aug 18, 2016 at 07:46:44PM +0800, Wanpeng Li wrote:
>> 2016-08-18 18:24 GMT+08:00 Morten Rasmussen <morten.rasmussen@arm.com>:
>> > On Thu, Aug 18, 2016 at 09:40:55AM +0100, Morten Rasmussen wrote:
>> >> On Mon, Aug 15, 2016 at 04:42:37PM +0100, Morten Rasmussen wrote:
>> >> > On Mon, Aug 15, 2016 at 04:23:42PM +0200, Peter Zijlstra wrote:
>> >> > > But unlike that function, it doesn't actually use __update_load_avg().
>> >> > > Why not?
>> >> >
>> >> > Fair question :)
>> >> >
>> >> > We currently exploit the fact that the task utilization is _not_ updated
>> >> > in wake-up balancing to make sure we don't under-estimate the capacity
>> >> > requirements for tasks that have slept for a while. If we update it, we
>> >> > lose the non-decayed 'peak' utilization, but I guess we could just
>> >> > store it somewhere when we do the wake-up decay.
>> >> >
>> >> > I thought there was a better reason when I wrote the patch, but I don't
>> >> > recall right now. I will look into it again and see if we can use
>> >> > __update_load_avg() to do a proper update instead of doing things twice.
>> >>
>> >> AFAICT, we should be able to synchronize the task utilization to the
>> >> previous rq utilization using __update_load_avg() as you suggest. The
>> >> patch below should work as a replacement without any changes to
>> >> subsequent patches. It doesn't solve the under-estimation issue, but I
>> >> have another patch for that.
>> >
>> > And here is a possible solution to the under-estimation issue. The patch
>> > would have to go at the end of this set.
>> >
>> > ---8<---
>> >
>> > From 5bc918995c6c589b833ba1f189a8b92fa22202ae Mon Sep 17 00:00:00 2001
>> > From: Morten Rasmussen <morten.rasmussen@arm.com>
>> > Date: Wed, 17 Aug 2016 15:30:43 +0100
>> > Subject: [PATCH] sched/fair: Track peak per-entity utilization
>> >
>> > When PELT (per-entity load tracking) utilization is used to place
>> > tasks at wake-up, the utilization that has decayed during sleep
>> > under-estimates the true utilization of the task. This could mean
>> > putting the task on a cpu with less available capacity than is actually
>> > needed. This issue can be mitigated by using 'peak' utilization instead
>> > of the decayed utilization for placement decisions, e.g. at task
>> > wake-up.
>> >
>> > The 'peak' utilization metric, util_peak, tracks util_avg when the task
>> > is running and retains its previous value while the task is
>> > blocked/waiting on the rq. It is instantly updated to track util_avg
>> > again as soon as the task is running again.
>>
>> Maybe this will lead to wake affine being disabled because of a spiky
>> peak value for a task with a low average load.
>
> I assume you are referring to using task_util_peak() instead of
> task_util() in wake_cap()?

Yes.
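
For illustration, the util_peak behaviour described in the quoted commit
message could be sketched roughly as below. This is not the code from the
patch: struct util_sketch and util_peak_update() are hypothetical names,
and the sketch assumes only what the description states (follow util_avg
while the task is running, keep the old value otherwise).

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the per-entity averages.
 * It is NOT the kernel's struct sched_avg.
 */
struct util_sketch {
	unsigned long util_avg;   /* decaying average, drops while asleep */
	unsigned long util_peak;  /* value retained while blocked/waiting */
};

/*
 * Assumed to be called right after util_avg has been updated.
 * While the task runs, util_peak follows util_avg; otherwise the
 * previous util_peak is left untouched.
 */
static void util_peak_update(struct util_sketch *sa, bool running)
{
	if (running)
		sa->util_peak = sa->util_avg;
}
```

With this shape, the value seen at wake-up is whatever util_avg was when
the task last ran, which is why a single busy activation can influence
the following wake-up.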

>
> The peak value should never exceed the util_avg accumulated by the task
> last time it ran. So any spike has to be caused by the task accumulating
> more utilization last time it ran. We don't know if it is a spike or a more

I see.

> permanent change in behaviour, so we have to guess. So a spike on an
> asymmetric system could cause us to disable wake affine in some
> circumstances (either prev_cpu or waker cpu has to be low compute
> capacity) for the following wake-up.
>
> SMP should be unaffected as we should bail out on the previous
> condition.

Why use capacity_orig instead of capacity? It is checked at every
wake-up, and the rt class or interrupts may already have consumed a
large share of the cpu's capacity.

>
> The counter-example is a task with a fairly long busy period and a much
> longer period (cycle). Its util_avg might have decayed away since the
> last activation so it appears very small at wake-up and we end up
> putting it on a low capacity cpu every time even though it keeps the cpu
> busy for a long time every time it wakes up.

Agreed, that's the reason for the under-estimation concern.
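
To put rough numbers on the counter-example: PELT halves a contribution
about every 32 periods of roughly 1 ms each. The sketch below is only an
approximation under that assumption. decay_whole_half_lives() is a
hypothetical helper, and it counts full half-lives only, ignoring the
decay within a partial half-life that the kernel also applies.

```c
#include <limits.h>

/* PELT: a contribution halves every 32 periods (roughly 1 ms each). */
#define HALF_LIFE_PERIODS 32

/*
 * Approximate the utilization left after sleeping for 'periods'
 * periods, counting whole half-lives only (hypothetical helper).
 */
static unsigned long decay_whole_half_lives(unsigned long util,
					    unsigned int periods)
{
	unsigned int halvings = periods / HALF_LIFE_PERIODS;

	/* Shifting by the full type width is undefined, so clamp. */
	if (halvings >= sizeof(util) * CHAR_BIT)
		return 0;

	return util >> halvings;
}
```

For example, a task that reached a utilization of 800 and then sleeps
for 128 periods would show up as 50 at wake-up, so it looks like a small
task even if it is about to be busy for a long time again.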

>
> Did that answer your question?

Yeah, thanks for the clarification.

Regards,
Wanpeng Li