Message-ID: <5549E3B6.2060709@arm.com>
Date: Wed, 06 May 2015 10:49:42 +0100
From: Dietmar Eggemann
To: "pang.xunlei@zte.com.cn"
Cc: Juri Lelli, "linux-kernel@vger.kernel.org",
 "linux-kernel-owner@vger.kernel.org", "mingo@redhat.com",
 Morten Rasmussen, "mturquette@linaro.org", "nico@linaro.org",
 Peter Zijlstra, "preeti@linux.vnet.ibm.com", "rjw@rjwysocki.net",
 "vincent.guittot@linaro.org", "yuyang.du@intel.com"
Subject: Re: [RFCv3 PATCH 12/48] sched: Make usage tracking cpu scale-invariant
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com>
 <1423074685-6336-13-git-send-email-morten.rasmussen@arm.com>
 <20150323144620.GI23123@twins.programming.kicks-ass.net>
 <5510674D.7050202@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/05/15 07:27, pang.xunlei@zte.com.cn wrote:
> Hi Dietmar,
>
> Dietmar Eggemann wrote 2015-03-24 AM 03:19:41:
>>
>> Re: [RFCv3 PATCH 12/48] sched: Make usage tracking cpu scale-invariant

[...]

>> In the previous patch-set https://lkml.org/lkml/2014/12/2/332 we
>> cpu-scaled both (sched_avg::runnable_avg_sum (load) and
>> sched_avg::running_avg_sum (utilization)), but during the review Vincent
>> pointed out that a cpu scale-invariant load signal messes up
>> load-balancing based on s[dg]_lb_stats::avg_load in overload scenarios.
>>
>> avg_load = load/capacity, and load can't simply be replaced here by
>> 'cpu scale-invariant load' (which is load*capacity).
>
> I can't see why it shouldn't.
>
> For "avg_load = load/capacity", "avg_load" stands for how busy the cpu
> is; it is a value relative to the cpu's capacity. The system is seen as
> balanced in the case where one task runs on a 512-capacity cpu
> contributing 50% usage, and two of the same tasks run on a 1024-capacity
> cpu contributing 50% usage. "capacity" in this formula contains uarch
> capacity, so "load" in this formula must be an absolute real load, not a
> relative one.
>
> But with the current kernel implementation, "load" computed without this
> patch is a relative value. For example, one task (1024 weight) running
> on a 1024-capacity CPU gets a 256 load contribution (25% on this CPU).
> When it runs on a 512-capacity CPU, it gets a 512 load contribution
> (50% on this CPU).
> So currently the runnable "load" is relative, which means "avg_load" is
> actually wrong and its value equals that of "load". I therefore think
> the runnable load should be made cpu scale-invariant as well.
>
> Please point it out if I am wrong.
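To recap what the cpu scaling discussed above does to a tracked
contribution, here is a minimal user-space sketch. It is illustrative
only: the helper name and the standalone form are mine, just the
SCHED_CAPACITY_SCALE = 1024 convention is the kernel's. The point is
that after scaling, the 50% task on the 512-capacity cpu and the 25%
task on the 1024-capacity cpu end up with the same contribution.

/*
 * Illustrative sketch, not kernel code: a cpu scale-invariant
 * contribution is the raw (cpu-relative) contribution scaled by the
 * cpu's capacity against SCHED_CAPACITY_SCALE.
 */
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)	/* 1024 */

/* hypothetical helper: scale a raw contribution by cpu capacity */
static unsigned long scale_contrib(unsigned long contrib,
				   unsigned long capacity)
{
	return (contrib * capacity) >> SCHED_CAPACITY_SHIFT;
}

int main(void)
{
	/* 50% busy on a 512-capacity cpu: raw contribution 512 */
	printf("512-cap cpu, 50%% task: %lu\n", scale_contrib(512, 512));

	/* 25% busy on a 1024-capacity cpu: raw contribution 256 */
	printf("1024-cap cpu, 25%% task: %lu\n", scale_contrib(256, 1024));

	/* both print 256, i.e. the scaled signal is cpu-invariant */
	return 0;
}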
Cpu-scaled load leads to wrong lb decisions in overload scenarios:

(1) Overload example taken from the email thread between Vincent and
Morten: https://lkml.org/lkml/2014/12/30/114

7 always-running tasks, 4 on cluster 0, 3 on cluster 1:

              cluster 0      cluster 1

  capacity    1024 (2*512)   1024 (1*1024)
  load        4096           3072
  scale_load  2048           3072

Simply using cpu-scaled load in the existing lb code would declare
cluster 1 busier than cluster 0, although the compute capacity budget
for one task is higher on cluster 1 (1024/3 = 341) than on cluster 0
(2*512/4 = 256).

(2) A non-overload example does not show this problem:

7 tasks at 12.5% (scaled to 1024), 4 on cluster 0, 3 on cluster 1:

              cluster 0      cluster 1

  capacity    1024 (2*512)   1024 (1*1024)
  load        1024           384
  scale_load  512            384

Here cluster 0 is busier whether we take load or cpu-scaled load.

We should continue to use avg_load based on load (maybe calculated out
of scaled load once introduced?) for overload scenarios and use
scale_load for non-overload scenarios. Since this hasn't been
implemented yet, we got rid of cpu-scaled load in this RFC. A small
standalone sketch recomputing example (1) follows below.

[...]
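To make the comparison in example (1) explicit, here is a minimal
user-space sketch (not kernel code). It recomputes the per-group
figures with the avg_load = load/capacity form from the discussion
above, scaled by SCHED_CAPACITY_SCALE to stay in integer arithmetic;
the struct and its values are simply the example numbers from the
tables.

/* Standalone illustration of example (1): raw vs cpu-scaled load. */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL

struct group {
	const char *name;
	unsigned long capacity;		/* sum of cpu capacities */
	unsigned long load;		/* raw (cpu-relative) load */
	unsigned long scale_load;	/* cpu-scaled (invariant) load */
	unsigned int nr_tasks;
};

int main(void)
{
	struct group g[] = {
		{ "cluster 0", 2 * 512,  4096, 2048, 4 },
		{ "cluster 1", 1 * 1024, 3072, 3072, 3 },
	};

	for (int i = 0; i < 2; i++) {
		unsigned long avg_load =
			g[i].load * SCHED_CAPACITY_SCALE / g[i].capacity;
		unsigned long avg_scale_load =
			g[i].scale_load * SCHED_CAPACITY_SCALE / g[i].capacity;

		printf("%s: avg_load=%lu avg_load(scaled)=%lu capacity/task=%lu\n",
		       g[i].name, avg_load, avg_scale_load,
		       g[i].capacity / g[i].nr_tasks);
	}

	return 0;
}

With raw load the lb code would see cluster 0 (4096) as busier than
cluster 1 (3072), which matches the capacity budget per task; with
cpu-scaled load it would pick cluster 1 (3072 vs 2048), which is the
wrong decision described above.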