Message-ID: <5549E3B6.2060709@arm.com>
Date: Wed, 06 May 2015 10:49:42 +0100
From: Dietmar Eggemann
To: "pang.xunlei@zte.com.cn"
Cc: Juri Lelli, "linux-kernel@vger.kernel.org",
 "linux-kernel-owner@vger.kernel.org", "mingo@redhat.com",
 Morten Rasmussen, "mturquette@linaro.org", "nico@linaro.org",
 Peter Zijlstra, "preeti@linux.vnet.ibm.com", "rjw@rjwysocki.net",
 "vincent.guittot@linaro.org", "yuyang.du@intel.com"
Subject: Re: [RFCv3 PATCH 12/48] sched: Make usage tracking cpu scale-invariant
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com>
 <1423074685-6336-13-git-send-email-morten.rasmussen@arm.com>
 <20150323144620.GI23123@twins.programming.kicks-ass.net>
 <5510674D.7050202@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/05/15 07:27, pang.xunlei@zte.com.cn wrote:
> Hi Dietmar,
>
> Dietmar Eggemann wrote 2015-03-24 AM 03:19:41:
>>
>> Re: [RFCv3 PATCH 12/48] sched: Make usage tracking cpu scale-invariant

[...]

>> In the previous patch-set https://lkml.org/lkml/2014/12/2/332 we
>> cpu-scaled both (sched_avg::runnable_avg_sum (load) and
>> sched_avg::running_avg_sum (utilization)), but during the review Vincent
>> pointed out that a cpu scale-invariant load signal messes up
>> load-balancing based on s[dg]_lb_stats::avg_load in overload scenarios.
>>
>> avg_load = load/capacity, and load can't simply be replaced here by
>> 'cpu scale-invariant load' (which is load*capacity).
>
> I can't see why it shouldn't.
>
> For "avg_load = load/capacity", "avg_load" stands for how busy the cpu
> is; it is a value relative to the cpu's capacity. The system is seen as
> balanced in the case where one task runs on a 512-capacity cpu
> contributing 50% usage, and two of the same tasks run on a 1024-capacity
> cpu contributing 50% usage. "capacity" in this formula contains uarch
> capacity, so "load" in this formula must be an absolute real load, not a
> relative one.
>
> But with the current kernel implementation, "load" computed without this
> patch is a relative value. For example, one task (1024 weight) running
> on a 1024-capacity CPU gets a 256 load contribution (25% on this CPU).
> When it runs on a 512-capacity CPU, it gets a 512 load contribution
> (50% on this CPU).
> So currently the runnable "load" is relative, which means "avg_load" is
> actually wrong and its value equals that of "load". I therefore think
> the runnable load should be made cpu scale-invariant as well.
>
> Please point it out if I am wrong.
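To recap what the cpu scaling discussed above does to a tracked
contribution, here is a minimal user-space sketch. It is illustrative
only: the helper name and the standalone form are mine, just the
SCHED_CAPACITY_SCALE = 1024 convention is the kernel's. The point is
that after scaling, the 50% task on the 512-capacity cpu and the 25%
task on the 1024-capacity cpu end up with the same contribution.

/*
 * Illustrative sketch, not kernel code: a cpu scale-invariant
 * contribution is the raw (cpu-relative) contribution scaled by the
 * cpu's capacity against SCHED_CAPACITY_SCALE.
 */
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)	/* 1024 */

/* hypothetical helper: scale a raw contribution by cpu capacity */
static unsigned long scale_contrib(unsigned long contrib,
				   unsigned long capacity)
{
	return (contrib * capacity) >> SCHED_CAPACITY_SHIFT;
}

int main(void)
{
	/* 50% busy on a 512-capacity cpu: raw contribution 512 */
	printf("512-cap cpu, 50%% task: %lu\n", scale_contrib(512, 512));

	/* 25% busy on a 1024-capacity cpu: raw contribution 256 */
	printf("1024-cap cpu, 25%% task: %lu\n", scale_contrib(256, 1024));

	/* both print 256, i.e. the scaled signal is cpu-invariant */
	return 0;
}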
Cpu-scaled load leads to wrong lb decisions in overload scenarios:

(1) Overload example taken from the email thread between Vincent and
Morten: https://lkml.org/lkml/2014/12/30/114

7 always-running tasks, 4 on cluster 0, 3 on cluster 1:

              cluster 0      cluster 1

  capacity    1024 (2*512)   1024 (1*1024)
  load        4096           3072
  scale_load  2048           3072

Simply using cpu-scaled load in the existing lb code would declare
cluster 1 busier than cluster 0, although the compute capacity budget
for one task is higher on cluster 1 (1024/3 = 341) than on cluster 0
(2*512/4 = 256).

(2) A non-overload example does not show this problem:

7 tasks at 12.5% (scaled to 1024), 4 on cluster 0, 3 on cluster 1:

              cluster 0      cluster 1

  capacity    1024 (2*512)   1024 (1*1024)
  load        1024           384
  scale_load  512            384

Here cluster 0 is busier whether we take load or cpu-scaled load.

We should continue to use avg_load based on load (maybe calculated out
of scaled load once introduced?) for overload scenarios and use
scale_load for non-overload scenarios. Since this hasn't been
implemented yet, we got rid of cpu-scaled load in this RFC. A small
standalone sketch recomputing example (1) follows below.

[...]
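To make the comparison in example (1) explicit, here is a minimal
user-space sketch (not kernel code). It recomputes the per-group
figures with the avg_load = load/capacity form from the discussion
above, scaled by SCHED_CAPACITY_SCALE to stay in integer arithmetic;
the struct and its values are simply the example numbers from the
tables.

/* Standalone illustration of example (1): raw vs cpu-scaled load. */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL

struct group {
	const char *name;
	unsigned long capacity;		/* sum of cpu capacities */
	unsigned long load;		/* raw (cpu-relative) load */
	unsigned long scale_load;	/* cpu-scaled (invariant) load */
	unsigned int nr_tasks;
};

int main(void)
{
	struct group g[] = {
		{ "cluster 0", 2 * 512,  4096, 2048, 4 },
		{ "cluster 1", 1 * 1024, 3072, 3072, 3 },
	};

	for (int i = 0; i < 2; i++) {
		unsigned long avg_load =
			g[i].load * SCHED_CAPACITY_SCALE / g[i].capacity;
		unsigned long avg_scale_load =
			g[i].scale_load * SCHED_CAPACITY_SCALE / g[i].capacity;

		printf("%s: avg_load=%lu avg_load(scaled)=%lu capacity/task=%lu\n",
		       g[i].name, avg_load, avg_scale_load,
		       g[i].capacity / g[i].nr_tasks);
	}

	return 0;
}

With raw load the lb code would see cluster 0 (4096) as busier than
cluster 1 (3072), which matches the capacity budget per task; with
cpu-scaled load it would pick cluster 1 (3072 vs 2048), which is the
wrong decision described above.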