Date: Wed, 26 Apr 2017 18:51:23 +0200
From: Vincent Guittot
To: Tejun Heo
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds, Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: Re: [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity
Message-ID: <20170426165123.GA17921@linaro.org>
In-Reply-To: <20170425181219.GA15593@wtj.duckdns.org>

On Tuesday 25 Apr 2017 at 11:12:19 (-0700), Tejun Heo wrote:
> Hello,
> 
> On Tue, Apr 25, 2017 at 10:35:53AM +0200, Vincent Guittot wrote:
> > not sure to catch your example:
> > a task TA with a load_avg = 1 is the only task in a task group GB so
> > the cfs_rq load_avg = 1 too and the group_entity of this cfs_rq has
> > got a weight of 1024 (I use 10bits format for readability) which is
> > the total share of task group GB
> 
> The group_entity (the sched_entity corresponding to the cfs_rq) should
> behave as if it's a task which has the weight of 1024.
> 
> > Are you saying that the group_entity load_avg should be around 1024 and not 1 ?
> 
> Yes.
> 
> > I would say it depends of TA weight. I assume that TA weight is the
> > default value (1024) as you don't specify any value in your example
> 
> Please consider the following configuration, where GA is a group
> entity, and TA and TB are tasks.
> 
> ROOT - GA (weight 1024) - TA (weight 1)
>      \ GB (weight 1   ) - TB (weight 1)
> 
> Let's say both TA and TB are running full-tilt. Now let's take out GA
> and GB.
> 
> ROOT - TA1 (weight 1024)
>      \ TB1 (weight 1   )
> 
> GA should behave the same as TA1 and GB TB1. GA's load should match
> TA1's, and GA's load when seen from ROOT's cfs_rq has nothing to do
> with how much total absolute weight it has inside it.
> 
> ROOT - GA2 (weight 1024) - TA2 (weight 1   )
>      \ GB2 (weight 1   ) - TB2 (weight 1024)
> 
> If TA2 and TB2 are constantly running, GA2 and GB2's in ROOT's cfs_rq
> should match GA and GB's, respectively.

Yes I agree

> > If TA directly runs at parent level, its sched_entity would have a
> > load_avg of 1 so why the group entity load_avg should be 1024 ? it
> 
> Because then the hierarchical weight configuration doesn't mean
> anything.
> 
> > will just temporally show the cfs_rq more loaded than it is really and
> > at the end the group entity load_avg will go back to 1
> 
> It's not temporary. The weight of a group is its shares, which is its
> load fraction of the configured weight of the group. Assuming UP, if
> you configure a group to the weight of 1024 and have any task running
> full-tilt in it, the group will converge to the load of 1024. The
> problem is that the propagation logic is currently doing something
> completely different and temporarily push down the load whenever it
> triggers.
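
To make the expected numbers concrete, a short sketch of the GA example,
assuming UP, 10-bit weights, and that a full-tilt task's PELT load_avg
converges to roughly its weight:

    TA load_avg         ~= TA weight = 1         (full-tilt task)
    GA cfs_rq load_avg  ~= 1
    GA entity weight     = group shares = 1024   (only cfs_rq of the group on UP)
    GA entity load_avg  --> 1024                 (runnable 100% of the time, i.e.
                                                  the same load as a weight-1024
                                                  task TA1 would carry)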

Ok, I see your point and agree that there is an issue when propagating the
load_avg of a task group whose tasks have a lower weight than its share. But
your proposal has an issue too: it uses runnable_load_avg instead of load_avg,
which makes the propagation of load_avg incorrect. Something like the change
below, which keeps using load_avg, solves the problem:

+	if (gcfs_rq->load.weight) {
+		long shares = scale_load_down(calc_cfs_shares(gcfs_rq, gcfs_rq->tg));
+
+		load = min(gcfs_rq->avg.load_avg *
+			   shares / scale_load_down(gcfs_rq->load.weight), shares);

I have run schbench with the change above on v4.11-rc8 and the latencies are ok.

Thanks
Vincent

> 
> Thanks.
> 
> -- 
> tejun
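
For completeness, a minimal standalone sketch (not kernel code) of the
arithmetic in the change above, applied to the GA example: the shares value of
1024 is assumed to be what calc_cfs_shares() would return for that group, and
min_ul() merely stands in for the kernel's min():

#include <stdio.h>

/*
 * Illustrates load = min(gcfs_rq->avg.load_avg * shares / gcfs_rq->load.weight,
 *                        shares)
 * for a group whose only cfs_rq holds one full-tilt task of weight 1 while the
 * group entity's share is 1024 (10-bit fixed point, as in the thread).
 */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned long gcfs_load_avg = 1;    /* gcfs_rq->avg.load_avg (task TA)    */
	unsigned long gcfs_weight   = 1;    /* gcfs_rq->load.weight               */
	unsigned long shares        = 1024; /* assumed calc_cfs_shares() result   */

	unsigned long load = min_ul(gcfs_load_avg * shares / gcfs_weight, shares);

	/* Prints 1024: the group entity carries the load of a weight-1024 task. */
	printf("propagated load = %lu\n", load);
	return 0;
}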