From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1162349AbdD0HBT (ORCPT );
	Thu, 27 Apr 2017 03:01:19 -0400
Received: from mail-oi0-f43.google.com ([209.85.218.43]:35428 "EHLO
	mail-oi0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933426AbdD0HBJ (ORCPT );
	Thu, 27 Apr 2017 03:01:09 -0400
MIME-Version: 1.0
In-Reply-To: <20170426224020.GB11348@wtj.duckdns.org>
References: <20170424201344.GA14169@wtj.duckdns.org>
	<20170424201415.GB14169@wtj.duckdns.org>
	<20170425181219.GA15593@wtj.duckdns.org>
	<20170426165123.GA17921@linaro.org>
	<20170426224020.GB11348@wtj.duckdns.org>
From: Vincent Guittot
Date: Thu, 27 Apr 2017 09:00:48 +0200
Message-ID:
Subject: Re: [PATCH 1/2] sched/fair: Fix how load gets propagated from
	cfs_rq to its sched_entity
To: Tejun Heo
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
	Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 27 April 2017 at 00:40, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 26, 2017 at 06:51:23PM +0200, Vincent Guittot wrote:
>> > It's not temporary. The weight of a group is its shares, which is its
>> > load fraction of the configured weight of the group. Assuming UP, if
>> > you configure a group to the weight of 1024 and have any task running
>> > full-tilt in it, the group will converge to the load of 1024. The
>> > problem is that the propagation logic is currently doing something
>> > completely different and temporarily pushes down the load whenever it
>> > triggers.
>>
>> Ok, I see your point and agree that there is an issue when propagating the
>> load_avg of a task group whose tasks have a lower weight than the group's
>> shares, but your proposal has an issue of its own: it uses runnable_load_avg
>> instead of load_avg, and this makes the propagation of load_avg incorrect.
>> Something like the change below, which keeps using load_avg, solves the
>> problem:
>>
>> +	if (gcfs_rq->load.weight) {
>> +		long shares = scale_load_down(calc_cfs_shares(gcfs_rq, gcfs_rq->tg));
>> +
>> +		load = min(gcfs_rq->avg.load_avg *
>> +			   shares / scale_load_down(gcfs_rq->load.weight), shares);
>>
>> I have run schbench with the change above on v4.11-rc8 and the latencies
>> are OK.
>
> Hmm... so, I'll test this, but this wouldn't solve the problem of the
> root's runnable_load_avg being out of sync with the approximate sum of
> all task loads, which is the cause of the latencies that I'm seeing.
>
> Are you saying that with the above change, you're not seeing the
> higher latency issue that you reported in the other reply?

Yes: with the above change, which keeps using load_avg but fixes the
propagation for a task whose weight is lower than the task group's
shares, I don't see any latency regression compared to v4.11-rc8.

>
> Thanks.
>
> --
> tejun
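
For anyone who wants to see what the min() clamp in the snippet above
actually does, here is a minimal stand-alone sketch. It is plain
user-space C with made-up numbers: scale_load_down() is stubbed out as
the identity and min_long() stands in for the kernel's min(), so it
only illustrates the arithmetic, not the real kernel code path.

#include <stdio.h>

/* Identity stub: the real scale_load_down() shifts the weight right by
 * a fixed-point shift on 64-bit builds; only the shape of the
 * expression matters here. */
static long scale_load_down(long w)
{
	return w;
}

/* Stand-in for the kernel's min() macro. */
static long min_long(long a, long b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* Hypothetical numbers, not from a real trace: a group configured
	 * with 1024 shares containing one task of weight 512 whose
	 * load_avg has reached 600. */
	long shares        = 1024;
	long gcfs_weight   = 512;
	long gcfs_load_avg = 600;

	/* The propagated load from the change above:
	 * load = min(load_avg * shares / weight, shares) */
	long load = min_long(gcfs_load_avg * shares /
			     scale_load_down(gcfs_weight), shares);

	/* 600 * 1024 / 512 = 1200, clamped down to the 1024 shares. */
	printf("propagated load = %ld\n", load);
	return 0;
}

Compiled and run, it prints "propagated load = 1024": the scaled value
of 1200 is clamped to the group's shares, which is the upper bound the
clamp is meant to enforce.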