From: Vincent Guittot
Date: Wed, 21 Sep 2016 14:34:01 +0200
Subject: Re: [PATCH 2/7 v3] sched: fix hierarchical order in rq->leaf_cfs_rq_list
To: Dietmar Eggemann
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Yuyang Du, Morten Rasmussen, Linaro Kernel Mailman List, Paul Turner, Benjamin Segall

Hi Dietmar,

On 21 September 2016 at 12:14, Dietmar Eggemann wrote:
> Hi Vincent,
>
> On 12/09/16 08:47, Vincent Guittot wrote:
>> Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that
>> a child will always be called before its parent.
>>
>> The hierarchical order in the shares update list was introduced by
>> commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list").
>>
>> With the current implementation, a child can still be put after its parent.
>>
>> Let's take the example of
>>         root
>>           \
>>            b
>>           / \
>>          c   d*
>>              |
>>              e*
>>
>> with root -> b -> c already enqueued but not d -> e, so the leaf_cfs_rq_list
>> looks like: head -> c -> b -> root -> tail
>>
>> The branch d -> e will be added the first time it is enqueued,
>> starting with e, then d.
>>
>> When e is added, its parent is not yet on the list, so e is put at the
>> tail: head -> c -> b -> root -> e -> tail
>>
>> Then d is added at the head, because its parent is already on the list:
>> head -> d -> c -> b -> root -> e -> tail
>>
>> e is not at the right position and will be called last, whereas it
>> should be called first.
>>
>> Because enqueueing is bottom-up, we are sure to finish by adding either
>> a cfs_rq without a parent or a cfs_rq whose parent is already on the
>> list. We can use this event to detect when we have finished adding a
>> new branch. The others, whose parents are not yet on the list, must be
>> added after the children inserted in the previous steps, and after any
>> parents already on the list. The easiest way is to put each cfs_rq just
>> after the last inserted one and to keep track of it until the branch is
>> fully added.
>
> [...]
>
> I tested it and it does the right thing for this test case (task t runs
> alternately on Cpu0 and Cpu1 in the tg with tg_css_id=6) in a multi-level
> task group hierarchy (tg_css_id=2,4,6).

Thanks for testing

>
> I wonder if this patch is related to the comment "TODO: fix up out-of-order
> children on enqueue." in update_shares_cpu() of commit 82958366cfea
> ("sched: Replace update_shares weight distribution with per-entity
> computation")?
I didn't have that comment in mind when I wrote this patch; the goal was
only to ensure that the propagation of load and utilization in children is
done before their parents are tested. That said, I agree that the comment
probably points to the same ordering issue as the one this patch solves.

>
> I guess in the meantime we lost the functionality to remove a cfs_rq from
> the leaf_cfs_rq list once there are no se's enqueued on it anymore. If e.g.
> t migrates away from Cpu1, all the cfs_rq's of the task hierarchy (for
> tg_css_id=2,4,6) owned by Cpu1 stay in the leaf_cfs_rq list.
>
> Shouldn't we reintegrate it? The following patch goes on top of this patch:

I see one problem: once a cfs_rq has been removed from the list, it will
not be periodically updated anymore until the next enqueue. A sleeping task
is only attached, not enqueued, when it moves into a cfs_rq, so its load is
added to the cfs_rq's load, which has to be periodically updated/decayed.
So adding a cfs_rq to the list only during an enqueue is not enough in this
case.

Also, only the highest child level of the task group will be removed,
because cfs_rq->nr_running will be > 0 as soon as a child task group is
created and enqueued into a task group. Or you could use cfs.h_nr_running
instead; I don't know all the implications.

Regards,
Vincent

>
> -- >8 --
>
> From: Dietmar Eggemann
> Date: Tue, 20 Sep 2016 17:30:09 +0100
> Subject: [PATCH] Re-integrate list_del_leaf_cfs_rq() into
>  update_blocked_averages()
>
> There is no functionality anymore to delete a cfs_rq from the leaf_cfs_rq
> list in case there are no se's enqueued on it.
>
> The functionality had been initially introduced by commit 82958366cfea
> ("sched: Replace update_shares weight distribution with per-entity
> computation") but was taken out by commit 9d89c257dfb9 ("sched/fair:
> Rewrite runnable load and utilization average tracking").
>
> Signed-off-by: Dietmar Eggemann
> ---
>  kernel/sched/fair.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dd530359ef84..951c83337e2b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6584,8 +6584,12 @@ static void update_blocked_averages(int cpu)
>  			update_tg_load_avg(cfs_rq, 0);
>
>  		/* Propagate pending load changes to the parent */
> -		if (cfs_rq->tg->se[cpu])
> +		if (cfs_rq->tg->se[cpu]) {
>  			update_load_avg(cfs_rq->tg->se[cpu], 0, 0);
> +
> +			if (!cfs_rq->nr_running)
> +				list_del_leaf_cfs_rq(cfs_rq);
> +		}
>  	}
>  	raw_spin_unlock_irqrestore(&rq->lock, flags);
>  }
> --
> 1.9.1
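
The PATCH 2/7 v3 code itself is not quoted in this thread. As a rough
illustration of the insertion scheme the quoted commit message describes
(put a cfs_rq whose parent is missing right after the last inserted one,
and reset that cursor once the branch connects to the list), here is a
minimal user-space model; the singly linked list and all names are
illustrative simplifications, not the kernel implementation:

#include <stdio.h>
#include <stddef.h>

struct cfs_rq {
        const char *name;
        struct cfs_rq *parent;  /* NULL for the root cfs_rq */
        int on_list;
        struct cfs_rq *next;    /* stand-in for leaf_cfs_rq_list linkage */
};

static struct cfs_rq *head;             /* leaf_cfs_rq_list, children first */
static struct cfs_rq **cursor = &head;  /* insertion point for a cfs_rq
                                         * whose parent is not listed yet */

static void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
{
        struct cfs_rq **pos = &head;

        if (cfs_rq->on_list)
                return;
        cfs_rq->on_list = 1;

        if (cfs_rq->parent && cfs_rq->parent->on_list) {
                /* Parent already listed: insert just before it, so every
                 * child keeps preceding it; the branch is now connected. */
                while (*pos != cfs_rq->parent)
                        pos = &(*pos)->next;
                cursor = &head;
        } else if (!cfs_rq->parent) {
                /* Root cfs_rq: append at the tail; the branch is closed. */
                while (*pos)
                        pos = &(*pos)->next;
                cursor = &head;
        } else {
                /* Parent not listed yet: insert right after the last cfs_rq
                 * we added, and remember this position so the parent
                 * (enqueued next, bottom-up) lands after us. */
                pos = cursor;
                cursor = &cfs_rq->next;
        }
        cfs_rq->next = *pos;
        *pos = cfs_rq;
}

int main(void)
{
        struct cfs_rq root = { "root", NULL,  0, NULL };
        struct cfs_rq b    = { "b",    &root, 0, NULL };
        struct cfs_rq c    = { "c",    &b,    0, NULL };
        struct cfs_rq d    = { "d",    &b,    0, NULL };
        struct cfs_rq e    = { "e",    &d,    0, NULL };
        struct cfs_rq *it;

        /* root -> b -> c enqueued first: head -> c -> b -> root */
        list_add_leaf_cfs_rq(&c);
        list_add_leaf_cfs_rq(&b);
        list_add_leaf_cfs_rq(&root);

        /* then the d -> e branch, bottom-up: e first, then d */
        list_add_leaf_cfs_rq(&e);
        list_add_leaf_cfs_rq(&d);

        for (it = head; it; it = it->next)
                printf("%s ", it->name);  /* prints: e c d b root */
        printf("\n");
        return 0;
}

With this ordering guaranteed, a walk of the list from head to tail (as in
update_blocked_averages()) always visits children before their parents, so
load and utilization changes are propagated upwards in one pass.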