All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Yuyang Du <yuyang.du@intel.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
	Paul Turner <pjt@google.com>,
	Benjamin Segall <bsegall@google.com>
Subject: Re: [PATCH 2/7 v3] sched: fix hierarchical order in rq->leaf_cfs_rq_list
Date: Wed, 21 Sep 2016 14:34:01 +0200	[thread overview]
Message-ID: <CAKfTPtAVYgRx_xjQeUHS-bebjXKGBJv77kWdq9BwCTf7GPUbtw@mail.gmail.com> (raw)
In-Reply-To: <a8b6d188-060c-9af9-177b-55059738e99c@arm.com>

Hi Dietmar,

On 21 September 2016 at 12:14, Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
> Hi Vincent,
>
> On 12/09/16 08:47, Vincent Guittot wrote:
>> Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that
>> a child will always be called before its parent.
>>
>> The hierarchical order in shares update list has been introduced by
>> commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list")
>>
>> With the current implementation a child can be still put after its parent.
>>
>> Lets take the example of
>>        root
>>         \
>>          b
>>          /\
>>          c d*
>>            |
>>            e*
>>
>> with root -> b -> c already enqueued but not d -> e so the leaf_cfs_rq_list
>> looks like: head -> c -> b -> root -> tail
>>
>> The branch d -> e will be added the first time that they are enqueued,
>> starting with e then d.
>>
>> When e is added, its parents is not already on the list so e is put at the
>> tail : head -> c -> b -> root -> e -> tail
>>
>> Then, d is added at the head because its parent is already on the list:
>> head -> d -> c -> b -> root -> e -> tail
>>
>> e is not placed at the right position and will be called the last whereas
>> it should be called at the beginning.
>>
>> Because it follows the bottom-up enqueue sequence, we are sure that we
>> will finished to add either a cfs_rq without parent or a cfs_rq with a parent
>> that is already on the list. We can use this event to detect when we have
>> finished to add a new branch. For the others, whose parents are not already
>> added, we have to ensure that they will be added after their children that
>> have just been inserted the steps before, and after any potential parents that
>> are already in the list. The easiest way is to put the cfs_rq just after the
>> last inserted one and to keep track of it untl the branch is fully added.
>
> [...]
>
> I tested it with a multi-level task group hierarchy and it does the
> right thing for this testcase (task t runs alternately on Cpu0 and Cpu1
> in tg w/ tg_css_id=6) in a multi-level task group hierarchy (tg_css_id=2,4,6).

Thanks for testing

>
> I wonder if this patch is related to the comment "TODO: fix up out-of-order
> children on enqueue." in update_shares_cpu() of commit 82958366cfea
> ("sched: Replace update_shares weight distribution with per-entity
> computation")?

I had not that comment in mind when i have done this patch only to
ensure that the propagation of load and utilization in children will
be done before testing their parents
That being said, I agree that the comment probably points to the same
ordering issue than what this patch solves

>
> I guess in the meantime we lost the functionality to remove a cfs_rq from the
> leaf_cfs_rq list once there are no se's enqueued on it anymore. If e.g. t migrates
> away from Cpu1, all the cfs_rq's of the task hierarchy (for tg_css_id=2,4,6)
> owned by Cpu1 stay in the leaf_cfs_rq list.
>
> Shouldn't we reintegrate it? Following patch goes on top of this patch:

I see one problem: Once a cfs_rq has been removed , it will not be
periodically updated anymore until the next enqueue. A sleeping task
is only attached but not enqueued when it moves into a cfs_rq so its
load is added to cfs_rq's load which have to be periodically
updated/decayed. So adding a cfs_rq to the list only during an enqueue
is not enough in this case.

Then, only the highest child level of task group will be removed
because cfs_rq->nr_running will be > 0 as soon as a child task group
is created and enqueued into a task group. Or you should use
cfs.h_nr_running instead i don't know all implications

Regards,
Vincent

>
> -- >8 --
>
> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Date: Tue, 20 Sep 2016 17:30:09 +0100
> Subject: [PATCH] Re-integrate list_del_leaf_cfs_rq() into
>  update_blocked_averages()
>
> There is no functionality anymore to delete a cfs_rq from the leaf_cfs_rq
> list in case there are no se's enqueued on it.
>
> The functionality had been initially introduced by commit 82958366cfea
> ("sched: Replace update_shares weight distribution with per-entity
> computation") but has been taken out by commit 9d89c257dfb9 ("sched/fair:
> Rewrite runnable load and utilization average tracking").
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dd530359ef84..951c83337e2b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6584,8 +6584,12 @@ static void update_blocked_averages(int cpu)
>                         update_tg_load_avg(cfs_rq, 0);
>
>                 /* Propagate pending load changes to the parent */
> -               if (cfs_rq->tg->se[cpu])
> +               if (cfs_rq->tg->se[cpu]) {
>                         update_load_avg(cfs_rq->tg->se[cpu], 0, 0);
> +
> +                       if (!cfs_rq->nr_running)
> +                               list_del_leaf_cfs_rq(cfs_rq);
> +               }
>         }
>         raw_spin_unlock_irqrestore(&rq->lock, flags);
>  }
> --
> 1.9.1

  reply	other threads:[~2016-09-21 12:34 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-12  7:47 [PATCH 0/7 v3] sched: reflect sched_entity move into task_group's load Vincent Guittot
2016-09-12  7:47 ` [PATCH 1/7 v3] sched: factorize attach entity Vincent Guittot
2016-09-12  7:47 ` [PATCH 2/7 v3] sched: fix hierarchical order in rq->leaf_cfs_rq_list Vincent Guittot
2016-09-21 10:14   ` Dietmar Eggemann
2016-09-21 12:34     ` Vincent Guittot [this message]
2016-09-21 17:25       ` Dietmar Eggemann
2016-09-21 18:02         ` Vincent Guittot
2016-09-12  7:47 ` [PATCH 3/7 v3] sched: factorize PELT update Vincent Guittot
2016-09-15 13:09   ` Peter Zijlstra
2016-09-15 13:30     ` Vincent Guittot
2016-09-12  7:47 ` [PATCH 4/7 v3] sched: propagate load during synchronous attach/detach Vincent Guittot
2016-09-15 12:55   ` Peter Zijlstra
2016-09-15 13:01     ` Vincent Guittot
2016-09-15 12:59   ` Peter Zijlstra
2016-09-15 13:11     ` Vincent Guittot
2016-09-15 13:11   ` Dietmar Eggemann
2016-09-15 14:31     ` Vincent Guittot
2016-09-15 17:20       ` Dietmar Eggemann
2016-09-15 15:14     ` Peter Zijlstra
2016-09-15 17:36       ` Dietmar Eggemann
2016-09-15 17:54         ` Peter Zijlstra
2016-09-15 14:43   ` Peter Zijlstra
2016-09-15 14:51     ` Vincent Guittot
2016-09-19  3:19   ` Wanpeng Li
2016-09-12  7:47 ` [PATCH 5/7 v3] sched: propagate asynchrous detach Vincent Guittot
2016-09-12  7:47 ` [PATCH 6/7 v3] sched: fix task group initialization Vincent Guittot
2016-09-12  7:47 ` [PATCH 7/7 v3] sched: fix wrong utilization accounting when switching to fair class Vincent Guittot
2016-09-15 13:18   ` Peter Zijlstra
2016-09-15 15:36     ` Vincent Guittot
2016-09-16 12:16       ` Peter Zijlstra
2016-09-16 14:23         ` Vincent Guittot
2016-09-20 11:54           ` Peter Zijlstra
2016-09-20 13:06             ` Vincent Guittot
2016-09-22 12:25               ` Peter Zijlstra
2016-09-26 14:53                 ` Peter Zijlstra
2016-09-20 16:59             ` bsegall
2016-09-22  8:33               ` Peter Zijlstra
2016-09-22 17:10                 ` bsegall
2016-09-16 10:51   ` Peter Zijlstra
2016-09-16 12:45     ` Vincent Guittot
2016-09-30 12:01   ` [tip:sched/core] sched/core: Fix incorrect " tip-bot for Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtAVYgRx_xjQeUHS-bebjXKGBJv77kWdq9BwCTf7GPUbtw@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=Morten.Rasmussen@arm.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=yuyang.du@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.