From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Yuyang Du <yuyang.du@intel.com>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Benjamin Segall <bsegall@google.com>,
	Paul Turner <pjt@google.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Matt Fleming <matt@codeblueprint.co.uk>
Subject: Re: [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group
Date: Thu, 16 Jun 2016 23:21:55 +0200	[thread overview]
Message-ID: <CAKfTPtDopSCBsooZdO=nO_EGja9MvHu5de0zFPb9d+aebPE16g@mail.gmail.com>
In-Reply-To: <20160616200711.GK30154@twins.programming.kicks-ass.net>

On 16 June 2016 at 22:07, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 16, 2016 at 09:00:57PM +0200, Vincent Guittot wrote:
>> On 16 June 2016 at 20:51, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Jun 16, 2016 at 06:30:13PM +0200, Vincent Guittot wrote:
>> >> With patch [1] for the cfs_rq init side, all use cases will be
>> >> covered regarding the issue of last_update_time being set to 0 at
>> >> init:
>> >> [1] https://lkml.org/lkml/2016/5/30/508
>> >
>> > Aah, wait, now I get it :-)
>> >
>> > Still, we should put cfs_rq_clock_task(cfs_rq) in it, not 1. And since
>> > we now acquire rq->lock on init this should well be possible. Lemme sort
>> > that.
>>
>> yes, with the rq->lock we can use cfs_rq_clock_task(), which makes
>> more sense than 1.
>> But the delta can still be significant between the creation of the
>> task group and the first task that gets attached to the cfs_rq.
>
> Ah, I think I've spotted more fail.
>
> And I think you're right, it doesn't matter, in fact, 0 should have been
> fine too!
>
> enqueue_entity()
>   enqueue_entity_load_avg()
>     update_cfs_rq_load_avg()
>       now = clock()
>       __update_load_avg(&cfs_rq->avg)
>         cfs_rq->avg.last_update_time = now
>         // ages 0 load/util for: now - 0
>     if (migrated)
>       attach_entity_load_avg()
>         se->avg.last_update_time = cfs_rq->avg.last_update_time; // now != 0
>
> So I don't see how it can end up being attached again.

In fact, it has already been attached during sched_move_task(). The
sequence for the first task that is attached to a cfs_rq is:

sched_move_task()
  task_move_group_fair()
    detach_task_cfs_rq()
    set_task_rq()
    attach_task_cfs_rq()
      attach_entity_load_avg()
        se->avg.last_update_time = cfs_rq->avg.last_update_time  // == 0

Then we enqueue the task, but se->avg.last_update_time is still 0, so
'migrated' is set and we attach the task a second time.
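
For reference, a simplified sketch of the check that triggers the second
attach (paraphrased from the enqueue path of that era, not the verbatim
kernel code):

enqueue_entity_load_avg(cfs_rq, se)
  // a last_update_time of 0 is used as the "task was migrated" marker
  migrated = !se->avg.last_update_time
  ...
  if (migrated)
    attach_entity_load_avg(cfs_rq, se)  // attach #2

Because attach_task_cfs_rq() copied the cfs_rq's zero timestamp into the
se, the enqueue path cannot tell this task apart from a freshly migrated
one.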

>
>
> Now I do see another problem, though: we're forgetting to call
> update_cfs_rq_load_avg() in all detach_entity_load_avg() callers, and in
> all but the enqueue caller of attach_entity_load_avg().

Yes, calling update_cfs_rq_load_avg() before every attach_entity_load_avg()
will ensure that cfs_rq->avg.last_update_time is never 0 when attaching a
task. And doing the same before the detach will ensure that we move an
up-to-date load.
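
To make the ordering explicit (a simplified sketch of the invariant your
patch enforces, not the verbatim code):

  now = cfs_rq_clock_task(cfs_rq)

  update_cfs_rq_load_avg(now, cfs_rq, false)  // cfs_rq->avg.last_update_time = now
  attach_entity_load_avg(cfs_rq, se)          // se->avg.last_update_time = now != 0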

Your proposal below looks good to me

>
> Something like the below.
>
>
>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f75930bdd326..5d8fa135bbc5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8349,6 +8349,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
>  {
>         struct sched_entity *se = &p->se;
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +       u64 now = cfs_rq_clock_task(cfs_rq);
>
>         if (!vruntime_normalized(p)) {
>                 /*
> @@ -8360,6 +8361,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
>         }
>
>         /* Catch up with the cfs_rq and remove our load when we leave */
> +       update_cfs_rq_load_avg(now, cfs_rq, false);
>         detach_entity_load_avg(cfs_rq, se);
>  }
>
> @@ -8367,6 +8369,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  {
>         struct sched_entity *se = &p->se;
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +       u64 now = cfs_rq_clock_task(cfs_rq);
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>         /*
> @@ -8377,6 +8380,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  #endif
>
>         /* Synchronize task with its cfs_rq */
> +       update_cfs_rq_load_avg(now, cfs_rq, false);
>         attach_entity_load_avg(cfs_rq, se);
>
>         if (!vruntime_normalized(p))

Thread overview: 21+ messages
2016-06-14 22:21 [PATCH v6 0/4] sched/fair: Fix attach and detach sched avgs for task group change and sched class change Yuyang Du
2016-06-14 22:21 ` [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group Yuyang Du
2016-06-15  7:46   ` Vincent Guittot
2016-06-15  0:18     ` Yuyang Du
2016-06-15 14:15       ` Vincent Guittot
2016-06-15 15:22     ` Peter Zijlstra
2016-06-16  1:00       ` Yuyang Du
2016-06-16 16:30       ` Vincent Guittot
2016-06-16 17:17         ` Peter Zijlstra
2016-06-16 18:57           ` Vincent Guittot
2016-06-16 18:51         ` Peter Zijlstra
2016-06-16 19:00           ` Vincent Guittot
2016-06-16 20:07             ` Peter Zijlstra
2016-06-16 21:21               ` Vincent Guittot [this message]
2016-06-17  2:12                 ` Yuyang Du
2016-06-17 12:00                   ` Vincent Guittot
2016-06-17  9:48                 ` Peter Zijlstra
2016-06-17 11:31                 ` Peter Zijlstra
2016-06-14 22:21 ` [PATCH v6 2/4] sched/fair: Move load and util avgs from wake_up_new_task() to sched_fork() Yuyang Du
2016-06-14 22:21 ` [PATCH v6 3/4] sched/fair: Skip detach sched avgs for new task when changing task groups Yuyang Du
2016-06-14 22:21 ` [PATCH v6 4/4] sched/fair: Add inline to detach_entity_load_evg() Yuyang Du
