From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Yuyang Du <yuyang.du@intel.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Paul Turner <pjt@google.com>,
	Benjamin Segall <bsegall@google.com>
Subject: Re: [PATCH 7/7 v3] sched: fix wrong utilization accounting when switching to fair class
Date: Tue, 20 Sep 2016 15:06:04 +0200
Message-ID: <CAKfTPtDWkoxkMGbZaRk9YpyZWYzmGF1C2q4xM4iNxOn3=FAPpw@mail.gmail.com>
In-Reply-To: <20160920115458.GX5016@twins.programming.kicks-ass.net>

On 20 September 2016 at 13:54, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Sep 16, 2016 at 04:23:16PM +0200, Vincent Guittot wrote:
>> On 16 September 2016 at 14:16, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> >> > Also, the normalize comment in dequeue_entity() worries me, 'someone'
>> >> > didn't update that when he moved update_min_vruntime() around.
>> >
>> > I now worry more, so we do:
>> >
>> >         dequeue_task := dequeue_task_fair (p == current)
>> >           dequeue_entity
>> >             update_curr()
>> >               update_min_vruntime()
>> >             vruntime -= min_vruntime
>> >             update_min_vruntime()
>> >               // use cfs_rq->curr, which we just normalized !
>>
>> yes, but does it really change cfs_rq->min_vruntime in this case?
>
> So let me see; it does:
>
>         vruntime = cfs_rq->min_vruntime;
>
>         if (curr) // true
>           vruntime = curr->vruntime; // == vruntime - min_vruntime
>
>         if (leftmost) // possible
>           if (curr) // true
>             vruntime = min_vruntime(vruntime, se->vruntime);
>               if ((se->vruntime - (curr->vruntime - min_vruntime)) < 0) // false
>
>         min_vruntime = max_vruntime(min_vruntime, vruntime);
>           if (((curr->vruntime - min_vruntime) - min_vruntime) > 0)
>
>
> The problem is that double subtraction of min_vruntime can wrap.
> The thing is, min_vruntime is the 0-point in our modular space, it
> normalizes vruntime (ideally min_vruntime would be our 0-lag point,
> resulting in vruntime - min_vruntime being the lag).
>
> The moment min_vruntime grows past S64_MAX/2, -2*min_vruntime wraps into

fair enough

> positive space again and the test above becomes true and we'll select
> the normalized @curr vruntime as new min_vruntime and weird stuff will
> happen.
>
>
> Also, even if things magically worked out, it's still very icky to mix
> the normalized vruntime into things.

I agree
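
A minimal user-space sketch of the wrap discussed above, assuming the
signed-delta comparison used by max_vruntime() and illustrative values
that are not taken from any real run queue. The helper name
update_with_normalized_curr() is hypothetical. It feeds an
already-normalized vruntime into the max step and shows that the result
depends on where min_vruntime sits in the u64 space rather than on the
entity's actual lag.

```c
#include <stdint.h>

/*
 * Model of a wrap-safe "later of two vruntimes" helper: the difference
 * is taken in u64 and interpreted as signed (two's-complement
 * conversion, as on the usual kernel targets).
 */
static uint64_t max_vruntime(uint64_t max_vr, uint64_t vr)
{
	int64_t delta = (int64_t)(vr - max_vr);

	if (delta > 0)
		max_vr = vr;
	return max_vr;
}

/*
 * Hypothetical helper: compute the new min_vruntime when the only
 * candidate is a curr whose vruntime was already normalized.
 */
static uint64_t update_with_normalized_curr(uint64_t min_vr, uint64_t lag)
{
	uint64_t abs_vr = min_vr + lag;      /* curr->vruntime before dequeue */
	uint64_t norm_vr = abs_vr - min_vr;  /* after vruntime -= min_vruntime */

	/* Mixing a normalized value with an absolute one is the bug. */
	return max_vruntime(min_vr, norm_vr);
}
```

With min_vruntime = 1ULL << 40 and a lag of 1000 the call returns
min_vruntime unchanged. With min_vruntime = (1ULL << 63) + (1ULL << 40)
it returns 1000, i.e. min_vruntime would jump to the normalized value.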

>
>> >         put_prev_task := put_prev_task_fair
>> >           put_prev_entity
>> >             cfs_rq->curr = NULL;
>> >
>> >
>> > Now the point of the latter update_min_vruntime() is to advance
>> > min_vruntime when the task we removed was the one holding it back.
>> >
>> > However, it means that if we do dequeue+enqueue, we're further in the
>> > future (ie. we get penalized).
>> >
>> > So I'm inclined to simply remove the (2nd) update_min_vruntime() call.
>> > But as said above, my brain isn't co-operating much today.
>
> OK, so not sure we can actually remove it, we do want it to move
> min_vruntime forward (sometimes). We just don't want it to do so when
> DEQUEUE_SAVE -- we want to get back where we left off, nor do we want to
> muck about with touching normalized values.
>
> Another fun corner case is DEQUEUE_SLEEP; in that case we do not
> normalize, but we still want to advance min_vruntime if this was the one
> holding it back.
>
> I ended up with the below, but I'm not sure I like it much. Let me prod
> a wee bit more to see if there's not something else we can do.
>
> Google has this patch-set replacing min_vruntime with an actual global
> 0-lag, which greatly simplifies things. If only they'd post it sometime
> :/ /me prods pjt and ben with a sharp stick :-)
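
As a rough model of the guard the patch below adds to
update_min_vruntime(), assuming a stripped-down entity with only
vruntime and on_rq (struct entity and next_min_vruntime() are
hypothetical names, not kernel API): a curr that has already been
dequeued is ignored, so its possibly normalized vruntime never reaches
the min/max steps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct sched_entity. */
struct entity {
	uint64_t vruntime;
	bool on_rq;
};

/* Wrap-safe min/max via a signed interpretation of the u64 difference. */
static uint64_t min_vr(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0 ? b : a;
}

static uint64_t max_vr(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) > 0 ? b : a;
}

/* curr and leftmost may each be NULL. */
static uint64_t next_min_vruntime(uint64_t min_vruntime,
				  const struct entity *curr,
				  const struct entity *leftmost)
{
	uint64_t vruntime = min_vruntime;

	/* A dequeued curr may carry a normalized vruntime: skip it. */
	if (curr && !curr->on_rq)
		curr = NULL;

	if (curr)
		vruntime = curr->vruntime;

	if (leftmost)
		vruntime = curr ? min_vr(vruntime, leftmost->vruntime)
				: leftmost->vruntime;

	/* min_vruntime only ever moves forward. */
	return max_vr(min_vruntime, vruntime);
}
```

In this model a dequeued curr with a tiny normalized vruntime has no
effect: the result is driven by the leftmost entity, or stays at
min_vruntime when the tree is empty.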
>
> ---
>  kernel/sched/fair.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 986c10c25176..77566a340cbf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -462,17 +462,23 @@ static inline int entity_before(struct sched_entity *a,
>
>  static void update_min_vruntime(struct cfs_rq *cfs_rq)
>  {
> +       struct sched_entity *curr = cfs_rq->curr;
> +
>         u64 vruntime = cfs_rq->min_vruntime;
>
> -       if (cfs_rq->curr)
> -               vruntime = cfs_rq->curr->vruntime;
> +       if (curr) {
> +               if (curr->on_rq)
> +                       vruntime = curr->vruntime;
> +               else
> +                       curr = NULL;
> +       }
>
>         if (cfs_rq->rb_leftmost) {
>                 struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
>                                                    struct sched_entity,
>                                                    run_node);
>
> -               if (!cfs_rq->curr)
> +               if (!curr)
>                         vruntime = se->vruntime;
>                 else
>                         vruntime = min_vruntime(vruntime, se->vruntime);
> @@ -3483,8 +3489,16 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>         /* return excess runtime on last dequeue */
>         return_cfs_rq_runtime(cfs_rq);
>
> -       update_min_vruntime(cfs_rq);
>         update_cfs_shares(cfs_rq);
> +
> +       /*
> +        * Now advance min_vruntime if @se was the entity holding it back,
> +        * except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
> +        * put back on, and if we advance min_vruntime, we'll be placed back
> +        * further than we started -- ie. we'll be penalized.
> +        */
> +       if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE)
> +               update_min_vruntime(cfs_rq);
>  }
>
>  /*
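
The exception described in the comment above can be written as a small
predicate. This is a sketch only: the flag values are assumed for
illustration (the kernel defines the real ones in
kernel/sched/sched.h) and should_advance_min_vruntime() is a
hypothetical helper.

```c
#include <stdbool.h>

/* Assumed values for illustration only. */
#define DEQUEUE_SLEEP	0x01
#define DEQUEUE_SAVE	0x02
#define DEQUEUE_MOVE	0x04

/*
 * Advance min_vruntime on dequeue unless this is a pure save/restore
 * (SAVE set, MOVE clear), where the entity is put straight back and
 * should land where it left off.
 */
static bool should_advance_min_vruntime(int flags)
{
	return (flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) != DEQUEUE_SAVE;
}
```

DEQUEUE_SAVE on its own yields false; 0, DEQUEUE_SLEEP and
DEQUEUE_SAVE | DEQUEUE_MOVE all yield true.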
