From: Wanpeng Li <kernellwp@gmail.com>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	stable@vger.kernel.org,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
Date: Wed, 8 Mar 2017 16:47:40 +0800
Message-ID: <CANRm+CxXov_A+ufuQNOmTkfm8vkw2Gogb=J7iN07O=ukokGWug@mail.gmail.com>
In-Reply-To: <20170217120731.11868-2-matt@codeblueprint.co.uk>

2017-02-17 20:07 GMT+08:00 Matt Fleming <matt@codeblueprint.co.uk>:
> If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
> the pending sample window time on exit, setting the next update not
> one window into the future, but two.
>
> This situation on exiting NO_HZ is described by:
>
>   this_rq->calc_load_update < jiffies < calc_load_update
>
> In this scenario, what we should be doing is:
>
>   this_rq->calc_load_update = calc_load_update               [ next window ]
>
> But what we actually do is:
>
>   this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
>
> This can delay load average updates by up to ~9 seconds.
>
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
>
> It's safe to update the per-cpu active count if we wake between sample
> windows because any load that we left in 'calc_load_idle' will have
> been zeroed when the idle load was folded in calc_global_load().
>
> This issue was easy to reproduce before
>
>   commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
>
> just by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but code review suggests the bug is still present.
>
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: <stable@vger.kernel.org> # v3.5+
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  kernel/sched/loadavg.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
>
>  - Folded in Peter's suggestion for how to fix this.
>
>  - Tried to clarify the changelog based on feedback from Peter and
>    Frederic
>
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index a2d6eb71f06b..ec91fcc09bfe 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -201,8 +201,9 @@ void calc_load_exit_idle(void)
>         struct rq *this_rq = this_rq();
>
>         /*
> -        * If we're still before the sample window, we're done.
> +        * If we're still before the pending sample window, we're done.
>          */
> +       this_rq->calc_load_update = calc_load_update;
>         if (time_before(jiffies, this_rq->calc_load_update))
>                 return;
>
> @@ -211,7 +212,6 @@ void calc_load_exit_idle(void)
>          * accounted through the nohz accounting, so skip the entire deal and
>          * sync up for the next window.
>          */
> -       this_rq->calc_load_update = calc_load_update;
>         if (time_before(jiffies, this_rq->calc_load_update + 10))
>                 this_rq->calc_load_update += LOAD_FREQ;
>  }
> --
> 2.10.0
>


Thread overview: 7+ messages
2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
2017-02-22 15:18   ` Frederic Weisbecker
2017-03-08  8:47   ` Wanpeng Li [this message]
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
