All of lore.kernel.org
 help / color / mirror / Atom feed
From: tip-bot for Matt Fleming <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: umgwanakikbuti@gmail.com, morten.rasmussen@arm.com,
	hpa@zytor.com, peterz@infradead.org, vincent.guittot@linaro.org,
	tglx@linutronix.de, matt@codeblueprint.co.uk,
	linux-kernel@vger.kernel.org, fweisbec@gmail.com,
	mingo@kernel.org, efault@gmx.de, torvalds@linux-foundation.org
Subject: [tip:sched/core] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
Date: Thu, 16 Mar 2017 04:13:20 -0700	[thread overview]
Message-ID: <tip-6e5f32f7a43f45ee55c401c0b9585eb01f9629a8@git.kernel.org> (raw)
In-Reply-To: <20170217120731.11868-2-matt@codeblueprint.co.uk>

Commit-ID:  6e5f32f7a43f45ee55c401c0b9585eb01f9629a8
Gitweb:     http://git.kernel.org/tip/6e5f32f7a43f45ee55c401c0b9585eb01f9629a8
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Fri, 17 Feb 2017 12:07:30 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 09:21:00 +0100

sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting

If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/loadavg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 7296b73..3a55f3f 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -202,8 +202,9 @@ void calc_load_exit_idle(void)
 	struct rq *this_rq = this_rq();
 
 	/*
-	 * If we're still before the sample window, we're done.
+	 * If we're still before the pending sample window, we're done.
 	 */
+	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -212,7 +213,6 @@ void calc_load_exit_idle(void)
 	 * accounted through the nohz accounting, so skip the entire deal and
 	 * sync up for the next window.
 	 */
-	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update + 10))
 		this_rq->calc_load_update += LOAD_FREQ;
 }

  parent reply	other threads:[~2017-03-16 11:14 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
2017-02-22 15:18   ` Frederic Weisbecker
2017-03-08  8:47   ` Wanpeng Li
2017-03-16 11:13   ` tip-bot for Matt Fleming [this message]
2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=tip-6e5f32f7a43f45ee55c401c0b9585eb01f9629a8@git.kernel.org \
    --to=tipbot@zytor.com \
    --cc=efault@gmx.de \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=matt@codeblueprint.co.uk \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=umgwanakikbuti@gmail.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.