[PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE()
@ 2017-02-17 12:07 Matt Fleming
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
  2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
  0 siblings, 2 replies; 7+ messages in thread
From: Matt Fleming @ 2017-02-17 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker
  Cc: linux-kernel, Matt Fleming

Peter, Frederic, 

Here's a v2 of the patch series to fix the loadavg spikes I'm seeing
on a v3.12 based kernel, caused by delaying load sampling if a sample
period was crossed while in NO_HZ idle.

I tried to make the changelog for PATCH 1 clearer this time around by
incorporating suggestions from both of you. Please let me know if it's
still unclear.

PATCH 2 addresses Peter's comment:

  "Irrespective the above though; should we not make this:

   +       this_rq->calc_load_update = READ_ONCE(calc_load_update);

   because if for some reason we do a double load of calc_load_update and
   see two different values, weird stuff could happen.

   And because, on general principle, a READ_ONCE() should be paired with a
   WRITE_ONCE(), that should be done too I suppose."

The v1 of the patch can be found here:

  https://lkml.kernel.org/r/20170208132924.3038-1-matt@codeblueprint.co.uk

Matt Fleming (2):
  sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  sched/loadavg: Use {READ,WRITE}_ONCE() for sample window

 kernel/sched/loadavg.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

-- 
2.10.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
@ 2017-02-17 12:07 ` Matt Fleming
  2017-02-22 15:18   ` Frederic Weisbecker
                     ` (2 more replies)
  2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
  1 sibling, 3 replies; 7+ messages in thread
From: Matt Fleming @ 2017-02-17 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker
  Cc: linux-kernel, Matt Fleming, Mike Galbraith, Morten Rasmussen,
	stable, Vincent Guittot

If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@vger.kernel.org> # v3.5+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/loadavg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Changes in v2:

 - Folded in Peter's suggestion for how to fix this.

 - Tried to clairfy the changelog based on feedback from Peter and
   Frederic

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index a2d6eb71f06b..ec91fcc09bfe 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -201,8 +201,9 @@ void calc_load_exit_idle(void)
 	struct rq *this_rq = this_rq();
 
 	/*
-	 * If we're still before the sample window, we're done.
+	 * If we're still before the pending sample window, we're done.
 	 */
+	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -211,7 +212,6 @@ void calc_load_exit_idle(void)
 	 * accounted through the nohz accounting, so skip the entire deal and
 	 * sync up for the next window.
 	 */
-	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update + 10))
 		this_rq->calc_load_update += LOAD_FREQ;
 }
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window
  2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
@ 2017-02-17 12:07 ` Matt Fleming
  2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
  1 sibling, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2017-02-17 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker
  Cc: linux-kernel, Matt Fleming, Mike Galbraith, Morten Rasmussen,
	Vincent Guittot

'calc_load_update' is accessed without any kind of locking and there's
a clear assumption in the code that only a single value is read or
written.

Make this explicit by using READ_ONCE() and WRITE_ONCE(), and avoid
unintentionally seeing multiple values, or having the load/stores
split.

Technically the loads in calc_global_*() don't require this since
those are the only functions that update 'calc_load_update', but I've
added the READ_ONCE() for consistency.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/loadavg.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index ec91fcc09bfe..3f96a7d014ce 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -168,7 +168,7 @@ static inline int calc_load_write_idx(void)
 	 * If the folding window started, make sure we start writing in the
 	 * next idle-delta.
 	 */
-	if (!time_before(jiffies, calc_load_update))
+	if (!time_before(jiffies, READ_ONCE(calc_load_update)))
 		idx++;
 
 	return idx & 1;
@@ -203,7 +203,7 @@ void calc_load_exit_idle(void)
 	/*
 	 * If we're still before the pending sample window, we're done.
 	 */
-	this_rq->calc_load_update = calc_load_update;
+	this_rq->calc_load_update = READ_ONCE(calc_load_update);
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -307,13 +307,15 @@ calc_load_n(unsigned long load, unsigned long exp,
  */
 static void calc_global_nohz(void)
 {
+	unsigned long sample_window;
 	long delta, active, n;
 
-	if (!time_before(jiffies, calc_load_update + 10)) {
+	sample_window = READ_ONCE(calc_load_update);
+	if (!time_before(jiffies, sample_window + 10)) {
 		/*
 		 * Catch-up, fold however many we are behind still
 		 */
-		delta = jiffies - calc_load_update - 10;
+		delta = jiffies - sample_window - 10;
 		n = 1 + (delta / LOAD_FREQ);
 
 		active = atomic_long_read(&calc_load_tasks);
@@ -323,7 +325,7 @@ static void calc_global_nohz(void)
 		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
 		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
 
-		calc_load_update += n * LOAD_FREQ;
+		WRITE_ONCE(calc_load_update, sample_window + n * LOAD_FREQ);
 	}
 
 	/*
@@ -351,9 +353,11 @@ static inline void calc_global_nohz(void) { }
  */
 void calc_global_load(unsigned long ticks)
 {
+	unsigned long sample_window;
 	long active, delta;
 
-	if (time_before(jiffies, calc_load_update + 10))
+	sample_window = READ_ONCE(calc_load_update);
+	if (time_before(jiffies, sample_window + 10))
 		return;
 
 	/*
@@ -370,7 +374,7 @@ void calc_global_load(unsigned long ticks)
 	avenrun[1] = calc_load(avenrun[1], EXP_5, active);
 	avenrun[2] = calc_load(avenrun[2], EXP_15, active);
 
-	calc_load_update += LOAD_FREQ;
+	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);
 
 	/*
 	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
@ 2017-02-22 15:18   ` Frederic Weisbecker
  2017-03-08  8:47   ` Wanpeng Li
  2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
  2 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2017-02-22 15:18 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Mike Galbraith,
	Morten Rasmussen, stable, Vincent Guittot

On Fri, Feb 17, 2017 at 12:07:30PM +0000, Matt Fleming wrote:
> If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
> the pending sample window time on exit, setting the next update not
> one window into the future, but two.
> 
> This situation on exiting NO_HZ is described by:
> 
>   this_rq->calc_load_update < jiffies < calc_load_update
> 
> In this scenario, what we should be doing is:
> 
>   this_rq->calc_load_update = calc_load_update		     [ next window ]
> 
> But what we actually do is:
> 
>   this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
> 
> This has the effect of delaying load average updates for potentially
> up to ~9seconds.
> 
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
> 
> It's safe to update the per-cpu active count if we wake between sample
> windows because any load that we left in 'calc_load_idle' will have
> been zero'd when the idle load was folded in calc_global_load().
> 
> This issue is easy to reproduce before,
> 
>   commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
> 
> just by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but the bug still seems to be present from code review.
> 
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

Thanks it's much clearer now!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
  2017-02-22 15:18   ` Frederic Weisbecker
@ 2017-03-08  8:47   ` Wanpeng Li
  2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
  2 siblings, 0 replies; 7+ messages in thread
From: Wanpeng Li @ 2017-03-08  8:47 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, linux-kernel,
	Mike Galbraith, Morten Rasmussen, stable, Vincent Guittot

2017-02-17 20:07 GMT+08:00 Matt Fleming <matt@codeblueprint.co.uk>:
> If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
> the pending sample window time on exit, setting the next update not
> one window into the future, but two.
>
> This situation on exiting NO_HZ is described by:
>
>   this_rq->calc_load_update < jiffies < calc_load_update
>
> In this scenario, what we should be doing is:
>
>   this_rq->calc_load_update = calc_load_update               [ next window ]
>
> But what we actually do is:
>
>   this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
>
> This has the effect of delaying load average updates for potentially
> up to ~9seconds.
>
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
>
> It's safe to update the per-cpu active count if we wake between sample
> windows because any load that we left in 'calc_load_idle' will have
> been zero'd when the idle load was folded in calc_global_load().
>
> This issue is easy to reproduce before,
>
>   commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
>
> just by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but the bug still seems to be present from code review.
>
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: <stable@vger.kernel.org> # v3.5+
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  kernel/sched/loadavg.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
>
>  - Folded in Peter's suggestion for how to fix this.
>
>  - Tried to clairfy the changelog based on feedback from Peter and
>    Frederic
>
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index a2d6eb71f06b..ec91fcc09bfe 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -201,8 +201,9 @@ void calc_load_exit_idle(void)
>         struct rq *this_rq = this_rq();
>
>         /*
> -        * If we're still before the sample window, we're done.
> +        * If we're still before the pending sample window, we're done.
>          */
> +       this_rq->calc_load_update = calc_load_update;
>         if (time_before(jiffies, this_rq->calc_load_update))
>                 return;
>
> @@ -211,7 +212,6 @@ void calc_load_exit_idle(void)
>          * accounted through the nohz accounting, so skip the entire deal and
>          * sync up for the next window.
>          */
> -       this_rq->calc_load_update = calc_load_update;
>         if (time_before(jiffies, this_rq->calc_load_update + 10))
>                 this_rq->calc_load_update += LOAD_FREQ;
>  }
> --
> 2.10.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:sched/core] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
  2017-02-22 15:18   ` Frederic Weisbecker
  2017-03-08  8:47   ` Wanpeng Li
@ 2017-03-16 11:13   ` tip-bot for Matt Fleming
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Matt Fleming @ 2017-03-16 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: umgwanakikbuti, morten.rasmussen, hpa, peterz, vincent.guittot,
	tglx, matt, linux-kernel, fweisbec, mingo, efault, torvalds

Commit-ID:  6e5f32f7a43f45ee55c401c0b9585eb01f9629a8
Gitweb:     http://git.kernel.org/tip/6e5f32f7a43f45ee55c401c0b9585eb01f9629a8
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Fri, 17 Feb 2017 12:07:30 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 09:21:00 +0100

sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting

If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/loadavg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 7296b73..3a55f3f 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -202,8 +202,9 @@ void calc_load_exit_idle(void)
 	struct rq *this_rq = this_rq();
 
 	/*
-	 * If we're still before the sample window, we're done.
+	 * If we're still before the pending sample window, we're done.
 	 */
+	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -212,7 +213,6 @@ void calc_load_exit_idle(void)
 	 * accounted through the nohz accounting, so skip the entire deal and
 	 * sync up for the next window.
 	 */
-	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update + 10))
 		this_rq->calc_load_update += LOAD_FREQ;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [tip:sched/core] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window
  2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
@ 2017-03-16 11:13   ` tip-bot for Matt Fleming
  0 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Matt Fleming @ 2017-03-16 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, fweisbec, torvalds, linux-kernel, hpa, morten.rasmussen,
	umgwanakikbuti, vincent.guittot, efault, mingo, matt, peterz

Commit-ID:  caeb5882979bc6f3c8766fcf59c6269b38f521bc
Gitweb:     http://git.kernel.org/tip/caeb5882979bc6f3c8766fcf59c6269b38f521bc
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Fri, 17 Feb 2017 12:07:31 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 09:21:01 +0100

sched/loadavg: Use {READ,WRITE}_ONCE() for sample window

'calc_load_update' is accessed without any kind of locking and there's
a clear assumption in the code that only a single value is read or
written.

Make this explicit by using READ_ONCE() and WRITE_ONCE(), and avoid
unintentionally seeing multiple values, or having the load/stores
split.

Technically the loads in calc_global_*() don't require this since
those are the only functions that update 'calc_load_update', but I've
added the READ_ONCE() for consistency.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20170217120731.11868-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/loadavg.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 3a55f3f..f15fb2b 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -169,7 +169,7 @@ static inline int calc_load_write_idx(void)
 	 * If the folding window started, make sure we start writing in the
 	 * next idle-delta.
 	 */
-	if (!time_before(jiffies, calc_load_update))
+	if (!time_before(jiffies, READ_ONCE(calc_load_update)))
 		idx++;
 
 	return idx & 1;
@@ -204,7 +204,7 @@ void calc_load_exit_idle(void)
 	/*
 	 * If we're still before the pending sample window, we're done.
 	 */
-	this_rq->calc_load_update = calc_load_update;
+	this_rq->calc_load_update = READ_ONCE(calc_load_update);
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -308,13 +308,15 @@ calc_load_n(unsigned long load, unsigned long exp,
  */
 static void calc_global_nohz(void)
 {
+	unsigned long sample_window;
 	long delta, active, n;
 
-	if (!time_before(jiffies, calc_load_update + 10)) {
+	sample_window = READ_ONCE(calc_load_update);
+	if (!time_before(jiffies, sample_window + 10)) {
 		/*
 		 * Catch-up, fold however many we are behind still
 		 */
-		delta = jiffies - calc_load_update - 10;
+		delta = jiffies - sample_window - 10;
 		n = 1 + (delta / LOAD_FREQ);
 
 		active = atomic_long_read(&calc_load_tasks);
@@ -324,7 +326,7 @@ static void calc_global_nohz(void)
 		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
 		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
 
-		calc_load_update += n * LOAD_FREQ;
+		WRITE_ONCE(calc_load_update, sample_window + n * LOAD_FREQ);
 	}
 
 	/*
@@ -352,9 +354,11 @@ static inline void calc_global_nohz(void) { }
  */
 void calc_global_load(unsigned long ticks)
 {
+	unsigned long sample_window;
 	long active, delta;
 
-	if (time_before(jiffies, calc_load_update + 10))
+	sample_window = READ_ONCE(calc_load_update);
+	if (time_before(jiffies, sample_window + 10))
 		return;
 
 	/*
@@ -371,7 +375,7 @@ void calc_global_load(unsigned long ticks)
 	avenrun[1] = calc_load(avenrun[1], EXP_5, active);
 	avenrun[2] = calc_load(avenrun[2], EXP_15, active);
 
-	calc_load_update += LOAD_FREQ;
+	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);
 
 	/*
 	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-16 11:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
2017-02-22 15:18   ` Frederic Weisbecker
2017-03-08  8:47   ` Wanpeng Li
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.