linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE()
@ 2017-02-17 12:07 Matt Fleming
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
  2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
  0 siblings, 2 replies; 7+ messages in thread
From: Matt Fleming @ 2017-02-17 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker
  Cc: linux-kernel, Matt Fleming

Peter, Frederic, 

Here's a v2 of the patch series to fix the loadavg spikes I'm seeing
on a v3.12 based kernel, caused by delaying load sampling if a sample
period was crossed while in NO_HZ idle.

I tried to make the changelog for PATCH 1 clearer this time around by
incorporating suggestions from both of you. Please let me know if it's
still unclear.

PATCH 2 addresses Peter's comment:

  "Irrespective the above though; should we not make this:
   
   +       this_rq->calc_load_update = READ_ONCE(calc_load_update);
   
   because if for some reason we do a double load of calc_load_update and
   see two different values, weird stuff could happen.
   
   And because, on general principle, a READ_ONCE() should be paired with a
   WRITE_ONCE(), that should be done too I suppose."

The v1 of the patch can be found here:

  https://lkml.kernel.org/r/20170208132924.3038-1-matt@codeblueprint.co.uk

Matt Fleming (2):
  sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  sched/loadavg: Use {READ,WRITE}_ONCE() for sample window

 kernel/sched/loadavg.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

-- 
2.10.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
@ 2017-02-17 12:07 ` Matt Fleming
  2017-02-22 15:18   ` Frederic Weisbecker
                     ` (2 more replies)
  2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
  1 sibling, 3 replies; 7+ messages in thread
From: Matt Fleming @ 2017-02-17 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker
  Cc: linux-kernel, Matt Fleming, Mike Galbraith, Morten Rasmussen,
	stable, Vincent Guittot

If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@vger.kernel.org> # v3.5+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/loadavg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Changes in v2:

 - Folded in Peter's suggestion for how to fix this.

 - Tried to clairfy the changelog based on feedback from Peter and
   Frederic

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index a2d6eb71f06b..ec91fcc09bfe 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -201,8 +201,9 @@ void calc_load_exit_idle(void)
 	struct rq *this_rq = this_rq();
 
 	/*
-	 * If we're still before the sample window, we're done.
+	 * If we're still before the pending sample window, we're done.
 	 */
+	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -211,7 +212,6 @@ void calc_load_exit_idle(void)
 	 * accounted through the nohz accounting, so skip the entire deal and
 	 * sync up for the next window.
 	 */
-	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update + 10))
 		this_rq->calc_load_update += LOAD_FREQ;
 }
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window
  2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
@ 2017-02-17 12:07 ` Matt Fleming
  2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
  1 sibling, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2017-02-17 12:07 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker
  Cc: linux-kernel, Matt Fleming, Mike Galbraith, Morten Rasmussen,
	Vincent Guittot

'calc_load_update' is accessed without any kind of locking and there's
a clear assumption in the code that only a single value is read or
written.

Make this explicit by using READ_ONCE() and WRITE_ONCE(), and avoid
unintentionally seeing multiple values, or having the load/stores
split.

Technically the loads in calc_global_*() don't require this since
those are the only functions that update 'calc_load_update', but I've
added the READ_ONCE() for consistency.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/loadavg.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index ec91fcc09bfe..3f96a7d014ce 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -168,7 +168,7 @@ static inline int calc_load_write_idx(void)
 	 * If the folding window started, make sure we start writing in the
 	 * next idle-delta.
 	 */
-	if (!time_before(jiffies, calc_load_update))
+	if (!time_before(jiffies, READ_ONCE(calc_load_update)))
 		idx++;
 
 	return idx & 1;
@@ -203,7 +203,7 @@ void calc_load_exit_idle(void)
 	/*
 	 * If we're still before the pending sample window, we're done.
 	 */
-	this_rq->calc_load_update = calc_load_update;
+	this_rq->calc_load_update = READ_ONCE(calc_load_update);
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -307,13 +307,15 @@ calc_load_n(unsigned long load, unsigned long exp,
  */
 static void calc_global_nohz(void)
 {
+	unsigned long sample_window;
 	long delta, active, n;
 
-	if (!time_before(jiffies, calc_load_update + 10)) {
+	sample_window = READ_ONCE(calc_load_update);
+	if (!time_before(jiffies, sample_window + 10)) {
 		/*
 		 * Catch-up, fold however many we are behind still
 		 */
-		delta = jiffies - calc_load_update - 10;
+		delta = jiffies - sample_window - 10;
 		n = 1 + (delta / LOAD_FREQ);
 
 		active = atomic_long_read(&calc_load_tasks);
@@ -323,7 +325,7 @@ static void calc_global_nohz(void)
 		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
 		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
 
-		calc_load_update += n * LOAD_FREQ;
+		WRITE_ONCE(calc_load_update, sample_window + n * LOAD_FREQ);
 	}
 
 	/*
@@ -351,9 +353,11 @@ static inline void calc_global_nohz(void) { }
  */
 void calc_global_load(unsigned long ticks)
 {
+	unsigned long sample_window;
 	long active, delta;
 
-	if (time_before(jiffies, calc_load_update + 10))
+	sample_window = READ_ONCE(calc_load_update);
+	if (time_before(jiffies, sample_window + 10))
 		return;
 
 	/*
@@ -370,7 +374,7 @@ void calc_global_load(unsigned long ticks)
 	avenrun[1] = calc_load(avenrun[1], EXP_5, active);
 	avenrun[2] = calc_load(avenrun[2], EXP_15, active);
 
-	calc_load_update += LOAD_FREQ;
+	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);
 
 	/*
 	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
@ 2017-02-22 15:18   ` Frederic Weisbecker
  2017-03-08  8:47   ` Wanpeng Li
  2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
  2 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2017-02-22 15:18 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Mike Galbraith,
	Morten Rasmussen, stable, Vincent Guittot

On Fri, Feb 17, 2017 at 12:07:30PM +0000, Matt Fleming wrote:
> If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
> the pending sample window time on exit, setting the next update not
> one window into the future, but two.
> 
> This situation on exiting NO_HZ is described by:
> 
>   this_rq->calc_load_update < jiffies < calc_load_update
> 
> In this scenario, what we should be doing is:
> 
>   this_rq->calc_load_update = calc_load_update		     [ next window ]
> 
> But what we actually do is:
> 
>   this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
> 
> This has the effect of delaying load average updates for potentially
> up to ~9seconds.
> 
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
> 
> It's safe to update the per-cpu active count if we wake between sample
> windows because any load that we left in 'calc_load_idle' will have
> been zero'd when the idle load was folded in calc_global_load().
> 
> This issue is easy to reproduce before,
> 
>   commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
> 
> just by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but the bug still seems to be present from code review.
> 
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

Thanks it's much clearer now!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
  2017-02-22 15:18   ` Frederic Weisbecker
@ 2017-03-08  8:47   ` Wanpeng Li
  2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
  2 siblings, 0 replies; 7+ messages in thread
From: Wanpeng Li @ 2017-03-08  8:47 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, linux-kernel,
	Mike Galbraith, Morten Rasmussen, stable, Vincent Guittot

2017-02-17 20:07 GMT+08:00 Matt Fleming <matt@codeblueprint.co.uk>:
> If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
> the pending sample window time on exit, setting the next update not
> one window into the future, but two.
>
> This situation on exiting NO_HZ is described by:
>
>   this_rq->calc_load_update < jiffies < calc_load_update
>
> In this scenario, what we should be doing is:
>
>   this_rq->calc_load_update = calc_load_update               [ next window ]
>
> But what we actually do is:
>
>   this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
>
> This has the effect of delaying load average updates for potentially
> up to ~9seconds.
>
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
>
> It's safe to update the per-cpu active count if we wake between sample
> windows because any load that we left in 'calc_load_idle' will have
> been zero'd when the idle load was folded in calc_global_load().
>
> This issue is easy to reproduce before,
>
>   commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
>
> just by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but the bug still seems to be present from code review.
>
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: <stable@vger.kernel.org> # v3.5+
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  kernel/sched/loadavg.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
>
>  - Folded in Peter's suggestion for how to fix this.
>
>  - Tried to clairfy the changelog based on feedback from Peter and
>    Frederic
>
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index a2d6eb71f06b..ec91fcc09bfe 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -201,8 +201,9 @@ void calc_load_exit_idle(void)
>         struct rq *this_rq = this_rq();
>
>         /*
> -        * If we're still before the sample window, we're done.
> +        * If we're still before the pending sample window, we're done.
>          */
> +       this_rq->calc_load_update = calc_load_update;
>         if (time_before(jiffies, this_rq->calc_load_update))
>                 return;
>
> @@ -211,7 +212,6 @@ void calc_load_exit_idle(void)
>          * accounted through the nohz accounting, so skip the entire deal and
>          * sync up for the next window.
>          */
> -       this_rq->calc_load_update = calc_load_update;
>         if (time_before(jiffies, this_rq->calc_load_update + 10))
>                 this_rq->calc_load_update += LOAD_FREQ;
>  }
> --
> 2.10.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:sched/core] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
  2017-02-22 15:18   ` Frederic Weisbecker
  2017-03-08  8:47   ` Wanpeng Li
@ 2017-03-16 11:13   ` tip-bot for Matt Fleming
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Matt Fleming @ 2017-03-16 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: umgwanakikbuti, morten.rasmussen, hpa, peterz, vincent.guittot,
	tglx, matt, linux-kernel, fweisbec, mingo, efault, torvalds

Commit-ID:  6e5f32f7a43f45ee55c401c0b9585eb01f9629a8
Gitweb:     http://git.kernel.org/tip/6e5f32f7a43f45ee55c401c0b9585eb01f9629a8
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Fri, 17 Feb 2017 12:07:30 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 09:21:00 +0100

sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting

If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/loadavg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 7296b73..3a55f3f 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -202,8 +202,9 @@ void calc_load_exit_idle(void)
 	struct rq *this_rq = this_rq();
 
 	/*
-	 * If we're still before the sample window, we're done.
+	 * If we're still before the pending sample window, we're done.
 	 */
+	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -212,7 +213,6 @@ void calc_load_exit_idle(void)
 	 * accounted through the nohz accounting, so skip the entire deal and
 	 * sync up for the next window.
 	 */
-	this_rq->calc_load_update = calc_load_update;
 	if (time_before(jiffies, this_rq->calc_load_update + 10))
 		this_rq->calc_load_update += LOAD_FREQ;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [tip:sched/core] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window
  2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
@ 2017-03-16 11:13   ` tip-bot for Matt Fleming
  0 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Matt Fleming @ 2017-03-16 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, fweisbec, torvalds, linux-kernel, hpa, morten.rasmussen,
	umgwanakikbuti, vincent.guittot, efault, mingo, matt, peterz

Commit-ID:  caeb5882979bc6f3c8766fcf59c6269b38f521bc
Gitweb:     http://git.kernel.org/tip/caeb5882979bc6f3c8766fcf59c6269b38f521bc
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Fri, 17 Feb 2017 12:07:31 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 16 Mar 2017 09:21:01 +0100

sched/loadavg: Use {READ,WRITE}_ONCE() for sample window

'calc_load_update' is accessed without any kind of locking and there's
a clear assumption in the code that only a single value is read or
written.

Make this explicit by using READ_ONCE() and WRITE_ONCE(), and avoid
unintentionally seeing multiple values, or having the load/stores
split.

Technically the loads in calc_global_*() don't require this since
those are the only functions that update 'calc_load_update', but I've
added the READ_ONCE() for consistency.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20170217120731.11868-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/loadavg.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 3a55f3f..f15fb2b 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -169,7 +169,7 @@ static inline int calc_load_write_idx(void)
 	 * If the folding window started, make sure we start writing in the
 	 * next idle-delta.
 	 */
-	if (!time_before(jiffies, calc_load_update))
+	if (!time_before(jiffies, READ_ONCE(calc_load_update)))
 		idx++;
 
 	return idx & 1;
@@ -204,7 +204,7 @@ void calc_load_exit_idle(void)
 	/*
 	 * If we're still before the pending sample window, we're done.
 	 */
-	this_rq->calc_load_update = calc_load_update;
+	this_rq->calc_load_update = READ_ONCE(calc_load_update);
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
@@ -308,13 +308,15 @@ calc_load_n(unsigned long load, unsigned long exp,
  */
 static void calc_global_nohz(void)
 {
+	unsigned long sample_window;
 	long delta, active, n;
 
-	if (!time_before(jiffies, calc_load_update + 10)) {
+	sample_window = READ_ONCE(calc_load_update);
+	if (!time_before(jiffies, sample_window + 10)) {
 		/*
 		 * Catch-up, fold however many we are behind still
 		 */
-		delta = jiffies - calc_load_update - 10;
+		delta = jiffies - sample_window - 10;
 		n = 1 + (delta / LOAD_FREQ);
 
 		active = atomic_long_read(&calc_load_tasks);
@@ -324,7 +326,7 @@ static void calc_global_nohz(void)
 		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
 		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
 
-		calc_load_update += n * LOAD_FREQ;
+		WRITE_ONCE(calc_load_update, sample_window + n * LOAD_FREQ);
 	}
 
 	/*
@@ -352,9 +354,11 @@ static inline void calc_global_nohz(void) { }
  */
 void calc_global_load(unsigned long ticks)
 {
+	unsigned long sample_window;
 	long active, delta;
 
-	if (time_before(jiffies, calc_load_update + 10))
+	sample_window = READ_ONCE(calc_load_update);
+	if (time_before(jiffies, sample_window + 10))
 		return;
 
 	/*
@@ -371,7 +375,7 @@ void calc_global_load(unsigned long ticks)
 	avenrun[1] = calc_load(avenrun[1], EXP_5, active);
 	avenrun[2] = calc_load(avenrun[2], EXP_15, active);
 
-	calc_load_update += LOAD_FREQ;
+	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);
 
 	/*
 	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-16 11:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 12:07 [PATCH v2 0/2] sched/loadavg: Fix loadavg spikes and sprinkle {READ,WRITE}_ONCE() Matt Fleming
2017-02-17 12:07 ` [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting Matt Fleming
2017-02-22 15:18   ` Frederic Weisbecker
2017-03-08  8:47   ` Wanpeng Li
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming
2017-02-17 12:07 ` [PATCH v2 2/2] sched/loadavg: Use {READ,WRITE}_ONCE() for sample window Matt Fleming
2017-03-16 11:13   ` [tip:sched/core] " tip-bot for Matt Fleming

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).