* [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug
@ 2016-03-04 14:59 Thomas Gleixner
  2016-03-04 15:15 ` Rik van Riel
  2016-03-05 11:27 ` [tip:sched/urgent] sched/cputime: Fix steal time accounting vs. CPU hotplug tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 3+ messages in thread
From: Thomas Gleixner @ 2016-03-04 14:59 UTC
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Glauber Costa, Frederic Weisbecker,
	Rik van Riel

On cpu hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over cpu down and up. So after the cpu comes up again the delta
calculation in steal_account_process_tick() breaks due to the unsigned
math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());
	 
	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insanely
large value which then gets added to rq->prev_steal_time, permanently
wrecking the accounting. As a consequence the per cpu stats in /proc/stat
become stale.
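
A minimal user-space sketch of the wrap-around, for illustration only. It
assumes plain uint64_t stand-ins for u64, paravirt_steal_clock() and
rq->prev_steal_time; steal_delta() and the sample values are hypothetical
and not part of the kernel:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as the two quoted lines, with the steal clock and the
 * saved previous value passed in as plain parameters (hypothetical
 * stand-ins for paravirt_steal_clock() and rq->prev_steal_time).
 */
static inline uint64_t steal_delta(uint64_t steal_clock,
				   uint64_t prev_steal_time)
{
	uint64_t steal = steal_clock;

	steal -= prev_steal_time;	/* wraps modulo 2^64 when clock < prev */
	return steal;
}

/* Usage: a sane delta, then the wrapped one seen with a stale prev value. */
static inline void steal_delta_example(void)
{
	/* Normal case: the clock moved forward by 500 ns. */
	assert(steal_delta(1500, 1000) == 500);

	/* Stale case: clock (100) is below prev (1000), so the result wraps. */
	assert(steal_delta(100, 1000) == UINT64_MAX - 899);
}
```

The wrapped result is only 900 short of 2^64, which is the kind of value
the paragraph above calls insanely large.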

Nice trick to tell the world how idle the system is (100%) while the cpu is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at cpu hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the wreckage visible in /proc/stat. The
prev_*_time values are still wrong.
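
Why a clamp of that kind does not repair the saved value can be modelled
roughly as follows. This is an illustrative user-space sketch, not the code
of update_rq_clock_task(); struct rq_clamp_model and charge_steal_clamped()
are hypothetical names, and it assumes the check simply caps the steal delta
at the elapsed clock delta:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the one rq field this sketch needs. */
struct rq_clamp_model {
	uint64_t prev_steal_time_rq;
};

/*
 * Charge at most 'delta' ns of steal time for this update and advance
 * the saved value by the amount actually charged.
 */
static inline uint64_t charge_steal_clamped(struct rq_clamp_model *rq,
					    uint64_t steal_clock,
					    uint64_t delta)
{
	uint64_t steal = steal_clock - rq->prev_steal_time_rq;

	if (steal > delta)	/* also catches the wrapped, huge value */
		steal = delta;
	rq->prev_steal_time_rq += steal;
	return steal;
}

static inline void charge_steal_clamped_example(void)
{
	struct rq_clamp_model rq = { .prev_steal_time_rq = 1000 }; /* stale */

	/* Clock restarted at 100: the wrapped delta is clamped to 10. */
	assert(charge_steal_clamped(&rq, 100, 10) == 10);
	/* The saved value moved to 1010, still far above the clock. */
	assert(rq.prev_steal_time_rq == 1010);
	/* So the next update is clamped again: all of delta counts as steal. */
	assert(charge_steal_clamped(&rq, 105, 10) == 10);
}
```

In this model the clamp bounds each reported delta, but the saved value
never gets back in step with the clock.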

Solution is simple: Reset rq->prev_*_time when the cpu is plugged in again.
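
The effect of the reset can be sketched in user space like this. The struct
and function names are hypothetical models of the rq fields touched by the
patch, the sample values are made up, and the config #ifdefs of the real
helper are left out:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the three saved values named in the patch. */
struct rq_reset_model {
	uint64_t prev_irq_time;
	uint64_t prev_steal_time;
	uint64_t prev_steal_time_rq;
};

/* Model of the reset done when the cpu is plugged in again. */
static inline void account_reset_model(struct rq_reset_model *rq)
{
	rq->prev_irq_time = 0;
	rq->prev_steal_time = 0;
	rq->prev_steal_time_rq = 0;
}

static inline void account_reset_example(void)
{
	/* Stale values left over from before the cpu went down. */
	struct rq_reset_model rq = {
		.prev_irq_time = 700,
		.prev_steal_time = 1000,
		.prev_steal_time_rq = 1000,
	};
	uint64_t steal_clock = 100;	/* hypothetical value after cpu up */

	/* Without the reset the unsigned delta wraps to a huge value. */
	assert(steal_clock - rq.prev_steal_time > steal_clock);

	account_reset_model(&rq);

	/* With the reset the delta is simply the clock value. */
	assert(steal_clock - rq.prev_steal_time == 100);
}
```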

Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org

---
 kernel/sched/core.c  |    1 +
 kernel/sched/sched.h |   13 +++++++++++++
 2 files changed, 14 insertions(+)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,7 @@ migration_call(struct notifier_block *nf
 
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 
 	case CPU_ONLINE:
Index: b/kernel/sched/sched.h
===================================================================
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}

* Re: [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug
  2016-03-04 14:59 [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug Thomas Gleixner
@ 2016-03-04 15:15 ` Rik van Riel
  2016-03-05 11:27 ` [tip:sched/urgent] sched/cputime: Fix steal time accounting vs. CPU hotplug tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 3+ messages in thread
From: Rik van Riel @ 2016-03-04 15:15 UTC
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Glauber Costa, Frederic Weisbecker

On Fri, 2016-03-04 at 15:59 +0100, Thomas Gleixner wrote:
> On cpu hotplug the steal time accounting can keep a stale
> rq->prev_steal_time value over cpu down and up. So after the cpu comes
> up again the delta calculation in steal_account_process_tick() breaks
> due to the unsigned math:
> 
>          u64 steal = paravirt_steal_clock(smp_processor_id());
>          
>          steal -= this_rq()->prev_steal_time;
> 
> So if steal is smaller than rq->prev_steal_time we end up with an
> insanely large value which then gets added to rq->prev_steal_time,
> permanently wrecking the accounting. As a consequence the per cpu
> stats in /proc/stat become stale.
> 
> Nice trick to tell the world how idle the system is (100%) while the
> cpu is 100% busy running tasks. Though we prefer realistic numbers.
> 
> None of the accounting values which use a previous value to account
> for fractions is reset at cpu hotplug time. update_rq_clock_task() has
> a sanity check for prev_irq_time and prev_steal_time_rq, but that
> sanity check solely deals with clock warps and limits the wreckage
> visible in /proc/stat. The prev_*_time values are still wrong.
> 
> Solution is simple: Reset rq->prev_*_time when the cpu is plugged in
> again.
> 
> Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
> Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
> Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All Rights Reversed.



* [tip:sched/urgent] sched/cputime: Fix steal time accounting vs. CPU hotplug
  2016-03-04 14:59 [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug Thomas Gleixner
  2016-03-04 15:15 ` Rik van Riel
@ 2016-03-05 11:27 ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-03-05 11:27 UTC
  To: linux-tip-commits
  Cc: glommer, fweisbec, peterz, linux-kernel, hpa, mingo, torvalds,
	stable, riel, tglx

Commit-ID:  e9532e69b8d1d1284e8ecf8d2586de34aec61244
Gitweb:     http://git.kernel.org/tip/e9532e69b8d1d1284e8ecf8d2586de34aec61244
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 4 Mar 2016 15:59:42 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 5 Mar 2016 09:17:20 +0100

sched/cputime: Fix steal time accounting vs. CPU hotplug

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() breaks due to the unsigned
math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insanely
large value which then gets added to rq->prev_steal_time, permanently
wrecking the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the wreckage visible in /proc/stat. The
prev_*_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  |  1 +
 kernel/sched/sched.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ab814bf..406182a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 
 	case CPU_ONLINE:
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 30ea2d8..4f6598a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}
