* [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug
@ 2016-03-04 14:59 Thomas Gleixner
  2016-03-04 15:15 ` Rik van Riel
  2016-03-05 11:27 ` [tip:sched/urgent] sched/cputime: Fix steal time accounting vs. CPU hotplug tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 3+ messages in thread
From: Thomas Gleixner @ 2016-03-04 14:59 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Glauber Costa, Frederic Weisbecker,
	Rik van Riel

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value across CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wrecks itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());
	 
	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insanely large
value, which then gets added to rq->prev_steal_time, resulting in permanently
wrecked accounting. As a consequence the per-CPU stats in /proc/stat
become stale.
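
For illustration only, here is a tiny standalone program (invented numbers,
plain userspace C, not kernel code) showing how the unsigned subtraction
produces a huge bogus delta once the fresh steal clock is smaller than the
stale prev value:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t prev_steal_time = 5000000000ULL; /* stale value kept across CPU down/up */
		uint64_t steal = 100000000ULL;            /* fresh steal clock after the CPU comes back */

		steal -= prev_steal_time;                 /* u64 subtraction wraps around */
		printf("bogus steal delta: %llu ns\n", (unsigned long long)steal);
		return 0;
	}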

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions are reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and merely limits the wreckage visible in /proc/stat. The
prev_*_time values are still wrong.
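
For reference, the prev_steal_time_rq path of that check looks roughly like the
sketch below (a paraphrase wrapped into a standalone demo, not verbatim kernel
source): warps are clamped against the pending clock delta, which limits the
visible damage, but the stale baseline itself is never corrected.

	#include <stdio.h>
	#include <stdint.h>

	static uint64_t prev_steal_time_rq;	/* stands in for rq->prev_steal_time_rq */

	static uint64_t subtract_steal(uint64_t steal_clock, uint64_t delta)
	{
		uint64_t steal = steal_clock - prev_steal_time_rq;

		if (steal > delta)		/* clock warp: clamp instead of fixing the baseline */
			steal = delta;

		prev_steal_time_rq += steal;
		return delta - steal;		/* remainder is charged to the task */
	}

	int main(void)
	{
		prev_steal_time_rq = 5000000000ULL;	/* stale baseline after replug */
		/* The warped steal is clamped to delta, so nothing insane shows up here,
		 * but prev_steal_time_rq is still nowhere near the real steal clock. */
		printf("charged delta: %llu\n",
		       (unsigned long long)subtract_steal(100000000ULL, 1000000ULL));
		return 0;
	}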

The solution is simple: reset rq->prev_*_time when the CPU is plugged in again.

Fixes: e6e6685accfa ("KVM guest: Steal time accounting")
Fixes: 095c0aa83e52 ("sched: adjust scheduler cpu power for stolen time")
Fixes: aa483808516c ("sched: Remove irq time from available CPU power")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org

---
 kernel/sched/core.c  |    1 +
 kernel/sched/sched.h |   13 +++++++++++++
 2 files changed, 14 insertions(+)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,7 @@ migration_call(struct notifier_block *nf
 
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 
 	case CPU_ONLINE:
Index: b/kernel/sched/sched.h
===================================================================
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}
