From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760724AbcCEL2l (ORCPT ); Sat, 5 Mar 2016 06:28:41 -0500
Received: from torg.zytor.com ([198.137.202.12]:33770 "EHLO terminus.zytor.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1754290AbcCEL2d (ORCPT ); Sat, 5 Mar 2016 06:28:33 -0500
Date: Sat, 5 Mar 2016 03:27:38 -0800
From: tip-bot for Thomas Gleixner
Message-ID:
Cc: glommer@parallels.com, fweisbec@gmail.com, peterz@infradead.org,
	linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
	torvalds@linux-foundation.org, stable@vger.kernel.org, riel@redhat.com,
	tglx@linutronix.de
Reply-To: tglx@linutronix.de, torvalds@linux-foundation.org, riel@redhat.com,
	stable@vger.kernel.org, mingo@kernel.org, peterz@infradead.org,
	hpa@zytor.com, linux-kernel@vger.kernel.org, fweisbec@gmail.com,
	glommer@parallels.com
In-Reply-To:
References:
To: linux-tip-commits@vger.kernel.org
Subject: [tip:sched/urgent] sched/cputime: Fix steal time accounting vs. CPU hotplug
Git-Commit-ID: e9532e69b8d1d1284e8ecf8d2586de34aec61244
X-Mailer: tip-git-log-daemon
Robot-ID:
Robot-Unsubscribe: Contact to get blacklisted from these emails
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Commit-ID:  e9532e69b8d1d1284e8ecf8d2586de34aec61244
Gitweb:     http://git.kernel.org/tip/e9532e69b8d1d1284e8ecf8d2586de34aec61244
Author:     Thomas Gleixner
AuthorDate: Fri, 4 Mar 2016 15:59:42 +0100
Committer:  Ingo Molnar
CommitDate: Sat, 5 Mar 2016 09:17:20 +0100

sched/cputime: Fix steal time accounting vs. CPU hotplug

On CPU hotplug the steal time accounting can keep a stale
rq->prev_steal_time value over CPU down and up.
So after the CPU comes up again the delta calculation in
steal_account_process_tick() wrecks itself due to the unsigned math:

	u64 steal = paravirt_steal_clock(smp_processor_id());

	steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insanely
large value which then gets added to rq->prev_steal_time, permanently
wrecking the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the wreckage visible in /proc/stat. The
prev_*_time values are still wrong.

The solution is simple: Reset rq->prev_*_time when the CPU is plugged in
again.

Signed-off-by: Thomas Gleixner
Acked-by: Rik van Riel
Cc:
Cc: Frederic Weisbecker
Cc: Glauber Costa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar
---
 kernel/sched/core.c  |  1 +
 kernel/sched/sched.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ab814bf..406182a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 
 	case CPU_ONLINE:
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 30ea2d8..4f6598a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}