From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752916AbcHJSID (ORCPT ); Wed, 10 Aug 2016 14:08:03 -0400
Received: from mx1.redhat.com ([209.132.183.28]:35622 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932629AbcHJSIA (ORCPT ); Wed, 10 Aug 2016 14:08:00 -0400
Date: Wed, 10 Aug 2016 12:52:12 -0400
From: Rik van Riel
To: Wanpeng Li
Cc: Frederic Weisbecker , Ingo Molnar , LKML , Paolo Bonzini ,
	Peter Zijlstra , Wanpeng Li , Thomas Gleixner , Radim Krcmar ,
	Mike Galbraith
Subject: [PATCH] time,virt: resync steal time when guest & host lose sync
Message-ID: <20160810125212.78564dc2@annuminas.surriel.com>
In-Reply-To:
References: <1468421405-20056-1-git-send-email-fweisbec@gmail.com>
	<1468421405-20056-2-git-send-email-fweisbec@gmail.com>
	<1470751579.13905.77.camel@redhat.com>
Organization: Red Hat, Inc.
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.30]); Wed, 10 Aug 2016 16:52:16 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 10 Aug 2016 07:39:08 +0800
Wanpeng Li wrote:

> The regression is caused by your commit "sched,time: Count actually
> elapsed irq & softirq time".

Wanpeng, does this patch fix your issue?

Paolo, what is your opinion on this issue?

I can think of all kinds of ways in which guest and host might
lose sync with steal time, from uninitialized values at boot, to
guest pause, followed by save to disk, and reload, to live migration,
to...

---8<---

Subject: time,virt: resync steal time when guest & host lose sync

When guest and host wildly disagree on steal time, a guest can
do several things:

1) Quickly account all the steal time at once (the kernel did this
   before 57430218317e ("sched/cputime: Count actually elapsed irq &
   softirq time"), when steal_account_process_ticks got ULONG_MAX as
   its maximum value).
2) Stay out of sync for an indeterminate amount of time. This is what
   the system does today.
3) Sync up the guest value to the host-provided value, without
   accounting an absurdly large value in the cpu time statistics.

This patch makes the kernel do (3), which seems like the right
thing to do.

The exact value of the threshold used probably does not matter too
much, as long as it is long enough to cover all the timer ticks that
passed during an idle period, because (irqtime_)account_idle_ticks
can process a large amount of time all at once.

Signed-off-by: Rik van Riel
Reported-by: Wanpeng Li
---
 kernel/sched/cputime.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 1934f658c036..c18f9e717af6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -273,7 +273,17 @@ static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 		steal = paravirt_steal_clock(smp_processor_id());
 		steal -= this_rq()->prev_steal_time;
 
-		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
+		steal_cputime = nsecs_to_cputime(steal);
+		if (steal_cputime > 32 * maxtime) {
+			/*
+			 * Guest and host steal time values are way out of
+			 * sync. Sync up the guest steal time with the host.
+			 */
+			this_rq()->prev_steal_time +=
+					cputime_to_nsecs(steal_cputime);
+			return 0;
+		}
+		steal_cputime = min(steal_cputime, maxtime);
 		account_steal_time(steal_cputime);
 		this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);
 
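
For anyone who wants to experiment with the threshold outside the kernel, the resync logic in the hunk above can be approximated in user space. This is only a rough sketch under stated assumptions, not the kernel code: struct steal_state, account_steal() and RESYNC_FACTOR are hypothetical names standing in for this_rq()->prev_steal_time, steal_account_process_time() and the literal 32, and everything is kept in nanoseconds, so the cputime_t round trip is left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-runqueue bookkeeping (not struct rq). */
struct steal_state {
	uint64_t prev_steal_ns;	/* host steal time the guest has consumed so far */
	uint64_t accounted_ns;	/* steal time actually reported in the statistics */
};

/* Same multiplier as the patch; the name is an assumption of this sketch. */
#define RESYNC_FACTOR 32

/*
 * Account steal time given the host's cumulative steal clock and the
 * maximum amount of time that may be accounted in this call.
 * Returns the amount of steal time accounted.
 */
static uint64_t account_steal(struct steal_state *s, uint64_t host_steal_ns,
			      uint64_t max_ns)
{
	uint64_t delta = host_steal_ns - s->prev_steal_ns;

	if (delta > RESYNC_FACTOR * max_ns) {
		/*
		 * Guest and host are way out of sync: adopt the host
		 * value without reporting the gap as steal time.
		 */
		s->prev_steal_ns += delta;
		return 0;
	}

	/* Normal case: account at most max_ns and carry the rest over. */
	if (delta > max_ns)
		delta = max_ns;
	s->accounted_ns += delta;
	s->prev_steal_ns += delta;
	return delta;
}
```

With max_ns at 4 ms, a 10 ms backlog is drained 4 ms at a time over successive calls, while a jump of several seconds (as after a pause or live migration) is dropped in a single call and accounted_ns is left unchanged.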