* [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug
@ 2016-06-07 8:33 Wanpeng Li
2016-06-07 8:33 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during " Wanpeng Li
2016-06-07 8:33 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
0 siblings, 2 replies; 5+ messages in thread
From: Wanpeng Li @ 2016-06-07 8:33 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář,
Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker
From: Wanpeng Li <wanpeng.li@hotmail.com>
I observed that sometimes st is 100% instantaneous, then idle is 100%
even if there is a cpu hog on the guest cpu after the cpu hotplug comes
back(N.B. this can not always be readily reproduced). I add trace to
capture it as below:
cpuhp/1-12 [001] d.h1 167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
cpuhp/1-12 [001] d.h1 167.461659: account_process_tick: steal_jiffies = 1291
<idle>-0 [001] d.h1 167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
<idle>-0 [001] d.h1 167.462664: account_process_tick: steal_jiffies = 18446744072437
The steal clock warp and then steal_jiffies underflow.
Rik also pointed out to me:
| I have seen stuff like that with live migration too, in the past
The root cause of steal clock warp during hotplug is kvm_steal_time reset
to 0 after cpu hotplug comes back which should be preexiting guest value.
This patch fix it by don't reset kvm_steal_time during guest cpu hotplug.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
* fix the root cause
v1 -> v2:
* update patch subject, description and comments
* deal with the case where steal time suddenly increases by a ludicrous amount
arch/x86/kernel/kvm.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eea2a6f..1ef5e48 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
if (!has_steal_clock)
return;
- memset(st, 0, sizeof(*st));
-
wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
pr_info("kvm-stealtime: cpu %d, msr %llx\n",
cpu, (unsigned long long) slow_virt_to_phys(st));
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
2016-06-07 8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
@ 2016-06-07 8:33 ` Wanpeng Li
2016-06-07 16:18 ` kbuild test robot
2016-06-07 8:33 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
1 sibling, 1 reply; 5+ messages in thread
From: Wanpeng Li @ 2016-06-07 8:33 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Paolo Bonzini, Radim Krčmář
From: Wanpeng Li <wanpeng.li@hotmail.com>
Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the scenario:
| steal is smaller than rq->prev_steal_time we end up with an insane large
| value which then gets added to rq->prev_steal_time, resulting in a permanent
| wreckage of the accounting.
However, it is still buggy.
rq->prev_steal_time = 0:
As Rik pointed out:
| setting rq->prev_irq_time to 0 in the guest, and then getting a giant value from
| the host, could result in a very large of steal_jiffies.
rq->prev_steal_time_rq = 0:
| steal = paravirt_steal_clock(cpu_of(rq));
| steal -= rq->prev_steal_time_rq;
|
| if (unlikely(steal > delta))
| steal = delta;
|
| rq->prev_steal_time_rq += steal;
| delta -= steal;
|
| rq->clock_task += delta;
steal is a giant value and rq->prev_steal_time_rq is 0, rq->prev_steal_time_rq
grows in delta granularity, rq->clock_task can't ramp up until rq->prev_steal_time_rq
catches up steal clock since delta value will be 0 after reducing steal time from
normal execution time. That's why I obersved that cpuhg/1-12 continue running
until rq->prev_steal_time_rq catches up steal clock timestamp.
I believe rq->prev_irq_time has similar issue. So this patch fix it by setting
rq->prev_* to current irq time and steal clock timestamp after a cpu hotplug
comes back.
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
kernel/sched/sched.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..e6758af 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1813,12 +1813,12 @@ static inline void cpufreq_trigger_update(u64 time) {}
static inline void account_reset_rq(struct rq *rq)
{
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
- rq->prev_irq_time = 0;
+ rq->prev_irq_time = irq_time_read(cpu_of(rq));
#endif
#ifdef CONFIG_PARAVIRT
- rq->prev_steal_time = 0;
+ rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
- rq->prev_steal_time_rq = 0;
+ rq->prev_steal_time_rq = paravirt_steal_clock(cpu_of(rq));
#endif
}
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
2016-06-07 8:33 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during " Wanpeng Li
@ 2016-06-07 16:18 ` kbuild test robot
0 siblings, 0 replies; 5+ messages in thread
From: kbuild test robot @ 2016-06-07 16:18 UTC (permalink / raw)
To: Wanpeng Li
Cc: kbuild-all, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Paolo Bonzini, Radim Krčmář
[-- Attachment #1: Type: text/plain, Size: 2786 bytes --]
Hi,
[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.7-rc2 next-20160607]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Wanpeng-Li/KVM-fix-steal-clock-warp-during-guest-cpu-hotplug/20160607-163708
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm
All errors (new ones prefixed by >>):
In file included from kernel/sched/cputime.c:7:0:
kernel/sched/sched.h: In function 'account_reset_rq':
kernel/sched/sched.h:1819:24: error: implicit declaration of function 'paravirt_steal_clock' [-Werror=implicit-function-declaration]
rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
^
In file included from kernel/sched/cputime.c:9:0:
arch/arm/include/asm/paravirt.h: At top level:
>> arch/arm/include/asm/paravirt.h:14:19: error: conflicting types for 'paravirt_steal_clock'
static inline u64 paravirt_steal_clock(int cpu)
^
In file included from kernel/sched/cputime.c:7:0:
kernel/sched/sched.h:1819:24: note: previous implicit declaration of 'paravirt_steal_clock' was here
rq->prev_steal_time = paravirt_steal_clock(cpu_of(rq));
^
cc1: some warnings being treated as errors
vim +/paravirt_steal_clock +14 arch/arm/include/asm/paravirt.h
02c2433b Stefano Stabellini 2015-11-23 8
02c2433b Stefano Stabellini 2015-11-23 9 struct pv_time_ops {
02c2433b Stefano Stabellini 2015-11-23 10 unsigned long long (*steal_clock)(int cpu);
02c2433b Stefano Stabellini 2015-11-23 11 };
02c2433b Stefano Stabellini 2015-11-23 12 extern struct pv_time_ops pv_time_ops;
02c2433b Stefano Stabellini 2015-11-23 13
02c2433b Stefano Stabellini 2015-11-23 @14 static inline u64 paravirt_steal_clock(int cpu)
02c2433b Stefano Stabellini 2015-11-23 15 {
02c2433b Stefano Stabellini 2015-11-23 16 return pv_time_ops.steal_clock(cpu);
02c2433b Stefano Stabellini 2015-11-23 17 }
:::::: The code at line 14 was first introduced by commit
:::::: 02c2433b3aa6b57313c261c9811bbbe49528101c arm: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
:::::: TO: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
:::::: CC: David Vrabel <david.vrabel@citrix.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 57606 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
2016-06-07 8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
2016-06-07 8:33 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during " Wanpeng Li
@ 2016-06-07 8:33 ` Wanpeng Li
1 sibling, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2016-06-07 8:33 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Paolo Bonzini, Radim Krčmář
From: Wanpeng Li <wanpeng.li@hotmail.com>
This patch adds guest steal-time support to full dynticks CPU
time accounting. After the following commit:
ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
... time sampling became jiffy based, even if it's still listened
to ring boundaries, so steal_account_process_tick() is reused
to account how many 'ticks' are stolen-time, after the last accumulation.
Suggested-and-Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v3 -> v4:
* fix grammar errors, thanks Ingo
* cleanup fragile codes, thanks Ingo
v2 -> v3:
* convert steal time jiffies to cputime
v1 -> v2:
* fix divide zero bug, thanks Rik
kernel/sched/cputime.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..9ff036b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
cpustat[CPUTIME_IDLE] += (__force u64) cputime;
}
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(void)
{
#ifdef CONFIG_PARAVIRT
if (static_key_false(¶virt_steal_enabled)) {
@@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void)
return steal_jiffies;
}
#endif
- return false;
+ return 0;
}
/*
@@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
static void __vtime_account_system(struct task_struct *tsk)
{
- cputime_t delta_cpu = get_vtime_delta(tsk);
+ cputime_t delta_time = get_vtime_delta(tsk);
+ cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick());
- account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
+ if (steal_time < delta_time) {
+ delta_time -= steal_time;
+ account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time));
+ }
}
void vtime_account_system(struct task_struct *tsk)
@@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk)
void vtime_account_user(struct task_struct *tsk)
{
- cputime_t delta_cpu;
+ cputime_t delta_time, steal_time;
write_seqcount_begin(&tsk->vtime_seqcount);
tsk->vtime_snap_whence = VTIME_SYS;
if (vtime_delta(tsk)) {
- delta_cpu = get_vtime_delta(tsk);
- account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+ delta_time = get_vtime_delta(tsk);
+ steal_time = jiffies_to_cputime(steal_account_process_tick());
+
+ if (steal_time < delta_time) {
+ delta_time -= steal_time;
+ account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
+ }
}
write_seqcount_end(&tsk->vtime_seqcount);
}
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
[parent not found: <1465661590-4732-1-git-send-email-wanpeng.li@hotmail.com>]
end of thread, other threads:[~2016-06-07 16:15 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-07 8:33 [PATCH v4 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
2016-06-07 8:33 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during " Wanpeng Li
2016-06-07 16:18 ` kbuild test robot
2016-06-07 8:33 ` [PATCH v4 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
[not found] <1465661590-4732-1-git-send-email-wanpeng.li@hotmail.com>
[not found] ` <SG2PR02MB15507073C531BAA82E9C8E67805D0@SG2PR02MB1550.apcprd02.prod.outlook.com>
2016-06-07 10:34 ` [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Paolo Bonzini
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).