* [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug
2016-06-13 10:32 [PATCH v6 0/3] Sched, KVM: st: Add steal time support to full dynticks CPU time accounting Wanpeng Li
@ 2016-06-13 10:32 ` Wanpeng Li
2016-06-13 10:44 ` Paolo Bonzini
2016-06-14 11:26 ` [tip:sched/core] KVM: Fix steal clock warp during guest CPU hotplug tip-bot for Wanpeng Li
2016-06-13 10:32 ` [PATCH v6 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Wanpeng Li
` (2 subsequent siblings)
3 siblings, 2 replies; 13+ messages in thread
From: Wanpeng Li @ 2016-06-13 10:32 UTC
To: linux-kernel, kvm
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář,
Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker, John Stultz
From: Wanpeng Li <wanpeng.li@hotmail.com>
Sometimes, after CPU hotplug you can observe a spike in stolen time
(100%) followed by the CPU being marked as 100% idle when it's actually
busy with a CPU hog task. The trace looks like the following:
cpuhp/1-12 [001] d.h1 167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
cpuhp/1-12 [001] d.h1 167.461659: account_process_tick: steal_jiffies = 1291
<idle>-0 [001] d.h1 167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
<idle>-0 [001] d.h1 167.462664: account_process_tick: steal_jiffies = 18446744072437
The sudden decrease of "steal" causes steal_jiffies to underflow.
The root cause is kvm_steal_time being reset to 0 after hot-plugging
back in a CPU. Instead, the preexisting value can be used, which is
what the core scheduler code expects.
John Stultz also reported a similar issue after guest S3.
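The numbers in the trace above can be reproduced in user space. What follows is a minimal sketch, not the kernel code: it assumes HZ=1000 (one jiffy is 1,000,000 ns, which is consistent with the trace values), and `model_rq` and `model_account_steal()` are hypothetical stand-ins for the runqueue state and for the steal accounting step seen in account_process_tick().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy is 1,000,000 ns (matches the trace). */
#define MODEL_NSEC_PER_JIFFY 1000000ULL

/* Hypothetical stand-in for the per-runqueue accounting state. */
struct model_rq {
	uint64_t prev_steal_time;	/* ns of steal time already accounted */
};

/*
 * Model of one accounting step: take the current steal clock reading,
 * subtract what was already accounted, and round down to whole jiffies.
 * The subtraction is unsigned, so a steal clock that jumps backwards
 * wraps around to a huge value instead of going negative.
 */
static uint64_t model_account_steal(struct model_rq *rq, uint64_t steal_clock)
{
	uint64_t steal, steal_jiffies;

	assert(rq != NULL);

	steal = steal_clock - rq->prev_steal_time;
	steal_jiffies = steal / MODEL_NSEC_PER_JIFFY;

	/* Only whole jiffies are accounted; the remainder is left for later. */
	rq->prev_steal_time += steal_jiffies * MODEL_NSEC_PER_JIFFY;
	return steal_jiffies;
}
```

Feeding the two steal clock readings from the trace (1291385514, then 18732255 after the reset) into `model_account_steal()` yields 1291 jiffies for the first call and a wrapped-around value for the second, which is the spike described above.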
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
arch/x86/kernel/kvm.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eea2a6f..1ef5e48 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
if (!has_steal_clock)
return;
- memset(st, 0, sizeof(*st));
-
wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
pr_info("kvm-stealtime: cpu %d, msr %llx\n",
cpu, (unsigned long long) slow_virt_to_phys(st));
--
1.9.1
* Re: [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug
2016-06-13 10:32 ` [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
@ 2016-06-13 10:44 ` Paolo Bonzini
2016-06-13 11:28 ` Peter Zijlstra
2016-06-13 11:31 ` Wanpeng Li
2016-06-14 11:26 ` [tip:sched/core] KVM: Fix steal clock warp during guest CPU hotplug tip-bot for Wanpeng Li
1 sibling, 2 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-13 10:44 UTC
To: Wanpeng Li, linux-kernel, kvm
Cc: Wanpeng Li, Radim Krčmář,
Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker, John Stultz
On 13/06/2016 12:32, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> Sometimes, after CPU hotplug you can observe a spike in stolen time
> (100%) followed by the CPU being marked as 100% idle when it's actually
> busy with a CPU hog task. The trace looks like the following:
>
> cpuhp/1-12 [001] d.h1 167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
> cpuhp/1-12 [001] d.h1 167.461659: account_process_tick: steal_jiffies = 1291
> <idle>-0 [001] d.h1 167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
> <idle>-0 [001] d.h1 167.462664: account_process_tick: steal_jiffies = 18446744072437
>
> The sudden decrease of "steal" causes steal_jiffies to underflow.
> The root cause is kvm_steal_time being reset to 0 after hot-plugging
> back in a CPU. Instead, the preexisting value can be used, which is
> what the core scheduler code expects.
>
> John Stultz also reported a similar issue after guest S3.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> arch/x86/kernel/kvm.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index eea2a6f..1ef5e48 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
> if (!has_steal_clock)
> return;
>
> - memset(st, 0, sizeof(*st));
> -
> wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
> pr_info("kvm-stealtime: cpu %d, msr %llx\n",
> cpu, (unsigned long long) slow_virt_to_phys(st));
>
Because there's no cover letter, I guess I have to ack each patch
independently.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Also, there's really no relation between patches 1-2 and 3...
Paolo
* Re: [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug
2016-06-13 10:44 ` Paolo Bonzini
@ 2016-06-13 11:28 ` Peter Zijlstra
2016-06-13 11:31 ` Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-06-13 11:28 UTC
To: Paolo Bonzini
Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li,
Radim Krčmář,
Ingo Molnar, Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
John Stultz
On Mon, Jun 13, 2016 at 12:44:46PM +0200, Paolo Bonzini wrote:
> Because there's no cover letter, I guess I have to ack each patch
> independently.
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks, I'll take the lot through the sched tree.
* Re: [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug
2016-06-13 10:44 ` Paolo Bonzini
2016-06-13 11:28 ` Peter Zijlstra
@ 2016-06-13 11:31 ` Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2016-06-13 11:31 UTC
To: Paolo Bonzini
Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker, John Stultz
2016-06-13 18:44 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 13/06/2016 12:32, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Sometimes, after CPU hotplug you can observe a spike in stolen time
>> (100%) followed by the CPU being marked as 100% idle when it's actually
>> busy with a CPU hog task. The trace looks like the following:
>>
>> cpuhp/1-12 [001] d.h1 167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
>> cpuhp/1-12 [001] d.h1 167.461659: account_process_tick: steal_jiffies = 1291
>> <idle>-0 [001] d.h1 167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
>> <idle>-0 [001] d.h1 167.462664: account_process_tick: steal_jiffies = 18446744072437
>>
>> The sudden decrease of "steal" causes steal_jiffies to underflow.
>> The root cause is kvm_steal_time being reset to 0 after hot-plugging
>> back in a CPU. Instead, the preexisting value can be used, which is
>> what the core scheduler code expects.
>>
>> John Stultz also reported a similar issue after guest S3.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> arch/x86/kernel/kvm.c | 2 --
>> 1 file changed, 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index eea2a6f..1ef5e48 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>> if (!has_steal_clock)
>> return;
>>
>> - memset(st, 0, sizeof(*st));
>> -
>> wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
>> pr_info("kvm-stealtime: cpu %d, msr %llx\n",
>> cpu, (unsigned long long) slow_virt_to_phys(st));
>>
>
> Because there's no cover letter, I guess I have to ack each patch
> independently.
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks to you and Rik for the review. There actually is a cover letter for
this version; it seems it was only sent to the mailing list and I forgot to
Cc the maintainers/reviewers.
Regards,
Wanpeng Li
* [tip:sched/core] KVM: Fix steal clock warp during guest CPU hotplug
2016-06-13 10:32 ` [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
2016-06-13 10:44 ` Paolo Bonzini
@ 2016-06-14 11:26 ` tip-bot for Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for Wanpeng Li @ 2016-06-14 11:26 UTC
To: linux-tip-commits
Cc: rkrcmar, peterz, mingo, tglx, linux-kernel, hpa, john.stultz,
efault, fweisbec, wanpeng.li, torvalds, pbonzini, riel
Commit-ID: 2348140d58f4f4245e9635ea8f1a77e940a4d877
Gitweb: http://git.kernel.org/tip/2348140d58f4f4245e9635ea8f1a77e940a4d877
Author: Wanpeng Li <wanpeng.li@hotmail.com>
AuthorDate: Mon, 13 Jun 2016 18:32:44 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 14 Jun 2016 11:13:14 +0200
KVM: Fix steal clock warp during guest CPU hotplug
Sometimes, after CPU hotplug you can observe a spike in stolen time
(100%) followed by the CPU being marked as 100% idle when it's actually
busy with a CPU hog task. The trace looks like the following:
cpuhp/1-12 [001] d.h1 167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
cpuhp/1-12 [001] d.h1 167.461659: account_process_tick: steal_jiffies = 1291
<idle>-0 [001] d.h1 167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
<idle>-0 [001] d.h1 167.462664: account_process_tick: steal_jiffies = 18446744072437
The sudden decrease of "steal" causes steal_jiffies to underflow.
The root cause is kvm_steal_time being reset to 0 after hot-plugging
back in a CPU. Instead, the preexisting value can be used, which is
what the core scheduler code expects.
John Stultz also reported a similar issue after guest S3.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465813966-3116-2-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/kvm.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eea2a6f..1ef5e48 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
if (!has_steal_clock)
return;
- memset(st, 0, sizeof(*st));
-
wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
pr_info("kvm-stealtime: cpu %d, msr %llx\n",
cpu, (unsigned long long) slow_virt_to_phys(st));
* [PATCH v6 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
2016-06-13 10:32 [PATCH v6 0/3] Sched, KVM: st: Add steal time support to full dynticks CPU time accounting Wanpeng Li
2016-06-13 10:32 ` [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
@ 2016-06-13 10:32 ` Wanpeng Li
2016-06-13 10:44 ` Paolo Bonzini
2016-06-14 11:26 ` [tip:sched/core] sched/cputime: Fix prev steal time accouting during CPU hotplug tip-bot for Wanpeng Li
2016-06-13 10:32 ` [PATCH v6 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
2016-06-13 11:28 ` [PATCH v6 0/3] Sched, KVM: st: " Wanpeng Li
3 siblings, 2 replies; 13+ messages in thread
From: Wanpeng Li @ 2016-06-13 10:32 UTC
To: linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Paolo Bonzini, Radim Krčmář
From: Wanpeng Li <wanpeng.li@hotmail.com>
Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU
hotplug") set rq->prev_* to 0 when a CPU comes back online after hotplug,
in order to fix the case where (after CPU hotplug) steal is smaller than
rq->prev_steal_time.
However, this should never happen. Steal was only smaller because of the
KVM-specific bug fixed by the previous patch. Worse, with the previous patch
applied, the reset itself triggers a bug on CPU hot-unplug/plug: because
rq->prev_steal_time is cleared while the steal clock is preserved, all of
the CPU's past steal time will be accounted again on hot-plug.
Since the root cause has been fixed, we can just revert commit e9532e69b8d1.
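The double accounting can be illustrated with a small user-space model. This is only a sketch under stated assumptions: HZ=1000, a steal clock that keeps its value across hotplug (the behaviour after the previous patch), and hypothetical names (`sketch_rq`, `sketch_account_steal()`, `sketch_reset_rq()`, `sketch_hotplug_cycle()`) that do not exist in the kernel. `sketch_reset_rq()` models only the prev_steal_time part of the removed account_reset_rq().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy is 1,000,000 ns. */
#define SKETCH_NSEC_PER_JIFFY 1000000ULL

/* Hypothetical stand-in for the per-runqueue accounting state. */
struct sketch_rq {
	uint64_t prev_steal_time;	/* ns of steal time already accounted */
};

/* Account the not-yet-accounted part of the steal clock, in whole jiffies. */
static uint64_t sketch_account_steal(struct sketch_rq *rq, uint64_t steal_clock)
{
	uint64_t steal_jiffies;

	assert(rq != NULL);

	steal_jiffies = (steal_clock - rq->prev_steal_time) / SKETCH_NSEC_PER_JIFFY;
	rq->prev_steal_time += steal_jiffies * SKETCH_NSEC_PER_JIFFY;
	return steal_jiffies;
}

/* Models the steal-time part of the reset done on CPU online. */
static void sketch_reset_rq(struct sketch_rq *rq)
{
	rq->prev_steal_time = 0;
}

/*
 * One unplug/plug cycle: 5 s of steal time before the CPU goes offline,
 * 10 ms more after it comes back. Returns the total jiffies accounted.
 */
static uint64_t sketch_hotplug_cycle(int reset_on_online)
{
	struct sketch_rq rq = { .prev_steal_time = 0 };
	uint64_t total = 0;

	total += sketch_account_steal(&rq, 5000000000ULL);

	/* The steal clock is preserved across hotplug; only rq state may reset. */
	if (reset_on_online)
		sketch_reset_rq(&rq);

	total += sketch_account_steal(&rq, 5010000000ULL);
	return total;
}
```

In this model, `sketch_hotplug_cycle(0)` accounts 5010 jiffies, matching the 5.01 s actually stolen, while `sketch_hotplug_cycle(1)` accounts 10010 jiffies because the first 5 s are counted a second time.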
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
kernel/sched/core.c | 1 -
kernel/sched/sched.h | 13 -------------
2 files changed, 14 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..7d45bb3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7213,7 +7213,6 @@ static void sched_rq_cpu_starting(unsigned int cpu)
struct rq *rq = cpu_rq(cpu);
rq->calc_load_update = calc_load_update;
- account_reset_rq(rq);
update_max_interval();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..de607e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1809,16 +1809,3 @@ static inline void cpufreq_trigger_update(u64 time) {}
#else /* arch_scale_freq_capacity */
#define arch_scale_freq_invariant() (false)
#endif
-
-static inline void account_reset_rq(struct rq *rq)
-{
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
- rq->prev_irq_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT
- rq->prev_steal_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
- rq->prev_steal_time_rq = 0;
-#endif
-}
--
1.9.1
* Re: [PATCH v6 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug
2016-06-13 10:32 ` [PATCH v6 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Wanpeng Li
@ 2016-06-13 10:44 ` Paolo Bonzini
2016-06-14 11:26 ` [tip:sched/core] sched/cputime: Fix prev steal time accouting during CPU hotplug tip-bot for Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-13 10:44 UTC
To: Wanpeng Li, linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Radim Krčmář
On 13/06/2016 12:32, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU
> hotplug") set rq->prev_* to 0 when a CPU comes back online after hotplug,
> in order to fix the case where (after CPU hotplug) steal is smaller than
> rq->prev_steal_time.
>
> However, this should never happen. Steal was only smaller because of the
> KVM-specific bug fixed by the previous patch. Worse, with the previous patch
> applied, the reset itself triggers a bug on CPU hot-unplug/plug: because
> rq->prev_steal_time is cleared while the steal clock is preserved, all of
> the CPU's past steal time will be accounted again on hot-plug.
>
> Since the root cause has been fixed, we can just revert commit e9532e69b8d1.
>
> Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> kernel/sched/core.c | 1 -
> kernel/sched/sched.h | 13 -------------
> 2 files changed, 14 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4..7d45bb3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7213,7 +7213,6 @@ static void sched_rq_cpu_starting(unsigned int cpu)
> struct rq *rq = cpu_rq(cpu);
>
> rq->calc_load_update = calc_load_update;
> - account_reset_rq(rq);
> update_max_interval();
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f30..de607e4 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1809,16 +1809,3 @@ static inline void cpufreq_trigger_update(u64 time) {}
> #else /* arch_scale_freq_capacity */
> #define arch_scale_freq_invariant() (false)
> #endif
> -
> -static inline void account_reset_rq(struct rq *rq)
> -{
> -#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> - rq->prev_irq_time = 0;
> -#endif
> -#ifdef CONFIG_PARAVIRT
> - rq->prev_steal_time = 0;
> -#endif
> -#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> - rq->prev_steal_time_rq = 0;
> -#endif
> -}
>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* [tip:sched/core] sched/cputime: Fix prev steal time accouting during CPU hotplug
2016-06-13 10:32 ` [PATCH v6 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Wanpeng Li
2016-06-13 10:44 ` Paolo Bonzini
@ 2016-06-14 11:26 ` tip-bot for Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for Wanpeng Li @ 2016-06-14 11:26 UTC
To: linux-tip-commits
Cc: fweisbec, peterz, riel, linux-kernel, pbonzini, torvalds, hpa,
tglx, mingo, wanpeng.li, rkrcmar, efault
Commit-ID: 3d89e5478bf550a50c99e93adf659369798263b0
Gitweb: http://git.kernel.org/tip/3d89e5478bf550a50c99e93adf659369798263b0
Author: Wanpeng Li <wanpeng.li@hotmail.com>
AuthorDate: Mon, 13 Jun 2016 18:32:45 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 14 Jun 2016 11:13:15 +0200
sched/cputime: Fix prev steal time accouting during CPU hotplug
Commit:
e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
... set rq->prev_* to 0 when a CPU comes back online after hotplug, in order
to fix the case where (after CPU hotplug) steal time is smaller than
rq->prev_steal_time.
However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch. Worse, with the previous patch
applied, the reset itself triggers a bug on CPU hot-unplug/plug: because
rq->prev_steal_time is cleared while the steal clock is preserved, all of
the CPU's past steal time will be accounted again on hot-plug.
Since the root cause has been fixed, we can just revert commit e9532e69b8d1.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 1 -
kernel/sched/sched.h | 13 -------------
2 files changed, 14 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 13d0896..c1b537b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7227,7 +7227,6 @@ static void sched_rq_cpu_starting(unsigned int cpu)
struct rq *rq = cpu_rq(cpu);
rq->calc_load_update = calc_load_update;
- account_reset_rq(rq);
update_max_interval();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..de607e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1809,16 +1809,3 @@ static inline void cpufreq_trigger_update(u64 time) {}
#else /* arch_scale_freq_capacity */
#define arch_scale_freq_invariant() (false)
#endif
-
-static inline void account_reset_rq(struct rq *rq)
-{
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
- rq->prev_irq_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT
- rq->prev_steal_time = 0;
-#endif
-#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
- rq->prev_steal_time_rq = 0;
-#endif
-}
* [PATCH v6 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
2016-06-13 10:32 [PATCH v6 0/3] Sched, KVM: st: Add steal time support to full dynticks CPU time accounting Wanpeng Li
2016-06-13 10:32 ` [PATCH v6 1/3] KVM: fix steal clock warp during guest cpu hotplug Wanpeng Li
2016-06-13 10:32 ` [PATCH v6 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug Wanpeng Li
@ 2016-06-13 10:32 ` Wanpeng Li
2016-06-13 10:44 ` Paolo Bonzini
2016-06-14 11:27 ` [tip:sched/core] " tip-bot for Wanpeng Li
2016-06-13 11:28 ` [PATCH v6 0/3] Sched, KVM: st: " Wanpeng Li
3 siblings, 2 replies; 13+ messages in thread
From: Wanpeng Li @ 2016-06-13 10:32 UTC
To: linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Paolo Bonzini, Radim Krčmář
From: Wanpeng Li <wanpeng.li@hotmail.com>
This patch adds guest steal-time support to full dynticks CPU
time accounting. After the following commit:
ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
... time sampling became jiffy based, even though samples are still taken at
ring boundaries, so steal_account_process_tick() is reused to account how many
'ticks' of stolen time have accumulated since the last accounting round.
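The clamping idea can be sketched in user space as follows. This is an illustration only, not the kernel implementation: it assumes HZ=1000, and `demo_cpu`, `demo_steal_account()` and `demo_vtime_delta()` are hypothetical names that loosely model steal_account_process_tick(max_jiffies) and get_vtime_delta() as changed by the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy is 1,000,000 ns. */
#define DEMO_NSEC_PER_JIFFY 1000000ULL

/* Hypothetical per-CPU state for the model. */
struct demo_cpu {
	uint64_t steal_clock;		/* ns, steal time reported by the hypervisor */
	uint64_t prev_steal_time;	/* ns of steal time already accounted */
	uint64_t steal_accounted;	/* jiffies accounted as steal so far */
};

/*
 * Account pending steal time in whole jiffies, but never more than
 * max_jiffies. Whatever exceeds the limit stays pending and is picked
 * up on a later call.
 */
static unsigned long demo_steal_account(struct demo_cpu *c, unsigned long max_jiffies)
{
	uint64_t steal_jiffies;

	assert(c != NULL);

	steal_jiffies = (c->steal_clock - c->prev_steal_time) / DEMO_NSEC_PER_JIFFY;
	if (steal_jiffies > max_jiffies)
		steal_jiffies = max_jiffies;

	c->prev_steal_time += steal_jiffies * DEMO_NSEC_PER_JIFFY;
	c->steal_accounted += steal_jiffies;
	return (unsigned long)steal_jiffies;
}

/*
 * Model of the vtime delta: the elapsed jiffies minus the part that was
 * stolen. Clamping to delta_jiffies keeps the subtraction from wrapping.
 */
static unsigned long demo_vtime_delta(struct demo_cpu *c, unsigned long delta_jiffies)
{
	unsigned long steal_jiffies = demo_steal_account(c, delta_jiffies);

	return delta_jiffies - steal_jiffies;
}
```

With 5 jiffies of pending steal time and an elapsed delta of 3 jiffies, the model charges 3 jiffies to steal and returns 0 for the task; the remaining 2 jiffies of steal are charged on the next call. Passing ULONG_MAX as the limit, as the tick-based callers do in the diff, disables the clamp.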
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
kernel/sched/cputime.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..3d60e5d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
cpustat[CPUTIME_IDLE] += (__force u64) cputime;
}
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(unsigned long max_jiffies)
{
#ifdef CONFIG_PARAVIRT
if (static_key_false(&paravirt_steal_enabled)) {
@@ -272,14 +272,14 @@ static __always_inline bool steal_account_process_tick(void)
* time in jiffies. Lets cast the result to jiffies
* granularity and account the rest on the next rounds.
*/
- steal_jiffies = nsecs_to_jiffies(steal);
+ steal_jiffies = min(nsecs_to_jiffies(steal), max_jiffies);
this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);
account_steal_time(jiffies_to_cputime(steal_jiffies));
return steal_jiffies;
}
#endif
- return false;
+ return 0;
}
/*
@@ -346,7 +346,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
u64 cputime = (__force u64) cputime_one_jiffy;
u64 *cpustat = kcpustat_this_cpu->cpustat;
- if (steal_account_process_tick())
+ if (steal_account_process_tick(ULONG_MAX))
return;
cputime *= ticks;
@@ -477,7 +477,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
return;
}
- if (steal_account_process_tick())
+ if (steal_account_process_tick(ULONG_MAX))
return;
if (user_tick)
@@ -681,12 +681,14 @@ static cputime_t vtime_delta(struct task_struct *tsk)
static cputime_t get_vtime_delta(struct task_struct *tsk)
{
unsigned long now = READ_ONCE(jiffies);
- unsigned long delta = now - tsk->vtime_snap;
+ unsigned long delta_jiffies, steal_jiffies;
+ delta_jiffies = now - tsk->vtime_snap;
+ steal_jiffies = steal_account_process_tick(delta_jiffies);
WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap = now;
- return jiffies_to_cputime(delta);
+ return jiffies_to_cputime(delta_jiffies - steal_jiffies);
}
static void __vtime_account_system(struct task_struct *tsk)
--
1.9.1
* Re: [PATCH v6 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting
2016-06-13 10:32 ` [PATCH v6 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
@ 2016-06-13 10:44 ` Paolo Bonzini
2016-06-14 11:27 ` [tip:sched/core] " tip-bot for Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-06-13 10:44 UTC
To: Wanpeng Li, linux-kernel, kvm
Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Radim Krčmář
On 13/06/2016 12:32, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> This patch adds guest steal-time support to full dynticks CPU
> time accounting. After the following commit:
>
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
>
> ... time sampling became jiffy based, even though samples are still taken at
> ring boundaries, so steal_account_process_tick() is reused to account how many
> 'ticks' of stolen time have accumulated since the last accounting round.
>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> kernel/sched/cputime.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 75f98c5..3d60e5d 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
> cpustat[CPUTIME_IDLE] += (__force u64) cputime;
> }
>
> -static __always_inline bool steal_account_process_tick(void)
> +static __always_inline unsigned long steal_account_process_tick(unsigned long max_jiffies)
> {
> #ifdef CONFIG_PARAVIRT
> if (static_key_false(&paravirt_steal_enabled)) {
> @@ -272,14 +272,14 @@ static __always_inline bool steal_account_process_tick(void)
> * time in jiffies. Lets cast the result to jiffies
> * granularity and account the rest on the next rounds.
> */
> - steal_jiffies = nsecs_to_jiffies(steal);
> + steal_jiffies = min(nsecs_to_jiffies(steal), max_jiffies);
> this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);
>
> account_steal_time(jiffies_to_cputime(steal_jiffies));
> return steal_jiffies;
> }
> #endif
> - return false;
> + return 0;
> }
>
> /*
> @@ -346,7 +346,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
> u64 cputime = (__force u64) cputime_one_jiffy;
> u64 *cpustat = kcpustat_this_cpu->cpustat;
>
> - if (steal_account_process_tick())
> + if (steal_account_process_tick(ULONG_MAX))
> return;
>
> cputime *= ticks;
> @@ -477,7 +477,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
> return;
> }
>
> - if (steal_account_process_tick())
> + if (steal_account_process_tick(ULONG_MAX))
> return;
>
> if (user_tick)
> @@ -681,12 +681,14 @@ static cputime_t vtime_delta(struct task_struct *tsk)
> static cputime_t get_vtime_delta(struct task_struct *tsk)
> {
> unsigned long now = READ_ONCE(jiffies);
> - unsigned long delta = now - tsk->vtime_snap;
> + unsigned long delta_jiffies, steal_jiffies;
>
> + delta_jiffies = now - tsk->vtime_snap;
> + steal_jiffies = steal_account_process_tick(delta_jiffies);
> WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
> tsk->vtime_snap = now;
>
> - return jiffies_to_cputime(delta);
> + return jiffies_to_cputime(delta_jiffies - steal_jiffies);
> }
>
> static void __vtime_account_system(struct task_struct *tsk)
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* [tip:sched/core] sched/cputime: Add steal time support to full dynticks CPU time accounting
2016-06-13 10:32 ` [PATCH v6 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
2016-06-13 10:44 ` Paolo Bonzini
@ 2016-06-14 11:27 ` tip-bot for Wanpeng Li
1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for Wanpeng Li @ 2016-06-14 11:27 UTC
To: linux-tip-commits
Cc: efault, wanpeng.li, riel, tglx, rkrcmar, pbonzini, torvalds,
fweisbec, peterz, mingo, linux-kernel, hpa
Commit-ID: 807e5b80687c06715d62df51a5473b231e3e8b15
Gitweb: http://git.kernel.org/tip/807e5b80687c06715d62df51a5473b231e3e8b15
Author: Wanpeng Li <wanpeng.li@hotmail.com>
AuthorDate: Mon, 13 Jun 2016 18:32:46 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 14 Jun 2016 11:13:16 +0200
sched/cputime: Add steal time support to full dynticks CPU time accounting
This patch adds guest steal-time support to full dynticks CPU
time accounting. After the following commit:
ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
... time sampling became jiffy based, even if we do the sampling from the
context tracking code, so steal_account_process_tick() can be reused
to account how many 'ticks' are stolen-time, after the last accumulation.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465813966-3116-4-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/cputime.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..3d60e5d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
cpustat[CPUTIME_IDLE] += (__force u64) cputime;
}
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(unsigned long max_jiffies)
{
#ifdef CONFIG_PARAVIRT
if (static_key_false(&paravirt_steal_enabled)) {
@@ -272,14 +272,14 @@ static __always_inline bool steal_account_process_tick(void)
* time in jiffies. Lets cast the result to jiffies
* granularity and account the rest on the next rounds.
*/
- steal_jiffies = nsecs_to_jiffies(steal);
+ steal_jiffies = min(nsecs_to_jiffies(steal), max_jiffies);
this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);
account_steal_time(jiffies_to_cputime(steal_jiffies));
return steal_jiffies;
}
#endif
- return false;
+ return 0;
}
/*
@@ -346,7 +346,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
u64 cputime = (__force u64) cputime_one_jiffy;
u64 *cpustat = kcpustat_this_cpu->cpustat;
- if (steal_account_process_tick())
+ if (steal_account_process_tick(ULONG_MAX))
return;
cputime *= ticks;
@@ -477,7 +477,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
return;
}
- if (steal_account_process_tick())
+ if (steal_account_process_tick(ULONG_MAX))
return;
if (user_tick)
@@ -681,12 +681,14 @@ static cputime_t vtime_delta(struct task_struct *tsk)
static cputime_t get_vtime_delta(struct task_struct *tsk)
{
unsigned long now = READ_ONCE(jiffies);
- unsigned long delta = now - tsk->vtime_snap;
+ unsigned long delta_jiffies, steal_jiffies;
+ delta_jiffies = now - tsk->vtime_snap;
+ steal_jiffies = steal_account_process_tick(delta_jiffies);
WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap = now;
- return jiffies_to_cputime(delta);
+ return jiffies_to_cputime(delta_jiffies - steal_jiffies);
}
static void __vtime_account_system(struct task_struct *tsk)
* Re: [PATCH v6 0/3] Sched, KVM: st: Add steal time support to full dynticks CPU time accounting
2016-06-13 10:32 [PATCH v6 0/3] Sched, KVM: st: Add steal time support to full dynticks CPU time accounting Wanpeng Li
` (2 preceding siblings ...)
2016-06-13 10:32 ` [PATCH v6 3/3] sched/cputime: Add steal time support to full dynticks CPU time accounting Wanpeng Li
@ 2016-06-13 11:28 ` Wanpeng Li
3 siblings, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2016-06-13 11:28 UTC
To: linux-kernel, kvm
Cc: Ingo Molnar, Peter Zijlstra (Intel),
Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
Paolo Bonzini, Radim Krčmář,
Wanpeng Li, John Stultz
Cc maintainers/reviewers,
2016-06-13 18:32 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> Periodic and NOHZ idle accounting, which do not use vtime, already
> account steal time. However, vtime (which depends on context tracking
> and is only used in full dynticks mode) does not. This patchset adds
> steal time accounting support to vtime, for use in full dynticks guests.
>
> Patches 1 and 2 fix the steal clock warp and the prev steal time
> accounting bugs during CPU hotplug.
> Patch 3 adds the steal time support to full dynticks CPU time accounting.
>
> N.B. This version of the patchset drops the previous Acked-by and
> Reviewed-by tags since the patches differ from the earlier versions. :)
>
> v5 -> v6:
> * improve commit message of patch 2/3, 3/3
> * fix steal time being accounted twice
> v4 -> v5:
> * improve commit message of patch 1/3
> * revert commit e9532e69b8d1
> * apply same logic to account_idle_time, so change get_vtime_delta instead
> v3 -> v4:
> * fix grammar errors, thanks Ingo
> * clean up fragile code, thanks Ingo
> v2 -> v3:
> * fix the root cause
> * convert steal time jiffies to cputime
> v1 -> v2:
> * update patch subject, description and comments
> * deal with the case where steal time suddenly increases by a ludicrous amount
> * fix divide-by-zero bug, thanks Rik
>
> Wanpeng Li (3):
> KVM: fix steal clock warp during guest cpu hotplug
> sched/cputime: Fix prev steal time accouting during cpu hotplug
> sched/cputime: Add steal time support to full dynticks CPU time
> accounting
>
> arch/x86/kernel/kvm.c | 2 --
> kernel/sched/core.c | 1 -
> kernel/sched/cputime.c | 16 +++++++++-------
> kernel/sched/sched.h | 13 -------------
> 4 files changed, 9 insertions(+), 23 deletions(-)
>
> --
> 1.9.1
>