linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
@ 2016-06-02 11:57 Wanpeng Li
  2016-06-02 12:00 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2016-06-02 11:57 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Ingo Molnar, Peter Zijlstra (Intel),
	Rik van Riel, Thomas Gleixner, Frederic Weisbecker,
	Paolo Bonzini, Radim

From: Wanpeng Li <wanpeng.li@hotmail.com>

I observed that st sometimes spikes to 100% momentarily, after which
idle stays at 100% even though there is a cpu hog on the guest cpu,
once the cpu comes back from hotplug (N.B. both guest and host run the
latest 4.7-rc1, and this cannot always be readily reproduced). I added
tracing to capture it, as shown below:

cpuhp/1-12    [001] d.h1   167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0         
cpuhp/1-12    [001] d.h1   167.461659: account_process_tick: steal_jiffies = 1291          
<idle>-0     [001] d.h1   167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000          
<idle>-0     [001] d.h1   167.462664: account_process_tick: steal_jiffies = 18446744072437

The steal clock warps backwards, so the unsigned subtraction wraps and
steal_jiffies overflows to a huge value. This patch aligns
prev_steal_time to the new steal clock timestamp in order to avoid the
overflow, so that st accounting can continue to work.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/sched/cputime.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..d0eebc3 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -265,7 +265,13 @@ static __always_inline bool steal_account_process_tick(void)
 		unsigned long steal_jiffies;
 
 		steal = paravirt_steal_clock(smp_processor_id());
-		steal -= this_rq()->prev_steal_time;
+		if (likely(steal > this_rq()->prev_steal_time))
+			steal -= this_rq()->prev_steal_time;
+		else {
+			/* steal clock warp */
+			this_rq()->prev_steal_time = steal;
+			return false;
+		}
 
 		/*
 		 * steal is in nsecs but our caller is expecting steal
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
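
For readers following the numbers in the trace above, here is a minimal
sketch of the arithmetic, not the kernel code itself. It assumes HZ=1000
(one jiffy = 1,000,000 ns), which is what the trace implies, and it
assumes prev_steal_time advances by whole jiffies. The names rq_model,
steal_jiffies_unpatched and steal_jiffies_patched are hypothetical and
exist only for this illustration.

```c
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy is 1,000,000 ns. This matches the
 * trace, where 1291385514 ns is reported as 1291 jiffies. */
#define NSEC_PER_JIFFY 1000000ULL

/* Hypothetical stand-in for the per-runqueue bookkeeping. */
struct rq_model {
	uint64_t prev_steal_time;	/* ns already accounted */
};

/* Model of the logic before the patch: unconditional unsigned subtraction. */
static uint64_t steal_jiffies_unpatched(struct rq_model *rq, uint64_t steal_clock)
{
	/* Wraps around to a huge value if the clock went backwards. */
	uint64_t delta = steal_clock - rq->prev_steal_time;
	uint64_t jiffies = delta / NSEC_PER_JIFFY;

	rq->prev_steal_time += jiffies * NSEC_PER_JIFFY;
	return jiffies;
}

/* Model of the logic with the patch: on a warp, resync and account nothing. */
static uint64_t steal_jiffies_patched(struct rq_model *rq, uint64_t steal_clock)
{
	uint64_t delta, jiffies;

	if (steal_clock <= rq->prev_steal_time) {
		/* steal clock warp */
		rq->prev_steal_time = steal_clock;
		return 0;
	}

	delta = steal_clock - rq->prev_steal_time;
	jiffies = delta / NSEC_PER_JIFFY;
	rq->prev_steal_time += jiffies * NSEC_PER_JIFFY;
	return jiffies;
}
```

Feeding the two steal values from the trace (1291385514, then 18732255)
into the unpatched model yields 1291 and then 18446744072437, the same
values the trace reports; the patched model yields 1291 and then 0.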

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-02 11:57 [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug Wanpeng Li
@ 2016-06-02 12:00 ` Peter Zijlstra
  2016-06-02 13:59   ` Rik van Riel
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2016-06-02 12:00 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Rik van Riel,
	Thomas Gleixner, Frederic Weisbecker, Paolo Bonzini, Radim

On Thu, Jun 02, 2016 at 07:57:19PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> I observed that sometimes st is 100% instantaneous, then idle is 100% 
> even if there is a cpu hog on the guest cpu after the cpu hotplug comes 
> back(N.B. both guest and host are latest 4.7-rc1, this can not always 
> be readily reproduced). I add trace to capture it as below:
> 
> cpuhp/1-12    [001] d.h1   167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0         
> cpuhp/1-12    [001] d.h1   167.461659: account_process_tick: steal_jiffies = 1291          
> <idle>-0     [001] d.h1   167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000          
> <idle>-0     [001] d.h1   167.462664: account_process_tick: steal_jiffies = 18446744072437
> 
> The steal clock warps and then steal_jiffies overflow, this patch align 
> prev_steal_time to the new steal clock timestamp, in order to avoid 
> overflow and st stuff can continue to work.

I would rather suggest fixing the steal clock thing to not jump like
that; is that at all possible?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-02 12:00 ` Peter Zijlstra
@ 2016-06-02 13:59   ` Rik van Riel
  2016-06-03  5:34     ` Wanpeng Li
  2016-06-06 13:40     ` Paolo Bonzini
  0 siblings, 2 replies; 9+ messages in thread
From: Rik van Riel @ 2016-06-02 13:59 UTC (permalink / raw)
  To: Peter Zijlstra, Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Frederic Weisbecker, Paolo Bonzini, Radim

[-- Attachment #1: Type: text/plain, Size: 2075 bytes --]

On Thu, 2016-06-02 at 14:00 +0200, Peter Zijlstra wrote:
> On Thu, Jun 02, 2016 at 07:57:19PM +0800, Wanpeng Li wrote:
> > 
> > From: Wanpeng Li <wanpeng.li@hotmail.com>
> > 
> > I observed that sometimes st is 100% instantaneous, then idle is
> > 100% 
> > even if there is a cpu hog on the guest cpu after the cpu hotplug
> > comes 
> > back(N.B. both guest and host are latest 4.7-rc1, this can not
> > always 
> > be readily reproduced). I add trace to capture it as below:
> > 
> > cpuhp/1-12    [001] d.h1   167.461657: account_process_tick: steal
> > = 1291385514, prev_steal_time = 0         
> > cpuhp/1-12    [001] d.h1   167.461659: account_process_tick:
> > steal_jiffies = 1291          
> > <idle>-0     [001] d.h1   167.462663: account_process_tick: steal =
> > 18732255, prev_steal_time = 1291000000          
> > <idle>-0     [001] d.h1   167.462664: account_process_tick:
> > steal_jiffies = 18446744072437
> > 
> > The steal clock warps and then steal_jiffies overflow, this patch
> > align 
> > prev_steal_time to the new steal clock timestamp, in order to
> > avoid 
> > overflow and st stuff can continue to work.
> I would rather suggest fixing the steal clock thing to not jump like
> that; is that at all possible?

Not always possible, I suspect.

If a guest is saved to disk and later restored (eg. after
a host reboot), or live migrated to another host, I would
expect to get totally disjoint steal time statistics from
the "new run" of the guest (which is the same run of the
guest OS).

In fact, this code may also need to deal with the case
where steal time suddenly increases by a ludicrous amount,
and ignore those events, too.

A safe threshold might be to only apply steal times that
are positive and smaller than one second (as long as nohz_full
has the one second timer tick left), ignoring intervals that
are negative or longer than a second, and using those to sync
up the guest with the host.

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
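
The threshold Rik describes above could look roughly like the sketch
below. This is only an illustration under stated assumptions, not the
v2 patch: steal_state and steal_delta_checked are hypothetical names,
and for simplicity prev_steal_time is advanced to the full clock value
rather than in whole jiffies.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the scheduler-side bookkeeping. */
struct steal_state {
	uint64_t prev_steal_time;	/* last steal clock value seen, in ns */
};

/*
 * Apply a steal interval only if it is positive and shorter than one
 * second. Otherwise (backward jump, no progress, or a ludicrous forward
 * jump) account nothing and just resync to the new clock value.
 *
 * Returns true and stores the interval in *delta_ns when it is applied.
 */
static bool steal_delta_checked(struct steal_state *s, uint64_t steal_clock,
				uint64_t *delta_ns)
{
	*delta_ns = 0;

	if (steal_clock <= s->prev_steal_time) {
		/* Negative or zero interval: resync only. */
		s->prev_steal_time = steal_clock;
		return false;
	}

	if (steal_clock - s->prev_steal_time >= NSEC_PER_SEC) {
		/* Implausibly large interval: resync only. */
		s->prev_steal_time = steal_clock;
		return false;
	}

	*delta_ns = steal_clock - s->prev_steal_time;
	s->prev_steal_time = steal_clock;
	return true;
}
```

Comparing the two unsigned values before subtracting avoids relying on
a wrapped result to detect the backward jump.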

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-02 13:59   ` Rik van Riel
@ 2016-06-03  5:34     ` Wanpeng Li
  2016-06-06 13:40     ` Paolo Bonzini
  1 sibling, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-06-03  5:34 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, linux-kernel, kvm, Wanpeng Li, Ingo Molnar,
	Thomas Gleixner, Frederic Weisbecker, Paolo Bonzini, Radim

2016-06-02 21:59 GMT+08:00 Rik van Riel <riel@redhat.com>:
> On Thu, 2016-06-02 at 14:00 +0200, Peter Zijlstra wrote:
>> On Thu, Jun 02, 2016 at 07:57:19PM +0800, Wanpeng Li wrote:
>> >
>> > From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >
>> > I observed that sometimes st is 100% instantaneous, then idle is
>> > 100%
>> > even if there is a cpu hog on the guest cpu after the cpu hotplug
>> > comes
>> > back(N.B. both guest and host are latest 4.7-rc1, this can not
>> > always
>> > be readily reproduced). I add trace to capture it as below:
>> >
>> > cpuhp/1-12    [001] d.h1   167.461657: account_process_tick: steal
>> > = 1291385514, prev_steal_time = 0
>> > cpuhp/1-12    [001] d.h1   167.461659: account_process_tick:
>> > steal_jiffies = 1291
>> > <idle>-0     [001] d.h1   167.462663: account_process_tick: steal =
>> > 18732255, prev_steal_time = 1291000000
>> > <idle>-0     [001] d.h1   167.462664: account_process_tick:
>> > steal_jiffies = 18446744072437
>> >
>> > The steal clock warps and then steal_jiffies overflow, this patch
>> > align
>> > prev_steal_time to the new steal clock timestamp, in order to
>> > avoid
>> > overflow and st stuff can continue to work.
>> I would rather suggest fixing the steal clock thing to not jump like
>> that; is that at all possible?
>
> Not always possible, I suspect.
>
> If a guest is saved to disk and later restored (eg. after
> a host reboot), or live migrated to another host, I would
> expect to get totally disjoint steal time statistics from
> the "new run" of the guest (which is the same run of the
> guest OS).
>
> In fact, this code may also need to deal with the case
> where steal time suddenly increases by a ludicrous amount,
> and ignore those events, too.
>
> A safe threshold might be to only apply steal times that
> are positive and smaller than one second (as long as nohz_full
> has the one second timer tick left), ignoring intervals that
> are negative or longer than a second, and using those to sync
> up the guest with the host.

Good point, thanks for your review, Rik. :) Just send out v2 to do it.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-02 13:59   ` Rik van Riel
  2016-06-03  5:34     ` Wanpeng Li
@ 2016-06-06 13:40     ` Paolo Bonzini
  2016-06-06 22:42       ` Wanpeng Li
  2016-06-07  1:24       ` Rik van Riel
  1 sibling, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-06-06 13:40 UTC (permalink / raw)
  To: Rik van Riel, Peter Zijlstra, Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Frederic Weisbecker, Radim



On 02/06/2016 15:59, Rik van Riel wrote:
> If a guest is saved to disk and later restored (eg. after
> a host reboot), or live migrated to another host, I would
> expect to get totally disjoint steal time statistics from
> the "new run" of the guest (which is the same run of the
> guest OS).

Why?  The preexisting guest steal time is always added to by
KVM, so the time won't restart from zero.

Continuing the previous count on CPU hot-unplug followed by hot-plug
is less obvious, but I think it's overall the right thing to do.

In fact, I was going to test a patch this week as simple as this:

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eea2a6f72b31..1ef5e48b3a36 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
 	if (!has_steal_clock)
 		return;
 
-	memset(st, 0, sizeof(*st));
-
 	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
 	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
 		cpu, (unsigned long long) slow_virt_to_phys(st));


Thanks,

Paolo

> In fact, this code may also need to deal with the case
> where steal time suddenly increases by a ludicrous amount,
> and ignore those events, too.
> 
> A safe threshold might be to only apply steal times that
> are positive and smaller than one second (as long as nohz_full
> has the one second timer tick left), ignoring intervals that
> are negative or longer than a second, and using those to sync
> up the guest with the host.

^ permalink raw reply related	[flat|nested] 9+ messages in thread
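
A toy model may help show why the memset matters. It rests on Paolo's
statement above that KVM always adds to the steal time value already
present in the guest's record. Everything here is a simplification: the
struct steal_area type, host_add_steal and guest_register are
hypothetical names, and the real record has more fields than the single
counter modelled here.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the guest's steal-time record. */
struct steal_area {
	uint64_t steal;		/* accumulated steal time in ns */
};

/* Host side: adds to whatever value is already in the record. */
static void host_add_steal(struct steal_area *st, uint64_t ns)
{
	st->steal += ns;
}

/*
 * Guest side: (re)register the record when a cpu comes online.
 * zero_first != 0 models the code with the memset in place;
 * zero_first == 0 models the code with the memset removed.
 */
static void guest_register(struct steal_area *st, int zero_first)
{
	if (zero_first)
		memset(st, 0, sizeof(*st));
	/* The real code would now hand the record's address to the host. */
}
```

With zero_first set, a hot-unplug followed by hot-plug restarts the
counter from zero while the scheduler still remembers the old
prev_steal_time, so the next reading appears to go backwards. Without
the memset the counter simply keeps growing across the hotplug.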

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-06 13:40     ` Paolo Bonzini
@ 2016-06-06 22:42       ` Wanpeng Li
  2016-06-07  1:24       ` Rik van Riel
  1 sibling, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-06-06 22:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rik van Riel, Peter Zijlstra, linux-kernel, kvm, Wanpeng Li,
	Ingo Molnar, Thomas Gleixner, Frederic Weisbecker, Radim

2016-06-06 21:40 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 02/06/2016 15:59, Rik van Riel wrote:
>> If a guest is saved to disk and later restored (eg. after
>> a host reboot), or live migrated to another host, I would
>> expect to get totally disjoint steal time statistics from
>> the "new run" of the guest (which is the same run of the
>> guest OS).
>
> Why?  The preexisting guest steal time is always added to by
> KVM, so the time won't restart from zero.
>
> Continuing the previous count on CPU hot-unplug followed by hot-plug
> is less obvious, but I think it's overall the right thing to do.
>
> In fact, I was going to test a patch this week as simple as this:
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index eea2a6f72b31..1ef5e48b3a36 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>         if (!has_steal_clock)
>                 return;
>
> -       memset(st, 0, sizeof(*st));
> -
>         wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
>         pr_info("kvm-stealtime: cpu %d, msr %llx\n",
>                 cpu, (unsigned long long) slow_virt_to_phys(st));

Thanks for the suggestion, I will try it today.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-06 13:40     ` Paolo Bonzini
  2016-06-06 22:42       ` Wanpeng Li
@ 2016-06-07  1:24       ` Rik van Riel
  2016-06-07  7:31         ` Paolo Bonzini
  1 sibling, 1 reply; 9+ messages in thread
From: Rik van Riel @ 2016-06-07  1:24 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Zijlstra, Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Frederic Weisbecker, Radim

[-- Attachment #1: Type: text/plain, Size: 1654 bytes --]

On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote:
> 
> On 02/06/2016 15:59, Rik van Riel wrote:
> > 
> > If a guest is saved to disk and later restored (eg. after
> > a host reboot), or live migrated to another host, I would
> > expect to get totally disjoint steal time statistics from
> > the "new run" of the guest (which is the same run of the
> > guest OS).
> Why?  The preexisting guest steal time is always added to by
> KVM, so the time won't restart from zero.
> 
> Continuing the previous count on CPU hot-unplug followed by hot-plug
> is less obvious, but I think it's overall the right thing to do.
> 
> In fact, I was going to test a patch this week as simple as this:
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index eea2a6f72b31..1ef5e48b3a36 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>  	if (!has_steal_clock)
>  		return;
>  
> -	memset(st, 0, sizeof(*st));
> -
>  	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) |
> KVM_MSR_ENABLED));

By removing the memset from initial bootup allocation,
can't that cause the steal time to "increase by a ludicrous
amount" the very first time it is compared with the arch
independent value in the scheduler code?

In other words, would removal of the memset result in still
requiring Wanpeng's patch?

What am I overlooking?

Is there something preventing a non-zero value right at
the beginning?

Also, is there a chance of ending up with a non-zero bit
in the seqcount if the memset is removed?

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-07  1:24       ` Rik van Riel
@ 2016-06-07  7:31         ` Paolo Bonzini
  2016-06-07  7:35           ` Wanpeng Li
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2016-06-07  7:31 UTC (permalink / raw)
  To: Rik van Riel, Peter Zijlstra, Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Ingo Molnar, Thomas Gleixner,
	Frederic Weisbecker, Radim



On 07/06/2016 03:24, Rik van Riel wrote:
> On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote:
>>
>> On 02/06/2016 15:59, Rik van Riel wrote:
>>>
>>> If a guest is saved to disk and later restored (eg. after
>>> a host reboot), or live migrated to another host, I would
>>> expect to get totally disjoint steal time statistics from
>>> the "new run" of the guest (which is the same run of the
>>> guest OS).
>> Why?  The preexisting guest steal time is always added to by
>> KVM, so the time won't restart from zero.
>>
>> Continuing the previous count on CPU hot-unplug followed by hot-plug
>> is less obvious, but I think it's overall the right thing to do.
>>
>> In fact, I was going to test a patch this week as simple as this:
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index eea2a6f72b31..1ef5e48b3a36 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>>  	if (!has_steal_clock)
>>  		return;
>>  
>> -	memset(st, 0, sizeof(*st));
>> -
>>  	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) |
>> KVM_MSR_ENABLED));
> 
> By removing the memset from initial bootup allocation,
> can't that cause the steal time to "increase by a ludicrous
> amount" the very first time it is compared with the arch
> independent value in the scheduler code?
> 
> In other words, would removal of the memset result in still
> requiring Wanpeng's patch?

The percpu area is initialized to zero, isn't it?

Paolo

> What am I overlooking?
> 
> Is there something preventing a non-zero value right at
> the beginning?
> 
> Also, is there a chance of ending up with a non-zero bit
> in the seqcount if the memset is removed?
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug
  2016-06-07  7:31         ` Paolo Bonzini
@ 2016-06-07  7:35           ` Wanpeng Li
  0 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2016-06-07  7:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rik van Riel, Peter Zijlstra, linux-kernel, kvm, Wanpeng Li,
	Ingo Molnar, Thomas Gleixner, Frederic Weisbecker, Radim

2016-06-07 15:31 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/06/2016 03:24, Rik van Riel wrote:
>> On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote:
>>>
>>> On 02/06/2016 15:59, Rik van Riel wrote:
>>>>
>>>> If a guest is saved to disk and later restored (eg. after
>>>> a host reboot), or live migrated to another host, I would
>>>> expect to get totally disjoint steal time statistics from
>>>> the "new run" of the guest (which is the same run of the
>>>> guest OS).
>>> Why?  The preexisting guest steal time is always added to by
>>> KVM, so the time won't restart from zero.
>>>
>>> Continuing the previous count on CPU hot-unplug followed by hot-plug
>>> is less obvious, but I think it's overall the right thing to do.
>>>
>>> In fact, I was going to test a patch this week as simple as this:
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index eea2a6f72b31..1ef5e48b3a36 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -301,8 +301,6 @@ static void kvm_register_steal_time(void)
>>>      if (!has_steal_clock)
>>>              return;
>>>
>>> -    memset(st, 0, sizeof(*st));
>>> -
>>>      wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) |
>>> KVM_MSR_ENABLED));
>>
>> By removing the memset from initial bootup allocation,
>> can't that cause the steal time to "increase by a ludicrous
>> amount" the very first time it is compared with the arch
>> independent value in the scheduler code?
>>
>> In other words, would removal of the memset result in still
>> requiring Wanpeng's patch?
>
> The percpu area is initialized to zero, isn't it?

Your proposal can fix the steal clock warp during guest cpu hotplug.
I will send out a new version later.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-07  7:35 UTC | newest]

Thread overview: 9+ messages
2016-06-02 11:57 [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug Wanpeng Li
2016-06-02 12:00 ` Peter Zijlstra
2016-06-02 13:59   ` Rik van Riel
2016-06-03  5:34     ` Wanpeng Li
2016-06-06 13:40     ` Paolo Bonzini
2016-06-06 22:42       ` Wanpeng Li
2016-06-07  1:24       ` Rik van Riel
2016-06-07  7:31         ` Paolo Bonzini
2016-06-07  7:35           ` Wanpeng Li
