linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
@ 2016-06-27 13:11 Paolo Bonzini
  2016-06-28  6:15 ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-06-27 13:11 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: wanpeng.li, Yunhong Jiang

If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer can underflow and
cause delta_tsc to be set to a huge value.  This generally results
in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
delta_tsc to be positive or zero.

Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c1d655c10fd2..85e2f0a882ca 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10829,9 +10829,9 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
 static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u64 tscl = rdtsc(), delta_tsc;
-
-	delta_tsc = guest_deadline_tsc - kvm_read_l1_tsc(vcpu, tscl);
+	u64 tscl = rdtsc();
+	u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
+	u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
 
 	/* Convert to host delta tsc if tsc scaling is enabled */
 	if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-27 13:11 [PATCH] KVM: vmx: fix underflow in TSC deadline calculation Paolo Bonzini
@ 2016-06-28  6:15 ` Wanpeng Li
  2016-06-28  8:43   ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-06-28  6:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Wanpeng Li, Yunhong Jiang

2016-06-27 21:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> If the TSC deadline timer is programmed really close to the deadline or
> even in the past, the computation in vmx_set_hv_timer can underflow and
> cause delta_tsc to be set to a huge value.  This generally results
> in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
> delta_tsc to be positive or zero.
>
> Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c1d655c10fd2..85e2f0a882ca 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10829,9 +10829,9 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
>  static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
>  {
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
> -       u64 tscl = rdtsc(), delta_tsc;
> -
> -       delta_tsc = guest_deadline_tsc - kvm_read_l1_tsc(vcpu, tscl);
> +       u64 tscl = rdtsc();
> +       u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
> +       u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
>
>         /* Convert to host delta tsc if tsc scaling is enabled */
>         if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&

This patch still can't fix the bug after my testing. I have a patch on
hand and will send out soon.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28  6:15 ` Wanpeng Li
@ 2016-06-28  8:43   ` Paolo Bonzini
  2016-06-28  8:46     ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-06-28  8:43 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Wanpeng Li, Yunhong Jiang



On 28/06/2016 08:15, Wanpeng Li wrote:
> 2016-06-27 21:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> If the TSC deadline timer is programmed really close to the deadline or
>> even in the past, the computation in vmx_set_hv_timer can underflow and
>> cause delta_tsc to be set to a huge value.  This generally results
>> in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
>> delta_tsc to be positive or zero.
>>
>> Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c1d655c10fd2..85e2f0a882ca 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10829,9 +10829,9 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
>>  static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
>>  {
>>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>> -       u64 tscl = rdtsc(), delta_tsc;
>> -
>> -       delta_tsc = guest_deadline_tsc - kvm_read_l1_tsc(vcpu, tscl);
>> +       u64 tscl = rdtsc();
>> +       u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
>> +       u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
>>
>>         /* Convert to host delta tsc if tsc scaling is enabled */
>>         if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
> 
> This patch still can't fix the bug after my testing. I have a patch on
> hand and will send out soon.

Nice!  Do you think we need both patches?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28  8:43   ` Paolo Bonzini
@ 2016-06-28  8:46     ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-06-28  8:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Wanpeng Li, Yunhong Jiang

2016-06-28 16:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 28/06/2016 08:15, Wanpeng Li wrote:
>> 2016-06-27 21:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> If the TSC deadline timer is programmed really close to the deadline or
>>> even in the past, the computation in vmx_set_hv_timer can underflow and
>>> cause delta_tsc to be set to a huge value.  This generally results
>>> in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
>>> delta_tsc to be positive or zero.
>>>
>>> Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  arch/x86/kvm/vmx.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c1d655c10fd2..85e2f0a882ca 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -10829,9 +10829,9 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
>>>  static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
>>>  {
>>>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> -       u64 tscl = rdtsc(), delta_tsc;
>>> -
>>> -       delta_tsc = guest_deadline_tsc - kvm_read_l1_tsc(vcpu, tscl);
>>> +       u64 tscl = rdtsc();
>>> +       u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
>>> +       u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;
>>>
>>>         /* Convert to host delta tsc if tsc scaling is enabled */
>>>         if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
>>
>> This patch still can't fix the bug after my testing. I have a patch on
>> hand and will send out soon.
>
> Nice!  Do you think we need both patches?

Yeah, we can keep them separately. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28 22:55         ` Wanpeng Li
@ 2016-06-29  0:39           ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-06-29  0:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: yunhong jiang, linux-kernel, kvm, Wanpeng Li,
	Radim Krčmář,
	Yunhong Jiang

2016-06-29 6:55 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2016-06-29 4:07 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>
>>
>> On 28/06/2016 20:34, yunhong jiang wrote:
>>> Paolo, thanks for reply.
>>>
>>> Which race window you are talking about? Is it the
>>> kvm_lapic_switch_to_hv_timer()? If yes, hope we will not forgot it once the
>>> lapic timer is not pinned anymore in future.
>>
>> Yes, it's that one.  This is a good point against using
>> local_irq_save/restore.
>>
>> Paolo
>
> How about something like below(untested, just want to know if this is
> what you mean):

It can also fix the bug after my testing, I can send out a formal one
if it looks good to you.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28 20:07       ` Paolo Bonzini
@ 2016-06-28 22:55         ` Wanpeng Li
  2016-06-29  0:39           ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-06-28 22:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: yunhong jiang, linux-kernel, kvm, Wanpeng Li,
	Radim Krčmář,
	Yunhong Jiang

2016-06-29 4:07 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 28/06/2016 20:34, yunhong jiang wrote:
>> Paolo, thanks for reply.
>>
>> Which race window you are talking about? Is it the
>> kvm_lapic_switch_to_hv_timer()? If yes, hope we will not forgot it once the
>> lapic timer is not pinned anymore in future.
>
> Yes, it's that one.  This is a good point against using
> local_irq_save/restore.
>
> Paolo

How about something like below(untested, just want to know if this is
what you mean):

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fdc05ae..adf8e32 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1361,6 +1361,33 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);

+static void start_hv_tscdeadline(struct kvm_lapic *apic)
+{
+ unsigned long flags;
+ u64 tscdeadline = apic->lapic_timer.tscdeadline;
+
+ local_irq_save(flags);
+ if (kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
+ if (apic->lapic_timer.hv_timer_in_use) {
+ apic->lapic_timer.hv_timer_in_use = false;
+ kvm_x86_ops->cancel_hv_timer(apic->vcpu);
+ }
+ start_sw_tscdeadline(apic);
+ } else {
+ apic->lapic_timer.hv_timer_in_use = true;
+ hrtimer_cancel(&apic->lapic_timer.timer);
+
+ /* In case the sw timer triggered in the window */
+ if (atomic_read(&apic->lapic_timer.pending)) {
+ apic->lapic_timer.hv_timer_in_use = false;
+ kvm_x86_ops->cancel_hv_timer(apic->vcpu);
+ }
+ }
+ local_irq_restore(flags);
+ trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
+ apic->lapic_timer.hv_timer_in_use);
+}
+
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
 {
  struct kvm_lapic *apic = vcpu->arch.apic;
@@ -1368,22 +1395,8 @@ void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
  WARN_ON(apic->lapic_timer.hv_timer_in_use);

  if (apic_lvtt_tscdeadline(apic) &&
-    !atomic_read(&apic->lapic_timer.pending)) {
- u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
- if (!kvm_x86_ops->set_hv_timer(vcpu, tscdeadline)) {
- apic->lapic_timer.hv_timer_in_use = true;
- hrtimer_cancel(&apic->lapic_timer.timer);
-
- /* In case the sw timer triggered in the window */
- if (atomic_read(&apic->lapic_timer.pending)) {
- apic->lapic_timer.hv_timer_in_use = false;
- kvm_x86_ops->cancel_hv_timer(apic->vcpu);
- }
- }
- trace_kvm_hv_timer_state(vcpu->vcpu_id,
- apic->lapic_timer.hv_timer_in_use);
- }
+    !atomic_read(&apic->lapic_timer.pending))
+ start_hv_tscdeadline(apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);

@@ -1451,15 +1464,9 @@ static void start_apic_timer(struct kvm_lapic *apic)
    ktime_to_ns(ktime_add_ns(now,
  apic->lapic_timer.period)));
  } else if (apic_lvtt_tscdeadline(apic)) {
- /* lapic timer in tsc deadline mode */
- u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
- if (kvm_x86_ops->set_hv_timer &&
-    !kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
- apic->lapic_timer.hv_timer_in_use = true;
- trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
- apic->lapic_timer.hv_timer_in_use);
- } else
+ if (kvm_x86_ops->set_hv_timer)
+ start_hv_tscdeadline(apic);
+ else
  start_sw_tscdeadline(apic);
  }
 }

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28 18:34     ` yunhong jiang
@ 2016-06-28 20:07       ` Paolo Bonzini
  2016-06-28 22:55         ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-06-28 20:07 UTC (permalink / raw)
  To: yunhong jiang
  Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li,
	Radim Krčmář,
	Yunhong Jiang



On 28/06/2016 20:34, yunhong jiang wrote:
> Paolo, thanks for reply.
> 
> Which race window you are talking about? Is it the
> kvm_lapic_switch_to_hv_timer()? If yes, hope we will not forgot it once the
> lapic timer is not pinned anymore in future.

Yes, it's that one.  This is a good point against using
local_irq_save/restore.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28 17:56   ` Paolo Bonzini
@ 2016-06-28 18:34     ` yunhong jiang
  2016-06-28 20:07       ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: yunhong jiang @ 2016-06-28 18:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li,
	Radim Krčmář,
	Yunhong Jiang

On Tue, 28 Jun 2016 13:56:38 -0400 (EDT)
Paolo Bonzini <pbonzini@redhat.com> wrote:

> 
> 
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index fdc05ae..b15e32a 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1454,11 +1454,18 @@ static void start_apic_timer(struct
> > > kvm_lapic *apic) /* lapic timer in tsc deadline mode */
> > >  		u64 tscdeadline = apic->lapic_timer.tscdeadline;
> > >  
> > > -		if (kvm_x86_ops->set_hv_timer &&
> > > -		    !kvm_x86_ops->set_hv_timer(apic->vcpu,
> > > tscdeadline)) {
> > > -			apic->lapic_timer.hv_timer_in_use = true;
> > > -
> > > trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> > > +		if (kvm_x86_ops->set_hv_timer) {
> > > +			if (kvm_x86_ops->set_hv_timer(apic->vcpu,
> > 
> > Would it be better that if set_hv_timer fails, we clear the vmx
> > timer (i.e. the VMCS field) before return the failure? I'm not sure
> > if it make sense to clear the previous setup if a new setup fails,
> > although it seems OK for me, since we have to cancel the hv_timer
> > anyway.
> 
> Good question.  I think we should abstract a little the set_hv_timer
> and cancel_hv_timer calls, for example with a new function
> start_hv_tscdeadline. It's also simpler to avoid the race window by
> disabling interrupts around the calls to set_hv_timer and
> hrtimer_cancel.  I'll see what I can come up with.

Paolo, thanks for reply.

Which race window you are talking about? Is it the
kvm_lapic_switch_to_hv_timer()? If yes, hope we will not forgot it once the
lapic timer is not pinned anymore in future.

Thanks
--jyh

> 
> Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28 17:45 ` yunhong jiang
@ 2016-06-28 17:56   ` Paolo Bonzini
  2016-06-28 18:34     ` yunhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-06-28 17:56 UTC (permalink / raw)
  To: yunhong jiang
  Cc: Wanpeng Li, linux-kernel, kvm, Wanpeng Li,
	Radim Krčmář,
	Yunhong Jiang



> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index fdc05ae..b15e32a 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1454,11 +1454,18 @@ static void start_apic_timer(struct kvm_lapic
> > *apic) /* lapic timer in tsc deadline mode */
> >  		u64 tscdeadline = apic->lapic_timer.tscdeadline;
> >  
> > -		if (kvm_x86_ops->set_hv_timer &&
> > -		    !kvm_x86_ops->set_hv_timer(apic->vcpu,
> > tscdeadline)) {
> > -			apic->lapic_timer.hv_timer_in_use = true;
> > -			trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> > +		if (kvm_x86_ops->set_hv_timer) {
> > +			if (kvm_x86_ops->set_hv_timer(apic->vcpu,
> 
> Would it be better that if set_hv_timer fails, we clear the vmx timer (i.e. the
> VMCS field) before return the failure? I'm not sure if it make sense to clear
> the previous setup if a new setup fails, although it seems OK for me, since
> we have to cancel the hv_timer anyway.

Good question.  I think we should abstract a little the set_hv_timer and
cancel_hv_timer calls, for example with a new function start_hv_tscdeadline.
It's also simpler to avoid the race window by disabling interrupts around the
calls to set_hv_timer and hrtimer_cancel.  I'll see what I can come up with.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
  2016-06-28  6:54 Wanpeng Li
@ 2016-06-28 17:45 ` yunhong jiang
  2016-06-28 17:56   ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: yunhong jiang @ 2016-06-28 17:45 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
	Radim Krčmář,
	Yunhong Jiang

On Tue, 28 Jun 2016 14:54:19 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:

> INFO: rcu_sched detected stalls on CPUs/tasks:
>  1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0
> fqs=21663 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
> Task dump for CPU 1:
> qemu-system-x86 R  running task        0  3529   3525 0x00080808
>  ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
>  ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
>  0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
> Call Trace:
> ? kvm_write_guest_cached+0xb9/0x160 [kvm]
> ? __delay+0xf/0x20
> ? wait_lapic_expire+0x14a/0x200 [kvm]
> ? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
> ? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
> ? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> ? __fget+0x5/0x210
> ? do_vfs_ioctl+0x96/0x6a0
> ? __fget_light+0x2a/0x90
> ? SyS_ioctl+0x79/0x90
> ? do_syscall_64+0x7c/0x1e0
> ? entry_SYSCALL64_slow_path+0x25/0x25
> 
> This can be reproduced readily by running a full dynticks guest(since
> hrtimer in guest is heavily used) w/ lapic_timer_advance disabled.
> 
> If fail to program hardware preemption timer, we will fallback to
> hrtimer based method, however, a previous programmed preemption timer
> miss to cancel in this scenario which results in one hardware
> preemption timer and one hrtimer emulated tsc deadline timer run
> simultaneously. So sometimes the target guest deadline tsc is earlier
> than guest tsc, which leads to the computation in vmx_set_hv_timer
> can underflow and cause delta_tsc to be set a huge value, then host
> soft lockup as above. 
> 
> This patch fix it by cancelling the previous programmed preemption
> timer if there is once we failed to program the new preemption timer
> and fallback to hrtimer based method.

Hi, WanPeng, thanks for the patch.

> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/kvm/lapic.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index fdc05ae..b15e32a 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1454,11 +1454,18 @@ static void start_apic_timer(struct kvm_lapic
> *apic) /* lapic timer in tsc deadline mode */
>  		u64 tscdeadline = apic->lapic_timer.tscdeadline;
>  
> -		if (kvm_x86_ops->set_hv_timer &&
> -		    !kvm_x86_ops->set_hv_timer(apic->vcpu,
> tscdeadline)) {
> -			apic->lapic_timer.hv_timer_in_use = true;
> -			trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> +		if (kvm_x86_ops->set_hv_timer) {
> +			if (kvm_x86_ops->set_hv_timer(apic->vcpu,

Would it be better that if set_hv_timer fails, we clear the vmx timer (i.e. the
VMCS field) before return the failure? I'm not sure if it make sense to clear
the previous setup if a new setup fails, although it seems OK for me, since we
have to cancel the hv_timer anyway.

Your idea?

Thanks
--jyh

> tscdeadline)) {
> +				if
> (apic->lapic_timer.hv_timer_in_use) {
> +
> kvm_x86_ops->cancel_hv_timer(apic->vcpu);
> +
> apic->lapic_timer.hv_timer_in_use = false;
> +				}
> +				start_sw_tscdeadline(apic);
> +			} else {
> +				apic->lapic_timer.hv_timer_in_use =
> true;
> +
> trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> apic->lapic_timer.hv_timer_in_use);
> +			}
>  		} else
>  			start_sw_tscdeadline(apic);
>  	}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] KVM: vmx: fix underflow in TSC deadline calculation
@ 2016-06-28  6:54 Wanpeng Li
  2016-06-28 17:45 ` yunhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-06-28  6:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, Yunhong Jiang

INFO: rcu_sched detected stalls on CPUs/tasks:
 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663 
 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R  running task        0  3529   3525 0x00080808
 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced readily by running a full dynticks guest(since hrtimer 
in guest is heavily used) w/ lapic_timer_advance disabled.

If fail to program hardware preemption timer, we will fallback to hrtimer based 
method, however, a previous programmed preemption timer miss to cancel in this 
scenario which results in one hardware preemption timer and one hrtimer emulated 
tsc deadline timer run simultaneously. So sometimes the target guest deadline 
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer 
can underflow and cause delta_tsc to be set a huge value, then host soft lockup 
as above. 

This patch fix it by cancelling the previous programmed preemption timer if there 
is once we failed to program the new preemption timer and fallback to hrtimer 
based method.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/lapic.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fdc05ae..b15e32a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1454,11 +1454,18 @@ static void start_apic_timer(struct kvm_lapic *apic)
 		/* lapic timer in tsc deadline mode */
 		u64 tscdeadline = apic->lapic_timer.tscdeadline;
 
-		if (kvm_x86_ops->set_hv_timer &&
-		    !kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
-			apic->lapic_timer.hv_timer_in_use = true;
-			trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
+		if (kvm_x86_ops->set_hv_timer) {
+			if (kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
+				if (apic->lapic_timer.hv_timer_in_use) {
+					kvm_x86_ops->cancel_hv_timer(apic->vcpu);
+					apic->lapic_timer.hv_timer_in_use = false;
+				}
+				start_sw_tscdeadline(apic);
+			} else {
+				apic->lapic_timer.hv_timer_in_use = true;
+				trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
 					apic->lapic_timer.hv_timer_in_use);
+			}
 		} else
 			start_sw_tscdeadline(apic);
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-06-29  0:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-27 13:11 [PATCH] KVM: vmx: fix underflow in TSC deadline calculation Paolo Bonzini
2016-06-28  6:15 ` Wanpeng Li
2016-06-28  8:43   ` Paolo Bonzini
2016-06-28  8:46     ` Wanpeng Li
2016-06-28  6:54 Wanpeng Li
2016-06-28 17:45 ` yunhong jiang
2016-06-28 17:56   ` Paolo Bonzini
2016-06-28 18:34     ` yunhong jiang
2016-06-28 20:07       ` Paolo Bonzini
2016-06-28 22:55         ` Wanpeng Li
2016-06-29  0:39           ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).