* [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline
@ 2016-06-29 11:23 Wanpeng Li
2016-06-29 11:23 ` [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation Wanpeng Li
2016-06-29 16:38 ` [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline yunhong jiang
0 siblings, 2 replies; 6+ messages in thread
From: Wanpeng Li @ 2016-06-29 11:23 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, Yunhong Jiang
From: Wanpeng Li <wanpeng.li@hotmail.com>
Introduce cancel_hv_tscdeadline() to encapsulate preemption
timer cancel stuff.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
arch/x86/kvm/lapic.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fdc05ae..9c20ac1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1349,14 +1349,19 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
+static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
+{
+ kvm_x86_ops->cancel_hv_timer(apic->vcpu);
+ apic->lapic_timer.hv_timer_in_use = false;
+}
+
void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
WARN_ON(!apic->lapic_timer.hv_timer_in_use);
WARN_ON(swait_active(&vcpu->wq));
- kvm_x86_ops->cancel_hv_timer(vcpu);
- apic->lapic_timer.hv_timer_in_use = false;
+ cancel_hv_tscdeadline(apic);
apic_timer_expired(apic);
}
EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
@@ -1376,10 +1381,8 @@ void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
hrtimer_cancel(&apic->lapic_timer.timer);
/* In case the sw timer triggered in the window */
- if (atomic_read(&apic->lapic_timer.pending)) {
- apic->lapic_timer.hv_timer_in_use = false;
- kvm_x86_ops->cancel_hv_timer(apic->vcpu);
- }
+ if (atomic_read(&apic->lapic_timer.pending))
+ cancel_hv_tscdeadline(apic);
}
trace_kvm_hv_timer_state(vcpu->vcpu_id,
apic->lapic_timer.hv_timer_in_use);
@@ -1395,8 +1398,7 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
if (!apic->lapic_timer.hv_timer_in_use)
return;
- kvm_x86_ops->cancel_hv_timer(vcpu);
- apic->lapic_timer.hv_timer_in_use = false;
+ cancel_hv_tscdeadline(apic);
if (atomic_read(&apic->lapic_timer.pending))
return;
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation
2016-06-29 11:23 [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline Wanpeng Li
@ 2016-06-29 11:23 ` Wanpeng Li
2016-06-29 17:16 ` yunhong jiang
2016-06-29 16:38 ` [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline yunhong jiang
1 sibling, 1 reply; 6+ messages in thread
From: Wanpeng Li @ 2016-06-29 11:23 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, Yunhong Jiang
From: Wanpeng Li <wanpeng.li@hotmail.com>
INFO: rcu_sched detected stalls on CPUs/tasks:
1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
(detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R running task 0 3529 3525 0x00080808
ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25
This can be reproduced readily by running a full dynticks guest(since hrtimer
in guest is heavily used) w/ lapic_timer_advance disabled.
If fail to program hardware preemption timer, we will fallback to hrtimer based
method, however, a previous programmed preemption timer miss to cancel in this
scenario which results in one hardware preemption timer and one hrtimer emulated
tsc deadline timer run simultaneously. So sometimes the target guest deadline
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer
can underflow and cause delta_tsc to be set a huge value, then host soft lockup
as above.
This patch fix it by cancelling the previous programmed preemption timer if there
is once we failed to program the new preemption timer and fallback to hrtimer
based method.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v1 -> v2:
* abstract the set_hv_timer and cancel_hv_tscdeadline
arch/x86/kvm/lapic.c | 48 +++++++++++++++++++++++++-----------------------
1 file changed, 25 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c20ac1..47ce77c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1366,6 +1366,26 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
+static void start_hv_tscdeadline(struct kvm_lapic *apic)
+{
+ u64 tscdeadline = apic->lapic_timer.tscdeadline;
+
+ if (kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
+ if (apic->lapic_timer.hv_timer_in_use)
+ cancel_hv_tscdeadline(apic);
+ start_sw_tscdeadline(apic);
+ } else {
+ apic->lapic_timer.hv_timer_in_use = true;
+ hrtimer_cancel(&apic->lapic_timer.timer);
+
+ /* In case the sw timer triggered in the window */
+ if (atomic_read(&apic->lapic_timer.pending))
+ cancel_hv_tscdeadline(apic);
+ }
+ trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
+ apic->lapic_timer.hv_timer_in_use);
+}
+
void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
@@ -1373,20 +1393,8 @@ void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
WARN_ON(apic->lapic_timer.hv_timer_in_use);
if (apic_lvtt_tscdeadline(apic) &&
- !atomic_read(&apic->lapic_timer.pending)) {
- u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
- if (!kvm_x86_ops->set_hv_timer(vcpu, tscdeadline)) {
- apic->lapic_timer.hv_timer_in_use = true;
- hrtimer_cancel(&apic->lapic_timer.timer);
-
- /* In case the sw timer triggered in the window */
- if (atomic_read(&apic->lapic_timer.pending))
- cancel_hv_tscdeadline(apic);
- }
- trace_kvm_hv_timer_state(vcpu->vcpu_id,
- apic->lapic_timer.hv_timer_in_use);
- }
+ !atomic_read(&apic->lapic_timer.pending))
+ start_hv_tscdeadline(apic);
}
EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
@@ -1453,15 +1461,9 @@ static void start_apic_timer(struct kvm_lapic *apic)
ktime_to_ns(ktime_add_ns(now,
apic->lapic_timer.period)));
} else if (apic_lvtt_tscdeadline(apic)) {
- /* lapic timer in tsc deadline mode */
- u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
- if (kvm_x86_ops->set_hv_timer &&
- !kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
- apic->lapic_timer.hv_timer_in_use = true;
- trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
- apic->lapic_timer.hv_timer_in_use);
- } else
+ if (kvm_x86_ops->set_hv_timer)
+ start_hv_tscdeadline(apic);
+ else
start_sw_tscdeadline(apic);
}
}
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline
2016-06-29 11:23 [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline Wanpeng Li
2016-06-29 11:23 ` [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation Wanpeng Li
@ 2016-06-29 16:38 ` yunhong jiang
1 sibling, 0 replies; 6+ messages in thread
From: yunhong jiang @ 2016-06-29 16:38 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
Radim Krčmář,
Yunhong Jiang
On Wed, 29 Jun 2016 19:23:56 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> Introduce cancel_hv_tscdeadline() to encapsulate preemption
> timer cancel stuff.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
It's ok for me, thanks for the patch, Wanpeng.
--jyh
> ---
> arch/x86/kvm/lapic.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index fdc05ae..9c20ac1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1349,14 +1349,19 @@ bool kvm_lapic_hv_timer_in_use(struct
> kvm_vcpu *vcpu) }
> EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
>
> +static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
> +{
> + kvm_x86_ops->cancel_hv_timer(apic->vcpu);
> + apic->lapic_timer.hv_timer_in_use = false;
> +}
> +
> void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
> {
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> WARN_ON(!apic->lapic_timer.hv_timer_in_use);
> WARN_ON(swait_active(&vcpu->wq));
> - kvm_x86_ops->cancel_hv_timer(vcpu);
> - apic->lapic_timer.hv_timer_in_use = false;
> + cancel_hv_tscdeadline(apic);
> apic_timer_expired(apic);
> }
> EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
> @@ -1376,10 +1381,8 @@ void kvm_lapic_switch_to_hv_timer(struct
> kvm_vcpu *vcpu) hrtimer_cancel(&apic->lapic_timer.timer);
>
> /* In case the sw timer triggered in the
> window */
> - if (atomic_read(&apic->lapic_timer.pending))
> {
> - apic->lapic_timer.hv_timer_in_use =
> false;
> -
> kvm_x86_ops->cancel_hv_timer(apic->vcpu);
> - }
> + if (atomic_read(&apic->lapic_timer.pending))
> + cancel_hv_tscdeadline(apic);
> }
> trace_kvm_hv_timer_state(vcpu->vcpu_id,
> apic->lapic_timer.hv_timer_in_use);
> @@ -1395,8 +1398,7 @@ void kvm_lapic_switch_to_sw_timer(struct
> kvm_vcpu *vcpu) if (!apic->lapic_timer.hv_timer_in_use)
> return;
>
> - kvm_x86_ops->cancel_hv_timer(vcpu);
> - apic->lapic_timer.hv_timer_in_use = false;
> + cancel_hv_tscdeadline(apic);
>
> if (atomic_read(&apic->lapic_timer.pending))
> return;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation
2016-06-29 11:23 ` [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation Wanpeng Li
@ 2016-06-29 17:16 ` yunhong jiang
2016-06-29 20:49 ` Paolo Bonzini
2016-06-29 22:26 ` Wanpeng Li
0 siblings, 2 replies; 6+ messages in thread
From: yunhong jiang @ 2016-06-29 17:16 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
Radim Krčmář,
Yunhong Jiang
On Wed, 29 Jun 2016 19:23:57 +0800
Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> INFO: rcu_sched detected stalls on CPUs/tasks:
> 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0
> fqs=21663 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
> Task dump for CPU 1:
> qemu-system-x86 R running task 0 3529 3525 0x00080808
> ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
> ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
> 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
> Call Trace:
> ? kvm_write_guest_cached+0xb9/0x160 [kvm]
> ? __delay+0xf/0x20
> ? wait_lapic_expire+0x14a/0x200 [kvm]
> ? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
> ? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
> ? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> ? __fget+0x5/0x210
> ? do_vfs_ioctl+0x96/0x6a0
> ? __fget_light+0x2a/0x90
> ? SyS_ioctl+0x79/0x90
> ? do_syscall_64+0x7c/0x1e0
> ? entry_SYSCALL64_slow_path+0x25/0x25
>
> This can be reproduced readily by running a full dynticks guest(since
> hrtimer in guest is heavily used) w/ lapic_timer_advance disabled.
>
> If fail to program hardware preemption timer, we will fallback to
> hrtimer based method, however, a previous programmed preemption timer
> miss to cancel in this scenario which results in one hardware
> preemption timer and one hrtimer emulated tsc deadline timer run
> simultaneously. So sometimes the target guest deadline tsc is earlier
> than guest tsc, which leads to the computation in vmx_set_hv_timer
> can underflow and cause delta_tsc to be set a huge value, then host
> soft lockup as above.
>
> This patch fix it by cancelling the previous programmed preemption
> timer if there is once we failed to program the new preemption timer
> and fallback to hrtimer based method.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v1 -> v2:
> * abstract the set_hv_timer and cancel_hv_tscdeadline
>
> arch/x86/kvm/lapic.c | 48
> +++++++++++++++++++++++++----------------------- 1 file changed, 25
> insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9c20ac1..47ce77c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1366,6 +1366,26 @@ void kvm_lapic_expired_hv_timer(struct
> kvm_vcpu *vcpu) }
> EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
>
> +static void start_hv_tscdeadline(struct kvm_lapic *apic)
> +{
> + u64 tscdeadline = apic->lapic_timer.tscdeadline;
> +
> + if (kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
> + if (apic->lapic_timer.hv_timer_in_use)
> + cancel_hv_tscdeadline(apic);
Wanpeng, thanks for the patch.
> + start_sw_tscdeadline(apic);
IMHO, it's not good to start_sw_tscdeadline() on the start_hv_tscdeadline()
function. I think it's expected that the sw_timer is stopped when
start_hv_tscdeadline() returns successsfully, or sw_timer is not impacted if
start_hv_tscdeadline() fails. But it's not expected that start_hv_tscdeadline()
returns successfully while in fact it's the sw_timer started instead :)
Would it be better to simply return failure here, and the caller then
starts the sw_timer?
> + } else {
> + apic->lapic_timer.hv_timer_in_use = true;
> + hrtimer_cancel(&apic->lapic_timer.timer);
> +
> + /* In case the sw timer triggered in the window */
> + if (atomic_read(&apic->lapic_timer.pending))
> + cancel_hv_tscdeadline(apic);
> + }
> + trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> + apic->lapic_timer.hv_timer_in_use);
> +}
> +
> void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
> {
> struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -1373,20 +1393,8 @@ void kvm_lapic_switch_to_hv_timer(struct
> kvm_vcpu *vcpu) WARN_ON(apic->lapic_timer.hv_timer_in_use);
>
> if (apic_lvtt_tscdeadline(apic) &&
> - !atomic_read(&apic->lapic_timer.pending)) {
> - u64 tscdeadline = apic->lapic_timer.tscdeadline;
> -
> - if (!kvm_x86_ops->set_hv_timer(vcpu, tscdeadline)) {
> - apic->lapic_timer.hv_timer_in_use = true;
> - hrtimer_cancel(&apic->lapic_timer.timer);
> -
> - /* In case the sw timer triggered in the
> window */
> - if (atomic_read(&apic->lapic_timer.pending))
> - cancel_hv_tscdeadline(apic);
> - }
> - trace_kvm_hv_timer_state(vcpu->vcpu_id,
> - apic->lapic_timer.hv_timer_in_use);
> - }
> + !atomic_read(&apic->lapic_timer.pending))
Not sure if we could put this check into the start_hv_tscdeadline(). It will also
make the race window all in the same function.
> + start_hv_tscdeadline(apic);
> }
> EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
>
> @@ -1453,15 +1461,9 @@ static void start_apic_timer(struct kvm_lapic
> *apic) ktime_to_ns(ktime_add_ns(now,
> apic->lapic_timer.period)));
> } else if (apic_lvtt_tscdeadline(apic)) {
> - /* lapic timer in tsc deadline mode */
> - u64 tscdeadline = apic->lapic_timer.tscdeadline;
> -
> - if (kvm_x86_ops->set_hv_timer &&
> - !kvm_x86_ops->set_hv_timer(apic->vcpu,
> tscdeadline)) {
> - apic->lapic_timer.hv_timer_in_use = true;
> - trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
> -
> apic->lapic_timer.hv_timer_in_use);
> - } else
> + if (kvm_x86_ops->set_hv_timer)
> + start_hv_tscdeadline(apic);
As comments above, would it be good to check the return of
start_hv_tscdeadline() and then start_sw_tscdeadline() if it fails? Just my 2
cents.
Thanks
--jyh
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation
2016-06-29 17:16 ` yunhong jiang
@ 2016-06-29 20:49 ` Paolo Bonzini
2016-06-29 22:26 ` Wanpeng Li
1 sibling, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2016-06-29 20:49 UTC (permalink / raw)
To: yunhong jiang, Wanpeng Li
Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
Yunhong Jiang
On 29/06/2016 19:16, yunhong jiang wrote:
>> > + start_sw_tscdeadline(apic);
> IMHO, it's not good to start_sw_tscdeadline() on the start_hv_tscdeadline()
> function. I think it's expected that the sw_timer is stopped when
> start_hv_tscdeadline() returns successsfully, or sw_timer is not impacted if
> start_hv_tscdeadline() fails. But it's not expected that start_hv_tscdeadline()
> returns successfully while in fact it's the sw_timer started instead :)
>
> Would it be better to simply return failure here, and the caller then
> starts the sw_timer?
I agree with Yunhong.
Paolo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation
2016-06-29 17:16 ` yunhong jiang
2016-06-29 20:49 ` Paolo Bonzini
@ 2016-06-29 22:26 ` Wanpeng Li
1 sibling, 0 replies; 6+ messages in thread
From: Wanpeng Li @ 2016-06-29 22:26 UTC (permalink / raw)
To: yunhong jiang
Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
Radim Krčmář,
Yunhong Jiang
2016-06-30 1:16 GMT+08:00 yunhong jiang <yunhong.jiang@linux.intel.com>:
> On Wed, 29 Jun 2016 19:23:57 +0800
> Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> INFO: rcu_sched detected stalls on CPUs/tasks:
>> 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0
>> fqs=21663 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
>> Task dump for CPU 1:
>> qemu-system-x86 R running task 0 3529 3525 0x00080808
>> ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
>> ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
>> 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
>> Call Trace:
>> ? kvm_write_guest_cached+0xb9/0x160 [kvm]
>> ? __delay+0xf/0x20
>> ? wait_lapic_expire+0x14a/0x200 [kvm]
>> ? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
>> ? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
>> ? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>> ? __fget+0x5/0x210
>> ? do_vfs_ioctl+0x96/0x6a0
>> ? __fget_light+0x2a/0x90
>> ? SyS_ioctl+0x79/0x90
>> ? do_syscall_64+0x7c/0x1e0
>> ? entry_SYSCALL64_slow_path+0x25/0x25
>>
>> This can be reproduced readily by running a full dynticks guest(since
>> hrtimer in guest is heavily used) w/ lapic_timer_advance disabled.
>>
>> If fail to program hardware preemption timer, we will fallback to
>> hrtimer based method, however, a previous programmed preemption timer
>> miss to cancel in this scenario which results in one hardware
>> preemption timer and one hrtimer emulated tsc deadline timer run
>> simultaneously. So sometimes the target guest deadline tsc is earlier
>> than guest tsc, which leads to the computation in vmx_set_hv_timer
>> can underflow and cause delta_tsc to be set a huge value, then host
>> soft lockup as above.
>>
>> This patch fix it by cancelling the previous programmed preemption
>> timer if there is once we failed to program the new preemption timer
>> and fallback to hrtimer based method.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v1 -> v2:
>> * abstract the set_hv_timer and cancel_hv_tscdeadline
>>
>> arch/x86/kvm/lapic.c | 48
>> +++++++++++++++++++++++++----------------------- 1 file changed, 25
>> insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 9c20ac1..47ce77c 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1366,6 +1366,26 @@ void kvm_lapic_expired_hv_timer(struct
>> kvm_vcpu *vcpu) }
>> EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
>>
>> +static void start_hv_tscdeadline(struct kvm_lapic *apic)
>> +{
>> + u64 tscdeadline = apic->lapic_timer.tscdeadline;
>> +
>> + if (kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
>> + if (apic->lapic_timer.hv_timer_in_use)
>> + cancel_hv_tscdeadline(apic);
>
> Wanpeng, thanks for the patch.
Thanks for your review, Yunhong. :)
>
>> + start_sw_tscdeadline(apic);
>
> IMHO, it's not good to start_sw_tscdeadline() on the start_hv_tscdeadline()
> function. I think it's expected that the sw_timer is stopped when
> start_hv_tscdeadline() returns successsfully, or sw_timer is not impacted if
> start_hv_tscdeadline() fails. But it's not expected that start_hv_tscdeadline()
> returns successfully while in fact it's the sw_timer started instead :)
>
> Would it be better to simply return failure here, and the caller then
> starts the sw_timer?
Agreed.
>
>> + } else {
>> + apic->lapic_timer.hv_timer_in_use = true;
>> + hrtimer_cancel(&apic->lapic_timer.timer);
>> +
>> + /* In case the sw timer triggered in the window */
>> + if (atomic_read(&apic->lapic_timer.pending))
>> + cancel_hv_tscdeadline(apic);
>> + }
>> + trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
>> + apic->lapic_timer.hv_timer_in_use);
>> +}
>> +
>> void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_lapic *apic = vcpu->arch.apic;
>> @@ -1373,20 +1393,8 @@ void kvm_lapic_switch_to_hv_timer(struct
>> kvm_vcpu *vcpu) WARN_ON(apic->lapic_timer.hv_timer_in_use);
>>
>> if (apic_lvtt_tscdeadline(apic) &&
>> - !atomic_read(&apic->lapic_timer.pending)) {
>> - u64 tscdeadline = apic->lapic_timer.tscdeadline;
>> -
>> - if (!kvm_x86_ops->set_hv_timer(vcpu, tscdeadline)) {
>> - apic->lapic_timer.hv_timer_in_use = true;
>> - hrtimer_cancel(&apic->lapic_timer.timer);
>> -
>> - /* In case the sw timer triggered in the
>> window */
>> - if (atomic_read(&apic->lapic_timer.pending))
>> - cancel_hv_tscdeadline(apic);
>> - }
>> - trace_kvm_hv_timer_state(vcpu->vcpu_id,
>> - apic->lapic_timer.hv_timer_in_use);
>> - }
>> + !atomic_read(&apic->lapic_timer.pending))
>
> Not sure if we could put this check into the start_hv_tscdeadline(). It will also
> make the race window all in the same function.
Agreed.
>
>> + start_hv_tscdeadline(apic);
>> }
>> EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
>>
>> @@ -1453,15 +1461,9 @@ static void start_apic_timer(struct kvm_lapic
>> *apic) ktime_to_ns(ktime_add_ns(now,
>> apic->lapic_timer.period)));
>> } else if (apic_lvtt_tscdeadline(apic)) {
>> - /* lapic timer in tsc deadline mode */
>> - u64 tscdeadline = apic->lapic_timer.tscdeadline;
>> -
>> - if (kvm_x86_ops->set_hv_timer &&
>> - !kvm_x86_ops->set_hv_timer(apic->vcpu,
>> tscdeadline)) {
>> - apic->lapic_timer.hv_timer_in_use = true;
>> - trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
>> -
>> apic->lapic_timer.hv_timer_in_use);
>> - } else
>> + if (kvm_x86_ops->set_hv_timer)
>> + start_hv_tscdeadline(apic);
>
> As comments above, would it be good to check the return of
> start_hv_tscdeadline() and then start_sw_tscdeadline() if it fails? Just my 2
> cents.
Agreed. I will do these in next version.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-06-29 22:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-29 11:23 [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline Wanpeng Li
2016-06-29 11:23 ` [PATCH v2 2/2] KVM: x86: fix underflow in TSC deadline calculation Wanpeng Li
2016-06-29 17:16 ` yunhong jiang
2016-06-29 20:49 ` Paolo Bonzini
2016-06-29 22:26 ` Wanpeng Li
2016-06-29 16:38 ` [PATCH v2 1/2] KVM: x86: introduce cancel_hv_tscdeadline yunhong jiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).