linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run
@ 2016-07-08 12:02 Paolo Bonzini
  2016-07-08 12:02 ` [RFT PATCH v5 1/3] KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-08 12:02 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

As mentioned earlier, I don't have a reproducer yet that requires any
changes beyond Wanpeng's (patch 1)---it's possible of course to write
a kvm-unit-test testcase but I haven't had time for that yet.

Thanks,

Paolo Bonzini (2):
  KVM: VMX: reflect broken preemption timer in vmcs_config
  KVM: nVMX: keep preemption timer enabled during L2 execution

Wanpeng Li (1):
  KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest

 arch/x86/kvm/vmx.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFT PATCH v5 1/3] KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest
  2016-07-08 12:02 [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Paolo Bonzini
@ 2016-07-08 12:02 ` Paolo Bonzini
  2016-07-08 12:02 ` [RFT PATCH v5 2/3] KVM: VMX: reflect broken preemption timer in vmcs_config Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-08 12:02 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

From: Wanpeng Li <wanpeng.li@hotmail.com>

The preemption timer for nested VMX is emulated by hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
nested_vmx_exit_handled is always returning true for preemption timer vmexit.  Then,
the L1 preemption timer vmexit is captured and be treated as a L2 preemption
timer vmexit, causing NULL pointer dereferences or worse in the L1 guest's
vmexit handler:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<          (null)>]           (null)
    PGD 0
    Oops: 0010 [#1] SMP
    Call Trace:
     ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
     handle_preemption_timer+0xe/0x20 [kvm_intel]
     vmx_handle_exit+0x169/0x15a0 [kvm_intel]
     ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
     kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
     ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
     ? vcpu_load+0x1c/0x60 [kvm]
     ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
     kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
     do_vfs_ioctl+0x96/0x6a0
     ? __fget_light+0x2a/0x90
     SyS_ioctl+0x79/0x90
     do_syscall_64+0x68/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    Code:  Bad RIP value.
    RIP  [<          (null)>]           (null)
     RSP <ffff8800b5263c48>
    CR2: 0000000000000000
    ---[ end trace 9c70c48b1a2bc66e ]---

This can be reproduced readily by preemption timer enabled on L0 and disabled
on L1.

Return false since preemption timer vmexits must never be reflected to L2.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e564fa2c7ac8..f6e5cc679898 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
 	case EXIT_REASON_PCOMMIT:
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
+	case EXIT_REASON_PREEMPTION_TIMER:
+		return false;
 	default:
 		return true;
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFT PATCH v5 2/3] KVM: VMX: reflect broken preemption timer in vmcs_config
  2016-07-08 12:02 [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Paolo Bonzini
  2016-07-08 12:02 ` [RFT PATCH v5 1/3] KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest Paolo Bonzini
@ 2016-07-08 12:02 ` Paolo Bonzini
  2016-07-08 12:02 ` [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution Paolo Bonzini
  2016-07-11  4:56 ` [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Wanpeng Li
  3 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-08 12:02 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

Simplify cpu_has_vmx_preemption_timer.  This is consistent with the
rest of setup_vmcs_config and preparatory for the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f6e5cc679898..0048be79c7b9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1121,9 +1121,6 @@ static inline bool cpu_has_broken_vmx_preemption_timer(void)
 
 static inline bool cpu_has_vmx_preemption_timer(void)
 {
-	if (cpu_has_broken_vmx_preemption_timer())
-		return false;
-
 	return vmcs_config.pin_based_exec_ctrl &
 		PIN_BASED_VMX_PREEMPTION_TIMER;
 }
@@ -3407,6 +3404,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				&_pin_based_exec_control) < 0)
 		return -EIO;
 
+	if (cpu_has_broken_vmx_preemption_timer())
+		_pin_based_exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
 	if (!(_cpu_based_2nd_exec_control &
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
 		_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution
  2016-07-08 12:02 [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Paolo Bonzini
  2016-07-08 12:02 ` [RFT PATCH v5 1/3] KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest Paolo Bonzini
  2016-07-08 12:02 ` [RFT PATCH v5 2/3] KVM: VMX: reflect broken preemption timer in vmcs_config Paolo Bonzini
@ 2016-07-08 12:02 ` Paolo Bonzini
  2016-07-08 17:29   ` yunhong jiang
  2016-07-11  4:56 ` [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Wanpeng Li
  3 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-08 12:02 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

Because the vmcs12 preemption timer is emulated through a separate hrtimer,
we can keep on using the preemption timer in the vmcs02 to emulare L1's
TSC deadline timer.

However, the corresponding bit in the pin-based execution control field
must be kept consistent between vmcs01 and vmcs02.  On vmentry we copy
it into the vmcs02; on vmexit the preemption timer must be disabled in
the vmcs01 if a preemption timer vmexit happened while in guest mode.

The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it
need not be considered in prepare_vmcs02.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0048be79c7b9..8cda4449a60e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9796,9 +9796,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs_write64(VMCS_LINK_POINTER, -1ull);
 
 	exec_control = vmcs12->pin_based_vm_exec_control;
-	exec_control |= vmcs_config.pin_based_exec_ctrl;
+
+	/* Preemption timer setting is only taken from vmcs01.  */
 	exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	exec_control |= vmcs_config.pin_based_exec_ctrl;
+	if (vmx->hv_deadline_tsc == -1)
+		exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
 
+	/* Posted interrupts setting is only taken from vmcs12.  */
 	if (nested_cpu_has_posted_intr(vmcs12)) {
 		/*
 		 * Note that we use L0's vector here and in
@@ -10727,8 +10732,14 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 
 	load_vmcs12_host_state(vcpu, vmcs12);
 
-	/* Update TSC_OFFSET if TSC was changed while L2 ran */
+	/* Update any VMCS fields that might have changed while L2 ran */
 	vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
+	if (vmx->hv_deadline_tsc == -1)
+		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
+				PIN_BASED_VMX_PREEMPTION_TIMER);
+	else
+		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
+			      PIN_BASED_VMX_PREEMPTION_TIMER);
 
 	/* This is needed for same reason as it was needed in prepare_vmcs02 */
 	vmx->host_rsp = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution
  2016-07-08 12:02 ` [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution Paolo Bonzini
@ 2016-07-08 17:29   ` yunhong jiang
  2016-07-08 17:41     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: yunhong jiang @ 2016-07-08 17:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

On Fri,  8 Jul 2016 14:02:13 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Because the vmcs12 preemption timer is emulated through a separate
> hrtimer, we can keep on using the preemption timer in the vmcs02 to
> emulare L1's TSC deadline timer.
> 
> However, the corresponding bit in the pin-based execution control
> field must be kept consistent between vmcs01 and vmcs02.  On vmentry
> we copy it into the vmcs02; on vmexit the preemption timer must be
> disabled in the vmcs01 if a preemption timer vmexit happened while in
> guest mode.
> 
> The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it
> need not be considered in prepare_vmcs02.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0048be79c7b9..8cda4449a60e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9796,9 +9796,14 @@ static void prepare_vmcs02(struct kvm_vcpu
> *vcpu, struct vmcs12 *vmcs12) vmcs_write64(VMCS_LINK_POINTER, -1ull);
>  
>  	exec_control = vmcs12->pin_based_vm_exec_control;
> -	exec_control |= vmcs_config.pin_based_exec_ctrl;
> +
> +	/* Preemption timer setting is only taken from vmcs01.  */
>  	exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;

Do we still keep this clear here with followed changes?

> +	exec_control |= vmcs_config.pin_based_exec_ctrl;
> +	if (vmx->hv_deadline_tsc == -1)
> +		exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>  
> +	/* Posted interrupts setting is only taken from vmcs12.  */
>  	if (nested_cpu_has_posted_intr(vmcs12)) {
>  		/*
>  		 * Note that we use L0's vector here and in
> @@ -10727,8 +10732,14 @@ static void nested_vmx_vmexit(struct
> kvm_vcpu *vcpu, u32 exit_reason, 
>  	load_vmcs12_host_state(vcpu, vmcs12);
>  
> -	/* Update TSC_OFFSET if TSC was changed while L2 ran */
> +	/* Update any VMCS fields that might have changed while L2
> ran */ vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
> +	if (vmx->hv_deadline_tsc == -1)
> +		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> +				PIN_BASED_VMX_PREEMPTION_TIMER);
> +	else
> +		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
> +			      PIN_BASED_VMX_PREEMPTION_TIMER);

Why do we need change the vmcs01 here? Per my understanding, the vmcs01 is not
changed when the L2 guest is running thus the PIN_BASED_VM_EXEC_CONTROL should
not be changed?  I'm not familiar with nested VMX, sorry if this is a naive
question.

Thanks
--jyh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution
  2016-07-08 17:29   ` yunhong jiang
@ 2016-07-08 17:41     ` Paolo Bonzini
  2016-07-08 21:30       ` yunhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-08 17:41 UTC (permalink / raw)
  To: yunhong jiang
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang



On 08/07/2016 19:29, yunhong jiang wrote:
> >  
> >  	exec_control = vmcs12->pin_based_vm_exec_control;
> > -	exec_control |= vmcs_config.pin_based_exec_ctrl;
> > +
> > +	/* Preemption timer setting is only taken from vmcs01.  */
> >  	exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>
> Do we still keep this clear here with followed changes?

Yes.  If L1 wants to use the preemption timer the bit will be set in
vmcs12->pin_based_vm_exec_control  In this case, however, KVM uses an
hrtimer to emulate L1's preemption timer, so we must not copy the bit
into the vmcs02 (i.e. the VMCS that L0 uses to run L2).  Thus the
preemption timer control of the vmcs02 must come exclusively from
vmx->hv_deadline_tsc.

> > +	exec_control |= vmcs_config.pin_based_exec_ctrl;
> > +	if (vmx->hv_deadline_tsc == -1)
> > +		exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> >  
> > +	/* Posted interrupts setting is only taken from vmcs12.  */
> >  	if (nested_cpu_has_posted_intr(vmcs12)) {
> >  		/*
> >  		 * Note that we use L0's vector here and in
> > @@ -10727,8 +10732,14 @@ static void nested_vmx_vmexit(struct
> > kvm_vcpu *vcpu, u32 exit_reason, 
> >  	load_vmcs12_host_state(vcpu, vmcs12);
> >  
> > -	/* Update TSC_OFFSET if TSC was changed while L2 ran */
> > +	/* Update any VMCS fields that might have changed while L2
> > ran */ vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
> > +	if (vmx->hv_deadline_tsc == -1)
> > +		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> > +				PIN_BASED_VMX_PREEMPTION_TIMER);
> > +	else
> > +		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
> > +			      PIN_BASED_VMX_PREEMPTION_TIMER);
>
> Why do we need change the vmcs01 here? Per my understanding, the vmcs01 is not
> changed when the L2 guest is running thus the PIN_BASED_VM_EXEC_CONTROL should
> not be changed?

This is the point where we are updating the vmcs01 after exiting.  If
vmx->hv_deadline_tsc has changed (for example because of a preemption
timer vmexit, or because L2 did a HLT and L1 is not intercepting HLT) we
need to update the preemption timer control to synchronize it with
vmx->hv_deadline_tsc.

> I'm not familiar with nested VMX, sorry if this is a naive question.

It's not naive, don't worry! :)

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution
  2016-07-08 17:41     ` Paolo Bonzini
@ 2016-07-08 21:30       ` yunhong jiang
  2016-07-08 21:39         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: yunhong jiang @ 2016-07-08 21:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

On Fri, 8 Jul 2016 19:41:48 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> 
> 
> On 08/07/2016 19:29, yunhong jiang wrote:
> > >  
> > >  	exec_control = vmcs12->pin_based_vm_exec_control;
> > > -	exec_control |= vmcs_config.pin_based_exec_ctrl;
> > > +
> > > +	/* Preemption timer setting is only taken from vmcs01.
> > > */ exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> >
> > Do we still keep this clear here with followed changes?
> 
> Yes.  If L1 wants to use the preemption timer the bit will be set in
> vmcs12->pin_based_vm_exec_control  In this case, however, KVM uses an
> hrtimer to emulate L1's preemption timer, so we must not copy the bit
> into the vmcs02 (i.e. the VMCS that L0 uses to run L2).  Thus the
> preemption timer control of the vmcs02 must come exclusively from
> vmx->hv_deadline_tsc.

Thanks for the clarification.

> 
> > > +	exec_control |= vmcs_config.pin_based_exec_ctrl;
> > > +	if (vmx->hv_deadline_tsc == -1)
> > > +		exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> > >  
> > > +	/* Posted interrupts setting is only taken from vmcs12.
> > > */ if (nested_cpu_has_posted_intr(vmcs12)) {
> > >  		/*
> > >  		 * Note that we use L0's vector here and in
> > > @@ -10727,8 +10732,14 @@ static void nested_vmx_vmexit(struct
> > > kvm_vcpu *vcpu, u32 exit_reason, 
> > >  	load_vmcs12_host_state(vcpu, vmcs12);
> > >  
> > > -	/* Update TSC_OFFSET if TSC was changed while L2 ran */
> > > +	/* Update any VMCS fields that might have changed while
> > > L2 ran */ vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
> > > +	if (vmx->hv_deadline_tsc == -1)
> > > +		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> > > +				PIN_BASED_VMX_PREEMPTION_TIMER);
> > > +	else
> > > +		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
> > > +			      PIN_BASED_VMX_PREEMPTION_TIMER);
> >
> > Why do we need change the vmcs01 here? Per my understanding, the
> > vmcs01 is not changed when the L2 guest is running thus the
> > PIN_BASED_VM_EXEC_CONTROL should not be changed?
> 
> This is the point where we are updating the vmcs01 after exiting.  If
> vmx->hv_deadline_tsc has changed (for example because of a preemption


Thanks for the explaination. I try to go through the code and still
have one question. I'd describe below and hope get your input.

When the L2 guest running while the VMX Preemption timer triggered, the
vcpu_enter_guest() will trigger vmx_handle_exit(), with the CPU vmcs as
vmcs02. On the vmx_handle_exit(), the nested_vmx_exit_handled() return
false as the 1st patch did, thus the vmcs is not switched. The
kvm_lapic_expired_hv_timer() will be called with vmcs02, instead of
vmcs01. Is it something we wanted? I assume we should use vmcs01 there
since we will clear the preemption timer VMCS bit there.

Thanks
--jyh

> timer vmexit, or because L2 did a HLT and L1 is not intercepting HLT)
> we need to update the preemption timer control to synchronize it with
> vmx->hv_deadline_tsc.
> 
> > I'm not familiar with nested VMX, sorry if this is a naive question.
> 
> It's not naive, don't worry! :)
> 
> Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution
  2016-07-08 21:30       ` yunhong jiang
@ 2016-07-08 21:39         ` Paolo Bonzini
  2016-07-08 23:18           ` yunhong jiang
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-08 21:39 UTC (permalink / raw)
  To: yunhong jiang
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

> > > > @@ -10727,8 +10732,14 @@ static void nested_vmx_vmexit(struct
> > > > kvm_vcpu *vcpu, u32 exit_reason,
> > > >  	load_vmcs12_host_state(vcpu, vmcs12);
> > > >  
> > > > -	/* Update TSC_OFFSET if TSC was changed while L2 ran */
> > > > +	/* Update any VMCS fields that might have changed while
> > > > L2 ran */ vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
> > > > +	if (vmx->hv_deadline_tsc == -1)
> > > > +		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> > > > +				PIN_BASED_VMX_PREEMPTION_TIMER);
> > > > +	else
> > > > +		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
> > > > +			      PIN_BASED_VMX_PREEMPTION_TIMER);
> > >
> > > Why do we need change the vmcs01 here? Per my understanding, the
> > > vmcs01 is not changed when the L2 guest is running thus the
> > > PIN_BASED_VM_EXEC_CONTROL should not be changed?
> > 
> > This is the point where we are updating the vmcs01 after exiting.  If
> > vmx->hv_deadline_tsc has changed (for example because of a preemption
> 
> Thanks for the explaination. I try to go through the code and still
> have one question. I'd describe below and hope get your input.
> 
> When the L2 guest running while the VMX Preemption timer triggered, the
> vcpu_enter_guest() will trigger vmx_handle_exit(), with the CPU vmcs as
> vmcs02. On the vmx_handle_exit(), the nested_vmx_exit_handled() return
> false as the 1st patch did, thus the vmcs is not switched. The
> kvm_lapic_expired_hv_timer() will be called with vmcs02, instead of
> vmcs01. Is it something we wanted? I assume we should use vmcs01 there
> since we will clear the preemption timer VMCS bit there.

Actually we want both.  For whatever reason, the interrupt might not
cause a vmexit---for example if the L0 PPR is masking the LVTT vector.
In this case, we need to cancel the preemption timer in the vmcs02
(done by kvm_lapic_expired_hv_timer) and keep running L2.  On the next
vmexit, nested_vmx_vmexit will load the vmcs01 and clear the preemption
timer bit.

Of course this is only theory until Wanpeng confirms that my patch works
for him. :)

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution
  2016-07-08 21:39         ` Paolo Bonzini
@ 2016-07-08 23:18           ` yunhong jiang
  0 siblings, 0 replies; 11+ messages in thread
From: yunhong jiang @ 2016-07-08 23:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

On Fri, 8 Jul 2016 17:39:22 -0400 (EDT)
Paolo Bonzini <pbonzini@redhat.com> wrote:

> > > > > @@ -10727,8 +10732,14 @@ static void nested_vmx_vmexit(struct
> > > > > kvm_vcpu *vcpu, u32 exit_reason,
> > > > >  	load_vmcs12_host_state(vcpu, vmcs12);
> > > > >  
> > > > > -	/* Update TSC_OFFSET if TSC was changed while L2 ran
> > > > > */
> > > > > +	/* Update any VMCS fields that might have changed
> > > > > while L2 ran */ vmcs_write64(TSC_OFFSET,
> > > > > vmx->nested.vmcs01_tsc_offset);
> > > > > +	if (vmx->hv_deadline_tsc == -1)
> > > > > +		vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL,
> > > > > +
> > > > > PIN_BASED_VMX_PREEMPTION_TIMER);
> > > > > +	else
> > > > > +		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
> > > > > +
> > > > > PIN_BASED_VMX_PREEMPTION_TIMER);
> > > >
> > > > Why do we need change the vmcs01 here? Per my understanding, the
> > > > vmcs01 is not changed when the L2 guest is running thus the
> > > > PIN_BASED_VM_EXEC_CONTROL should not be changed?
> > > 
> > > This is the point where we are updating the vmcs01 after
> > > exiting.  If vmx->hv_deadline_tsc has changed (for example
> > > because of a preemption
> > 
> > Thanks for the explaination. I try to go through the code and still
> > have one question. I'd describe below and hope get your input.
> > 
> > When the L2 guest running while the VMX Preemption timer triggered,
> > the vcpu_enter_guest() will trigger vmx_handle_exit(), with the CPU
> > vmcs as vmcs02. On the vmx_handle_exit(), the
> > nested_vmx_exit_handled() return false as the 1st patch did, thus
> > the vmcs is not switched. The kvm_lapic_expired_hv_timer() will be
> > called with vmcs02, instead of vmcs01. Is it something we wanted? I
> > assume we should use vmcs01 there since we will clear the
> > preemption timer VMCS bit there.
> 
> Actually we want both.  For whatever reason, the interrupt might not
> cause a vmexit---for example if the L0 PPR is masking the LVTT vector.
> In this case, we need to cancel the preemption timer in the vmcs02
> (done by kvm_lapic_expired_hv_timer) and keep running L2.  On the next
> vmexit, nested_vmx_vmexit will load the vmcs01 and clear the
> preemption timer bit.

Got it and thanks for clarification.

--jyh

> 
> Of course this is only theory until Wanpeng confirms that my patch
> works for him. :)
> 
> Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run
  2016-07-08 12:02 [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-07-08 12:02 ` [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution Paolo Bonzini
@ 2016-07-11  4:56 ` Wanpeng Li
  2016-07-11  7:29   ` Paolo Bonzini
  3 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-07-11  4:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

2016-07-08 20:02 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> As mentioned earlier, I don't have a reproducer yet that requires any
> changes beyond Wanpeng's (patch 1)---it's possible of course to write
> a kvm-unit-test testcase but I haven't had time for that yet.
>
> Thanks,
>
> Paolo Bonzini (2):
>   KVM: VMX: reflect broken preemption timer in vmcs_config
>   KVM: nVMX: keep preemption timer enabled during L2 execution

Cool, this patchset works after my testing. Feel free to add my
Tested-by for your two patches. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run
  2016-07-11  4:56 ` [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Wanpeng Li
@ 2016-07-11  7:29   ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-11  7:29 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang



On 11/07/2016 06:56, Wanpeng Li wrote:
> 2016-07-08 20:02 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> As mentioned earlier, I don't have a reproducer yet that requires any
>> changes beyond Wanpeng's (patch 1)---it's possible of course to write
>> a kvm-unit-test testcase but I haven't had time for that yet.
>>
>> Thanks,
>>
>> Paolo Bonzini (2):
>>   KVM: VMX: reflect broken preemption timer in vmcs_config
>>   KVM: nVMX: keep preemption timer enabled during L2 execution
> 
> Cool, this patchset works after my testing. Feel free to add my
> Tested-by for your two patches. :)

Great, thanks!

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-07-11  7:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-08 12:02 [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Paolo Bonzini
2016-07-08 12:02 ` [RFT PATCH v5 1/3] KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest Paolo Bonzini
2016-07-08 12:02 ` [RFT PATCH v5 2/3] KVM: VMX: reflect broken preemption timer in vmcs_config Paolo Bonzini
2016-07-08 12:02 ` [RFT PATCH v5 3/3] KVM: nVMX: keep preemption timer enabled during L2 execution Paolo Bonzini
2016-07-08 17:29   ` yunhong jiang
2016-07-08 17:41     ` Paolo Bonzini
2016-07-08 21:30       ` yunhong jiang
2016-07-08 21:39         ` Paolo Bonzini
2016-07-08 23:18           ` yunhong jiang
2016-07-11  4:56 ` [RFT PATCH v5 0/3] Fix preemption timer optimization while nested guests run Wanpeng Li
2016-07-11  7:29   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).