All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest
@ 2016-07-07  3:46 Wanpeng Li
  2016-07-07  3:46 ` [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it Wanpeng Li
  2016-07-07  6:48 ` [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Haozhong Zhang
  0 siblings, 2 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-07-07  3:46 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

From: Wanpeng Li <wanpeng.li@hotmail.com>

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
Call Trace:
 ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
 handle_preemption_timer+0xe/0x20 [kvm_intel]
 vmx_handle_exit+0x169/0x15a0 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
 kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
 ? vcpu_load+0x1c/0x60 [kvm]
 ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x68/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8800b5263c48>
CR2: 0000000000000000
---[ end trace 9c70c48b1a2bc66e ]---

This can be reproduced readily by preemption timer enabled on L0 and disabled
on L1.

Preemption timer for nested VMX is emulated by hrtimer which is started on L2
entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
nested_vmx_exit_handled is always return true for preemption timer vmexit, then
the L1 preemption timer vmexit is captured and be treated as a L2 preemption
timer vmexit, incurr a nested vmexit dereference NULL pointer.

This patch fix it by depending on check_nested_events to capture L2 preemption
timer(emulated hrtimer) expire and nested vmexit.

Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
 * update patch subject
v1 -> v2:
 * fix typo in patch description

 arch/x86/kvm/vmx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a..29c16a8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
 	case EXIT_REASON_PCOMMIT:
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
+	case EXIT_REASON_PREEMPTION_TIMER:
+		return false;
 	default:
 		return true;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it
  2016-07-07  3:46 [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Wanpeng Li
@ 2016-07-07  3:46 ` Wanpeng Li
  2016-07-07  8:10   ` Paolo Bonzini
  2016-07-07  6:48 ` [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Haozhong Zhang
  1 sibling, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-07-07  3:46 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

From: Wanpeng Li <wanpeng.li@hotmail.com>

We will go to vcpu_run() loop after L0 emulates VMRESUME which incurs 
kvm_sched_out and kvm_sched_in operations since cond_resched() will be 
called once need resched. Preemption timer will be reprogrammed if vCPU 
is scheduled to a different pCPU. Then the preemption timer bit of vmcs02 
will be set if L0 enable preemption timer to run L1 even if L1 doesn't 
enable preemption timer to run L2.

This patch fix it by don't reprogram preemption timer of vmcs02 if L1's 
vCPU is scheduled on diffent pCPU when we are in the way to vmresume 
nested guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cc6cf8..e8fe16a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2742,7 +2742,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		if (tsc_delta < 0)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
-		if (kvm_lapic_hv_timer_in_use(vcpu) &&
+		if (!is_guest_mode(vcpu) &&
+			kvm_lapic_hv_timer_in_use(vcpu) &&
 				kvm_x86_ops->set_hv_timer(vcpu,
 					kvm_get_lapic_tscdeadline_msr(vcpu)))
 			kvm_lapic_switch_to_sw_timer(vcpu);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest
  2016-07-07  3:46 [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Wanpeng Li
  2016-07-07  3:46 ` [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it Wanpeng Li
@ 2016-07-07  6:48 ` Haozhong Zhang
  2016-07-07  6:56   ` Wanpeng Li
  1 sibling, 1 reply; 11+ messages in thread
From: Haozhong Zhang @ 2016-07-07  6:48 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
	Radim Krčmář,
	Yunhong Jiang, Jan Kiszka

On 07/07/16 11:46, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<          (null)>]           (null)
> PGD 0
> Oops: 0010 [#1] SMP
> Call Trace:
>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>  ? vcpu_load+0x1c/0x60 [kvm]
>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>  do_vfs_ioctl+0x96/0x6a0
>  ? __fget_light+0x2a/0x90
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x68/0x180
>  entry_SYSCALL64_slow_path+0x25/0x25
> Code:  Bad RIP value.
> RIP  [<          (null)>]           (null)
>  RSP <ffff8800b5263c48>
> CR2: 0000000000000000
> ---[ end trace 9c70c48b1a2bc66e ]---
> 
> This can be reproduced readily by preemption timer enabled on L0 and disabled
> on L1.
> 
> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> 
> This patch fix it by depending on check_nested_events to capture L2 preemption
> timer(emulated hrtimer) expire and nested vmexit.
> 
> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v2 -> v3:
>  * update patch subject
> v1 -> v2:
>  * fix typo in patch description
> 
>  arch/x86/kvm/vmx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 85e2f0a..29c16a8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>  		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>  	case EXIT_REASON_PCOMMIT:
>  		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> +	case EXIT_REASON_PREEMPTION_TIMER:
> +		return false;

If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
will this one still be needed?

Haozhong


>  	default:
>  		return true;
>  	}
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest
  2016-07-07  6:48 ` [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Haozhong Zhang
@ 2016-07-07  6:56   ` Wanpeng Li
  2016-07-07  7:02     ` Haozhong Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-07-07  6:56 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
	Radim Krčmář,
	Yunhong Jiang, Jan Kiszka

2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> On 07/07/16 11:46, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> BUG: unable to handle kernel NULL pointer dereference at           (null)
>> IP: [<          (null)>]           (null)
>> PGD 0
>> Oops: 0010 [#1] SMP
>> Call Trace:
>>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>>  ? vcpu_load+0x1c/0x60 [kvm]
>>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>>  do_vfs_ioctl+0x96/0x6a0
>>  ? __fget_light+0x2a/0x90
>>  SyS_ioctl+0x79/0x90
>>  do_syscall_64+0x68/0x180
>>  entry_SYSCALL64_slow_path+0x25/0x25
>> Code:  Bad RIP value.
>> RIP  [<          (null)>]           (null)
>>  RSP <ffff8800b5263c48>
>> CR2: 0000000000000000
>> ---[ end trace 9c70c48b1a2bc66e ]---
>>
>> This can be reproduced readily by preemption timer enabled on L0 and disabled
>> on L1.
>>
>> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
>> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
>> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
>> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
>> timer vmexit, incurr a nested vmexit dereference NULL pointer.
>>
>> This patch fix it by depending on check_nested_events to capture L2 preemption
>> timer(emulated hrtimer) expire and nested vmexit.
>>
>> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> Cc: Jan Kiszka <jan.kiszka@siemens.com>
>> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v2 -> v3:
>>  * update patch subject
>> v1 -> v2:
>>  * fix typo in patch description
>>
>>  arch/x86/kvm/vmx.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 85e2f0a..29c16a8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>>       case EXIT_REASON_PCOMMIT:
>>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
>> +     case EXIT_REASON_PREEMPTION_TIMER:
>> +             return false;
>
> If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> will this one still be needed?

After complete "L1 TSC deadline timer to trigger while L2 is running",
L0's preemption timer fire when L2 is running can result in
(is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest
  2016-07-07  6:56   ` Wanpeng Li
@ 2016-07-07  7:02     ` Haozhong Zhang
  2016-07-07  7:07       ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Haozhong Zhang @ 2016-07-07  7:02 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
	Radim Krčmář,
	Yunhong Jiang, Jan Kiszka

On 07/07/16 14:56, Wanpeng Li wrote:
> 2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> > On 07/07/16 11:46, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
> >> IP: [<          (null)>]           (null)
> >> PGD 0
> >> Oops: 0010 [#1] SMP
> >> Call Trace:
> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >>  ? vcpu_load+0x1c/0x60 [kvm]
> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> >>  do_vfs_ioctl+0x96/0x6a0
> >>  ? __fget_light+0x2a/0x90
> >>  SyS_ioctl+0x79/0x90
> >>  do_syscall_64+0x68/0x180
> >>  entry_SYSCALL64_slow_path+0x25/0x25
> >> Code:  Bad RIP value.
> >> RIP  [<          (null)>]           (null)
> >>  RSP <ffff8800b5263c48>
> >> CR2: 0000000000000000
> >> ---[ end trace 9c70c48b1a2bc66e ]---
> >>
> >> This can be reproduced readily by preemption timer enabled on L0 and disabled
> >> on L1.
> >>
> >> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
> >> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> >>
> >> This patch fix it by depending on check_nested_events to capture L2 preemption
> >> timer(emulated hrtimer) expire and nested vmexit.
> >>
> >> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> >> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> >> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> >> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> ---
> >> v2 -> v3:
> >>  * update patch subject
> >> v1 -> v2:
> >>  * fix typo in patch description
> >>
> >>  arch/x86/kvm/vmx.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 85e2f0a..29c16a8 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
> >>       case EXIT_REASON_PCOMMIT:
> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> >> +     case EXIT_REASON_PREEMPTION_TIMER:
> >> +             return false;
> >
> > If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> > will this one still be needed?
> 
> After complete "L1 TSC deadline timer to trigger while L2 is running",
> L0's preemption timer fire when L2 is running can result in
> (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?
>

In prepare_vmcs02():

    exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
    ...
    vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);

so preemption timer will never be enabled while L2 guest is running.

Haozhong

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest
  2016-07-07  7:02     ` Haozhong Zhang
@ 2016-07-07  7:07       ` Wanpeng Li
  2016-07-07  7:25         ` Haozhong Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-07-07  7:07 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
	Radim Krčmář,
	Yunhong Jiang, Jan Kiszka

2016-07-07 15:02 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> On 07/07/16 14:56, Wanpeng Li wrote:
>> 2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
>> > On 07/07/16 11:46, Wanpeng Li wrote:
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >>
>> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
>> >> IP: [<          (null)>]           (null)
>> >> PGD 0
>> >> Oops: 0010 [#1] SMP
>> >> Call Trace:
>> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>> >>  ? vcpu_load+0x1c/0x60 [kvm]
>> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>> >>  do_vfs_ioctl+0x96/0x6a0
>> >>  ? __fget_light+0x2a/0x90
>> >>  SyS_ioctl+0x79/0x90
>> >>  do_syscall_64+0x68/0x180
>> >>  entry_SYSCALL64_slow_path+0x25/0x25
>> >> Code:  Bad RIP value.
>> >> RIP  [<          (null)>]           (null)
>> >>  RSP <ffff8800b5263c48>
>> >> CR2: 0000000000000000
>> >> ---[ end trace 9c70c48b1a2bc66e ]---
>> >>
>> >> This can be reproduced readily by preemption timer enabled on L0 and disabled
>> >> on L1.
>> >>
>> >> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
>> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
>> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
>> >> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
>> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
>> >>
>> >> This patch fix it by depending on check_nested_events to capture L2 preemption
>> >> timer(emulated hrtimer) expire and nested vmexit.
>> >>
>> >> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
>> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> >> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> >> Cc: Jan Kiszka <jan.kiszka@siemens.com>
>> >> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
>> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> ---
>> >> v2 -> v3:
>> >>  * update patch subject
>> >> v1 -> v2:
>> >>  * fix typo in patch description
>> >>
>> >>  arch/x86/kvm/vmx.c | 2 ++
>> >>  1 file changed, 2 insertions(+)
>> >>
>> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> >> index 85e2f0a..29c16a8 100644
>> >> --- a/arch/x86/kvm/vmx.c
>> >> +++ b/arch/x86/kvm/vmx.c
>> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>> >>       case EXIT_REASON_PCOMMIT:
>> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
>> >> +     case EXIT_REASON_PREEMPTION_TIMER:
>> >> +             return false;
>> >
>> > If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
>> > will this one still be needed?
>>
>> After complete "L1 TSC deadline timer to trigger while L2 is running",
>> L0's preemption timer fire when L2 is running can result in
>> (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?
>>
>
> In prepare_vmcs02():
>
>     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>     ...
>     vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
>
> so preemption timer will never be enabled while L2 guest is running.

It is not true after Paolo's patch.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest
  2016-07-07  7:07       ` Wanpeng Li
@ 2016-07-07  7:25         ` Haozhong Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Haozhong Zhang @ 2016-07-07  7:25 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Paolo Bonzini,
	Radim Krčmář,
	Yunhong Jiang, Jan Kiszka

On 07/07/16 15:07, Wanpeng Li wrote:
> 2016-07-07 15:02 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> > On 07/07/16 14:56, Wanpeng Li wrote:
> >> 2016-07-07 14:48 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> >> > On 07/07/16 11:46, Wanpeng Li wrote:
> >> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >> >>
> >> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
> >> >> IP: [<          (null)>]           (null)
> >> >> PGD 0
> >> >> Oops: 0010 [#1] SMP
> >> >> Call Trace:
> >> >>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
> >> >>  handle_preemption_timer+0xe/0x20 [kvm_intel]
> >> >>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
> >> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >> >>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
> >> >>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
> >> >>  ? vcpu_load+0x1c/0x60 [kvm]
> >> >>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
> >> >>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
> >> >>  do_vfs_ioctl+0x96/0x6a0
> >> >>  ? __fget_light+0x2a/0x90
> >> >>  SyS_ioctl+0x79/0x90
> >> >>  do_syscall_64+0x68/0x180
> >> >>  entry_SYSCALL64_slow_path+0x25/0x25
> >> >> Code:  Bad RIP value.
> >> >> RIP  [<          (null)>]           (null)
> >> >>  RSP <ffff8800b5263c48>
> >> >> CR2: 0000000000000000
> >> >> ---[ end trace 9c70c48b1a2bc66e ]---
> >> >>
> >> >> This can be reproduced readily by preemption timer enabled on L0 and disabled
> >> >> on L1.
> >> >>
> >> >> Preemption timer for nested VMX is emulated by hrtimer which is started on L2
> >> >> entry, stopped on L2 exit and evaluated via the check_nested_events hook. However,
> >> >> nested_vmx_exit_handled is always return true for preemption timer vmexit, then
> >> >> the L1 preemption timer vmexit is captured and be treated as a L2 preemption
> >> >> timer vmexit, incurr a nested vmexit dereference NULL pointer.
> >> >>
> >> >> This patch fix it by depending on check_nested_events to capture L2 preemption
> >> >> timer(emulated hrtimer) expire and nested vmexit.
> >> >>
> >> >> Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> >> >> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> >> >> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> >> >> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> >> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> >> ---
> >> >> v2 -> v3:
> >> >>  * update patch subject
> >> >> v1 -> v2:
> >> >>  * fix typo in patch description
> >> >>
> >> >>  arch/x86/kvm/vmx.c | 2 ++
> >> >>  1 file changed, 2 insertions(+)
> >> >>
> >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> >> index 85e2f0a..29c16a8 100644
> >> >> --- a/arch/x86/kvm/vmx.c
> >> >> +++ b/arch/x86/kvm/vmx.c
> >> >> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> >> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
> >> >>       case EXIT_REASON_PCOMMIT:
> >> >>               return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> >> >> +     case EXIT_REASON_PREEMPTION_TIMER:
> >> >> +             return false;
> >> >
> >> > If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> >> > will this one still be needed?
> >>
> >> After complete "L1 TSC deadline timer to trigger while L2 is running",
> >> L0's preemption timer fire when L2 is running can result in
> >> (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) be true, right?
> >>
> >
> > In prepare_vmcs02():
> >
> >     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> >     ...
> >     vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
> >
> > so preemption timer will never be enabled while L2 guest is running.
> 
> It is not true after Paolo's patch.
>

OK, I didn't quite follow discussions before v3 and just noticed
Paolo's patch in v2 comments is to enable L1 preemption timer while L2
is running. Then this one looks reasonable.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it
  2016-07-07  3:46 ` [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it Wanpeng Li
@ 2016-07-07  8:10   ` Paolo Bonzini
  2016-07-07  8:31     ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-07  8:10 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang



On 07/07/2016 05:46, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> We will go to vcpu_run() loop after L0 emulates VMRESUME which incurs 
> kvm_sched_out and kvm_sched_in operations since cond_resched() will be 
> called once need resched. Preemption timer will be reprogrammed if vCPU 
> is scheduled to a different pCPU. Then the preemption timer bit of vmcs02 
> will be set if L0 enable preemption timer to run L1 even if L1 doesn't 
> enable preemption timer to run L2.
> 
> This patch fix it by don't reprogram preemption timer of vmcs02 if L1's 
> vCPU is scheduled on diffent pCPU when we are in the way to vmresume 
> nested guest.

Again, this is wrong.  There is no reason why L1's APIC timer cannot be
emulated through the vmcs12's preemption timer setting.  The only issue
is getting the pin-based execution controls right.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it
  2016-07-07  8:10   ` Paolo Bonzini
@ 2016-07-07  8:31     ` Wanpeng Li
  2016-07-07 10:33       ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-07-07  8:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

2016-07-07 16:10 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/07/2016 05:46, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> We will go to vcpu_run() loop after L0 emulates VMRESUME which incurs
>> kvm_sched_out and kvm_sched_in operations since cond_resched() will be
>> called once need resched. Preemption timer will be reprogrammed if vCPU
>> is scheduled to a different pCPU. Then the preemption timer bit of vmcs02
>> will be set if L0 enable preemption timer to run L1 even if L1 doesn't
>> enable preemption timer to run L2.
>>
>> This patch fix it by don't reprogram preemption timer of vmcs02 if L1's
>> vCPU is scheduled on diffent pCPU when we are in the way to vmresume
>> nested guest.
>
> Again, this is wrong.  There is no reason why L1's APIC timer cannot be
> emulated through the vmcs12's preemption timer setting.  The only issue
> is getting the pin-based execution controls right.

This patch doesn't intend to implement "L1 TSC deadline timer to
trigger while L2 is running", it just solves why vmcs02 is set even if

exec_control = vmcs12->pin_based_vm_exec_control;
exec_control |= vmcs_config.pin_based_exec_ctrl;
exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;

We should set pin-based execution controls right to implement "L1 TSC
deadline timer to trigger while L2 is running".

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it
  2016-07-07  8:31     ` Wanpeng Li
@ 2016-07-07 10:33       ` Paolo Bonzini
  2016-07-07 10:54         ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-07-07 10:33 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang



On 07/07/2016 10:31, Wanpeng Li wrote:
> 2016-07-07 16:10 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>
>>
>> On 07/07/2016 05:46, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> We will go to vcpu_run() loop after L0 emulates VMRESUME which incurs
>>> kvm_sched_out and kvm_sched_in operations since cond_resched() will be
>>> called once need resched. Preemption timer will be reprogrammed if vCPU
>>> is scheduled to a different pCPU. Then the preemption timer bit of vmcs02
>>> will be set if L0 enable preemption timer to run L1 even if L1 doesn't
>>> enable preemption timer to run L2.
>>>
>>> This patch fix it by don't reprogram preemption timer of vmcs02 if L1's
>>> vCPU is scheduled on diffent pCPU when we are in the way to vmresume
>>> nested guest.
>>
>> Again, this is wrong.  There is no reason why L1's APIC timer cannot be
>> emulated through the vmcs12's preemption timer setting.  The only issue
>> is getting the pin-based execution controls right.
> 
> This patch doesn't intend to implement "L1 TSC deadline timer to
> trigger while L2 is running", it just solves why vmcs02 is set even if
> 
> exec_control = vmcs12->pin_based_vm_exec_control;
> exec_control |= vmcs_config.pin_based_exec_ctrl;
> exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> 
> We should set pin-based execution controls right to implement "L1 TSC
> deadline timer to trigger while L2 is running".

Ok, now I get it, but I still cannot understand the logic in your patch.

You write:

		if (!is_guest_mode(vcpu) &&
			kvm_lapic_hv_timer_in_use(vcpu) &&
 				kvm_x86_ops->set_hv_timer(vcpu,
 					kvm_get_lapic_tscdeadline_msr(vcpu)))
 			kvm_lapic_switch_to_sw_timer(vcpu);

but this means that while L2 runs you miss L1's APIC timer interrupt.  Do you
want this instead:

		if (kvm_lapic_hv_timer_in_use(vcpu) &&
		    (is_guest_mode(vcpu) ||
 		     kvm_x86_ops->set_hv_timer(vcpu,
 					       kvm_get_lapic_tscdeadline_msr(vcpu))))
 			kvm_lapic_switch_to_sw_timer(vcpu);

?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it
  2016-07-07 10:33       ` Paolo Bonzini
@ 2016-07-07 10:54         ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2016-07-07 10:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Wanpeng Li, Radim Krčmář,
	Yunhong Jiang, Jan Kiszka, Haozhong Zhang

2016-07-07 18:33 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/07/2016 10:31, Wanpeng Li wrote:
>> 2016-07-07 16:10 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>
>>>
>>> On 07/07/2016 05:46, Wanpeng Li wrote:
>>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>
>>>> We will go to vcpu_run() loop after L0 emulates VMRESUME which incurs
>>>> kvm_sched_out and kvm_sched_in operations since cond_resched() will be
>>>> called once need resched. Preemption timer will be reprogrammed if vCPU
>>>> is scheduled to a different pCPU. Then the preemption timer bit of vmcs02
>>>> will be set if L0 enable preemption timer to run L1 even if L1 doesn't
>>>> enable preemption timer to run L2.
>>>>
>>>> This patch fix it by don't reprogram preemption timer of vmcs02 if L1's
>>>> vCPU is scheduled on diffent pCPU when we are in the way to vmresume
>>>> nested guest.
>>>
>>> Again, this is wrong.  There is no reason why L1's APIC timer cannot be
>>> emulated through the vmcs12's preemption timer setting.  The only issue
>>> is getting the pin-based execution controls right.
>>
>> This patch doesn't intend to implement "L1 TSC deadline timer to
>> trigger while L2 is running", it just solves why vmcs02 is set even if
>>
>> exec_control = vmcs12->pin_based_vm_exec_control;
>> exec_control |= vmcs_config.pin_based_exec_ctrl;
>> exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>
>> We should set pin-based execution controls right to implement "L1 TSC
>> deadline timer to trigger while L2 is running".
>
> Ok, now I get it, but I still cannot understand the logic in your patch.
>
> You write:
>
>                 if (!is_guest_mode(vcpu) &&
>                         kvm_lapic_hv_timer_in_use(vcpu) &&
>                                 kvm_x86_ops->set_hv_timer(vcpu,
>                                         kvm_get_lapic_tscdeadline_msr(vcpu)))
>                         kvm_lapic_switch_to_sw_timer(vcpu);
>
> but this means that while L2 runs you miss L1's APIC timer interrupt.  Do you
> want this instead:
>
>                 if (kvm_lapic_hv_timer_in_use(vcpu) &&
>                     (is_guest_mode(vcpu) ||
>                      kvm_x86_ops->set_hv_timer(vcpu,
>                                                kvm_get_lapic_tscdeadline_msr(vcpu))))
>                         kvm_lapic_switch_to_sw_timer(vcpu);

Nice point out. :) I will send out another version.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-07-07 10:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-07  3:46 [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Wanpeng Li
2016-07-07  3:46 ` [PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it Wanpeng Li
2016-07-07  8:10   ` Paolo Bonzini
2016-07-07  8:31     ` Wanpeng Li
2016-07-07 10:33       ` Paolo Bonzini
2016-07-07 10:54         ` Wanpeng Li
2016-07-07  6:48 ` [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest Haozhong Zhang
2016-07-07  6:56   ` Wanpeng Li
2016-07-07  7:02     ` Haozhong Zhang
2016-07-07  7:07       ` Wanpeng Li
2016-07-07  7:25         ` Haozhong Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.