kvm.vger.kernel.org archive mirror
* [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
@ 2017-08-01  2:25 Wanpeng Li
  2017-08-01 19:59 ` Radim Krčmář
  2017-08-03 13:46 ` Radim Krčmář
  0 siblings, 2 replies; 8+ messages in thread
From: Wanpeng Li @ 2017-08-01  2:25 UTC
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

------------[ cut here ]------------
 WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
 CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
Call Trace:
  vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
  ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
  ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x8f/0x750
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced by booting the L1 guest with the 'noapic' grub
parameter, which tells the kernel not to make use of any IOAPICs that may
be present in the system.

The external_intr argument of vmx_check_nested_events(), the caller of
nested_vmx_vmexit() in the trace above, is the req_int_win variable passed
down from vcpu_enter_guest(); it means that L0's userspace requests an irq
window. I observed that the scenario (!kvm_cpu_has_interrupt(vcpu) &&
L0's userspace requests an irq window) is true, so there is no interrupt
which L1 requires to inject to L2, and we should not attempt to emulate
"Acknowledge interrupt on exit" for the irq window request in this
scenario.

This patch fixes it by not attempting to emulate "Acknowledge interrupt on
exit" if there is no L1 requirement to inject an interrupt to L2.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v1 -> v2:
 * update patch description
 * check nested_exit_intr_ack_set() first 

 arch/x86/kvm/vmx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2737343..c5a0ab5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 
-	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
-	    && nested_exit_intr_ack_set(vcpu)) {
+	if (nested_exit_intr_ack_set(vcpu) &&
+		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
+		kvm_cpu_has_interrupt(vcpu)) {
 		int irq = kvm_cpu_get_interrupt(vcpu);
 		WARN_ON(irq < 0);
 		vmcs12->vm_exit_intr_info = irq |
-- 
2.7.4
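
A minimal sketch of the condition change in the patch above, modelled with
plain booleans. The struct and function names are hypothetical stand-ins
for nested_exit_intr_ack_set(), the exit_reason comparison and
kvm_cpu_has_interrupt(); nothing here is kernel code. The check at the end
asserts that the old and new conditions disagree only in the reported case:
an external-interrupt exit with "Acknowledge interrupt on exit" set and no
interrupt pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the real condition reads. */
struct ack_inputs {
	bool ack_intr_on_exit;	/* models nested_exit_intr_ack_set(vcpu) */
	bool exit_is_ext_intr;	/* models exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT */
	bool has_interrupt;	/* models kvm_cpu_has_interrupt(vcpu) */
};

/* Condition before the patch: the interrupt is fetched even if none is pending. */
static bool emulate_ack_old(struct ack_inputs in)
{
	return in.exit_is_ext_intr && in.ack_intr_on_exit;
}

/* Condition after the patch: additionally require a pending interrupt. */
static bool emulate_ack_new(struct ack_inputs in)
{
	return in.ack_intr_on_exit && in.exit_is_ext_intr && in.has_interrupt;
}

/* Walk all 8 input combinations and compare the two conditions. */
static void check_ack_models(void)
{
	for (int bits = 0; bits < 8; bits++) {
		struct ack_inputs in = {
			.ack_intr_on_exit = bits & 1,
			.exit_is_ext_intr = bits & 2,
			.has_interrupt    = bits & 4,
		};
		bool window_only = in.ack_intr_on_exit && in.exit_is_ext_intr &&
				   !in.has_interrupt;

		/* They differ exactly when no interrupt is pending. */
		assert((emulate_ack_old(in) != emulate_ack_new(in)) == window_only);
	}
}
```

Calling check_ack_models() from a test harness exercises the whole truth
table; the model says nothing about whether skipping the acknowledge step
is the architecturally right outcome.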


* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-01  2:25 [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2 Wanpeng Li
@ 2017-08-01 19:59 ` Radim Krčmář
  2017-08-01 22:42   ` Wanpeng Li
  2017-08-03 13:46 ` Radim Krčmář
  1 sibling, 1 reply; 8+ messages in thread
From: Radim Krčmář @ 2017-08-01 19:59 UTC
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-07-31 19:25-0700, Wanpeng Li:
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>  
> -	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> -	    && nested_exit_intr_ack_set(vcpu)) {
> +	if (nested_exit_intr_ack_set(vcpu) &&
> +		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> +		kvm_cpu_has_interrupt(vcpu)) {

This would work as a solution, but I don't think it's the correct
behavior.

SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
interrupt information (valid interrupt) is always set to 1 on
EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
expecting an interrupt in that case, so we should do a userspace VM exit
when the window is open and then inject the userspace interrupt with a
VM exit.

The simplest thing that came to my mind is to:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 39a6222bf968..9ad0c882c4f5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		return 0;
 	}
 
-	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
+	if ((kvm_cpu_has_interrupt(vcpu) ||
+	     (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
 	    nested_exit_on_intr(vcpu)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;

but I think it could break more ... actually, why was the window closed?

kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed()
in order to decide need for the window, but vmx_check_nested_events()
doesn't care about that at all, so the window might just appear closed.
Would the following hunk help too?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 39a6222bf968..7e6caa9c225d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-	return (!to_vmx(vcpu)->nested.nested_run_pending &&
-		vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
+	if (is_guest_mode(vcpu))
+		return !to_vmx(vcpu)->nested.nested_run_pending;
+
+	return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }

(It doesn't prevent malicious userspace from hitting the WARN, though.)
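
A minimal sketch of what the vmx_interrupt_allowed() hunk above changes,
assuming a simplified vCPU model. struct vcpu_model and both function names
are hypothetical; the flag values are written out locally and are assumed
to match the kernel's definitions. With the hunk, the window counts as open
whenever the vCPU is in guest mode and no nested run is pending, regardless
of RFLAGS.IF or the interruptibility state in the loaded VMCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's values; defined here to stay self-contained. */
#define X86_EFLAGS_IF		(1u << 9)
#define GUEST_INTR_STATE_STI	0x00000001u
#define GUEST_INTR_STATE_MOV_SS	0x00000002u

/* Hypothetical stand-in for the state vmx_interrupt_allowed() reads. */
struct vcpu_model {
	bool guest_mode;		/* models is_guest_mode(vcpu) */
	bool nested_run_pending;	/* models vmx->nested.nested_run_pending */
	uint64_t rflags;		/* models vmcs_readl(GUEST_RFLAGS) */
	uint32_t interruptibility;	/* models GUEST_INTERRUPTIBILITY_INFO */
};

/* Behavior before the hunk. */
static bool interrupt_allowed_old(const struct vcpu_model *v)
{
	return (!v->nested_run_pending && (v->rflags & X86_EFLAGS_IF)) &&
	       !(v->interruptibility &
		 (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
}

/* Behavior with the hunk applied. */
static bool interrupt_allowed_new(const struct vcpu_model *v)
{
	if (v->guest_mode)
		return !v->nested_run_pending;

	return (v->rflags & X86_EFLAGS_IF) &&
	       !(v->interruptibility &
		 (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
}

static void check_interrupt_allowed_models(void)
{
	/* Guest mode, RFLAGS.IF clear, no nested run pending. */
	struct vcpu_model l2_if_clear = { .guest_mode = true };
	/* Guest mode with an STI shadow active. */
	struct vcpu_model l2_sti = {
		.guest_mode = true,
		.rflags = X86_EFLAGS_IF,
		.interruptibility = GUEST_INTR_STATE_STI,
	};
	/* Guest mode with a nested VM entry still pending. */
	struct vcpu_model l2_pending = {
		.guest_mode = true,
		.nested_run_pending = true,
		.rflags = X86_EFLAGS_IF,
	};

	assert(!interrupt_allowed_old(&l2_if_clear));
	assert(interrupt_allowed_new(&l2_if_clear));
	assert(!interrupt_allowed_old(&l2_sti));
	assert(interrupt_allowed_new(&l2_sti));
	assert(!interrupt_allowed_old(&l2_pending));
	assert(!interrupt_allowed_new(&l2_pending));
}
```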

Thanks.


* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-01 19:59 ` Radim Krčmář
@ 2017-08-01 22:42   ` Wanpeng Li
  2017-08-02  8:05     ` Wanpeng Li
  0 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2017-08-01 22:42 UTC
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-08-02 3:59 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-07-31 19:25-0700, Wanpeng Li:
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>
>>       vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>
>> -     if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>> -         && nested_exit_intr_ack_set(vcpu)) {
>> +     if (nested_exit_intr_ack_set(vcpu) &&
>> +             exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>> +             kvm_cpu_has_interrupt(vcpu)) {
>
> This would work as a solution, but I don't think it's the correct
> behavior.
>
> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
> interrupt information (valid interrupt) is always set to 1 on
> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
> expecting an interrupt in that case, so we should do a userspace VM exit
> when the window is open and then inject the userspace interrupt with a
> VM exit.

Agreed.

>
> The simplest thing that came to my mind is to:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 39a6222bf968..9ad0c882c4f5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>                 return 0;
>         }
>
> -       if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> +       if ((kvm_cpu_has_interrupt(vcpu) ||
> +            (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
>             nested_exit_on_intr(vcpu)) {
>                 if (vmx->nested.nested_run_pending)
>                         return -EBUSY;
>

Agreed.
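
A minimal sketch of the vmx_check_nested_events() condition quoted above,
again modelled with plain booleans. The struct and function names are
hypothetical stand-ins for the kernel helpers named in the comments. The
check asserts that the proposed condition suppresses the nested VM exit
only when the exit would be triggered purely by the window request while
L1 has "Acknowledge interrupt on exit" set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the real condition reads. */
struct event_inputs {
	bool has_interrupt;	/* models kvm_cpu_has_interrupt(vcpu) */
	bool external_intr;	/* models the external_intr argument */
	bool ack_intr_on_exit;	/* models nested_exit_intr_ack_set(vcpu) */
	bool exit_on_intr;	/* models nested_exit_on_intr(vcpu) */
};

/* Condition before the proposed hunk. */
static bool exit_to_l1_old(struct event_inputs in)
{
	return (in.has_interrupt || in.external_intr) && in.exit_on_intr;
}

/* Condition with the proposed hunk applied. */
static bool exit_to_l1_new(struct event_inputs in)
{
	return (in.has_interrupt ||
		(in.external_intr && !in.ack_intr_on_exit)) &&
	       in.exit_on_intr;
}

/* Walk all 16 input combinations and compare the two conditions. */
static void check_event_models(void)
{
	for (int bits = 0; bits < 16; bits++) {
		struct event_inputs in = {
			.has_interrupt    = bits & 1,
			.external_intr    = bits & 2,
			.ack_intr_on_exit = bits & 4,
			.exit_on_intr     = bits & 8,
		};
		bool window_only_with_ack = !in.has_interrupt &&
					    in.external_intr &&
					    in.ack_intr_on_exit &&
					    in.exit_on_intr;

		assert((exit_to_l1_old(in) != exit_to_l1_new(in)) ==
		       window_only_with_ack);
	}
}
```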

> but I think it could break more ... actually, why was the window closed?
>
> kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed()
> in order to decide need for the window, but vmx_check_nested_events()
> doesn't care about that at all, so the window might just appear closed.
> Would the following hunk help too?

In addition, the interrupt window can be requested by L0's userspace
(kvm_arch_pre_run), and in my testing the idea below still does not fix
the issue.

Regards,
Wanpeng Li

>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 39a6222bf968..7e6caa9c225d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>
>  static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>  {
> -       return (!to_vmx(vcpu)->nested.nested_run_pending &&
> -               vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
> +       if (is_guest_mode(vcpu))
> +               return !to_vmx(vcpu)->nested.nested_run_pending;
> +
> +       return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF &&
>                 !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>                         (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
>  }
>
> (It doesn't prevent malicious userspace from hitting the WARN, though.)
>
> Thanks.


* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-01 22:42   ` Wanpeng Li
@ 2017-08-02  8:05     ` Wanpeng Li
  2017-08-02  8:13       ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2017-08-02  8:05 UTC
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-08-02 6:42 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-08-02 3:59 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> 2017-07-31 19:25-0700, Wanpeng Li:
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>>
>>>       vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>>
>>> -     if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>>> -         && nested_exit_intr_ack_set(vcpu)) {
>>> +     if (nested_exit_intr_ack_set(vcpu) &&
>>> +             exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>>> +             kvm_cpu_has_interrupt(vcpu)) {
>>
>> This would work as a solution, but I don't think it's the correct
>> behavior.
>>
>> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
>> interrupt information (valid interrupt) is always set to 1 on
>> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
>> expecting an interrupt in that case, so we should do a userspace VM exit
>> when the window is open and then inject the userspace interrupt with a
>> VM exit.
>
> Agreed.
>
>>
>> The simplest thing that came to my mind is to:
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 39a6222bf968..9ad0c882c4f5 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>                 return 0;
>>         }
>>
>> -       if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>> +       if ((kvm_cpu_has_interrupt(vcpu) ||
>> +            (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
>>             nested_exit_on_intr(vcpu)) {
>>                 if (vmx->nested.nested_run_pending)
>>                         return -EBUSY;
>>
>
> Agreed.

What's your opinion, Paolo? :) Actually, I considered the above idea
before; it is what the SDM defines.

Regards,
Wanpeng Li



* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-02  8:05     ` Wanpeng Li
@ 2017-08-02  8:13       ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2017-08-02  8:13 UTC
  To: Wanpeng Li, Radim Krčmář; +Cc: linux-kernel, kvm, Wanpeng Li

On 02/08/2017 10:05, Wanpeng Li wrote:
>>>
>>> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
>>> interrupt information (valid interrupt) is always set to 1 on
>>> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
>>> expecting an interrupt in that case, so we should do a userspace VM exit
>>> when the window is open and then inject the userspace interrupt with a
>>> VM exit.
>> Agreed.
>>
>>> The simplest thing that came to my mind is to:
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 39a6222bf968..9ad0c882c4f5 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>>                 return 0;
>>>         }
>>>
>>> -       if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>>> +       if ((kvm_cpu_has_interrupt(vcpu) ||
>>> +            (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
>>>             nested_exit_on_intr(vcpu)) {
>>>                 if (vmx->nested.nested_run_pending)
>>>                         return -EBUSY;
>>>
>> Agreed.
>
> What's your opinion, Paolo? :) Actually, I considered the above idea
> before; it is what the SDM defines.

Radim and I always agree. :)

Paolo


* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-01  2:25 [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2 Wanpeng Li
  2017-08-01 19:59 ` Radim Krčmář
@ 2017-08-03 13:46 ` Radim Krčmář
  2017-08-04  1:23   ` Wanpeng Li
  1 sibling, 1 reply; 8+ messages in thread
From: Radim Krčmář @ 2017-08-03 13:46 UTC
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-07-31 19:25-0700, Wanpeng Li:
>  arch/x86/kvm/vmx.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2737343..c5a0ab5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>  
> -	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> -	    && nested_exit_intr_ack_set(vcpu)) {

I've added a TODO comment so it's clearer that we should not be here if
there is no interrupt.

> +	if (nested_exit_intr_ack_set(vcpu) &&
> +		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> +		kvm_cpu_has_interrupt(vcpu)) {
>  		int irq = kvm_cpu_get_interrupt(vcpu);
>  		WARN_ON(irq < 0);
>  		vmcs12->vm_exit_intr_info = irq |

Changed the indentation to the original alignment.
Please don't use 1 tab -- the condition and body meld, which makes it
harder to read.  (2 tabs would be ok too.)

And the subject was way too long, so I changed it to
KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"

Applied as it results in better behavior, even if it still is incorrect,
thanks.


* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-03 13:46 ` Radim Krčmář
@ 2017-08-04  1:23   ` Wanpeng Li
  2019-10-18 21:12     ` Jim Mattson
  0 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2017-08-04  1:23 UTC
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-08-03 21:46 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-07-31 19:25-0700, Wanpeng Li:
>>  arch/x86/kvm/vmx.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 2737343..c5a0ab5 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>
>>       vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>
>> -     if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>> -         && nested_exit_intr_ack_set(vcpu)) {
>
> I've added a TODO comment so it's clearer that we should not be here if
> there is no interrupt.
>
>> +     if (nested_exit_intr_ack_set(vcpu) &&
>> +             exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>> +             kvm_cpu_has_interrupt(vcpu)) {
>>               int irq = kvm_cpu_get_interrupt(vcpu);
>>               WARN_ON(irq < 0);
>>               vmcs12->vm_exit_intr_info = irq |
>
> Changed the indentation to the original alignment.
> Please don't use 1 tab -- the condition and body meld, which makes it
> harder to read.  (2 tabs would be ok too.)
>
> And the subject was way too long, so I changed it to
> KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"
>
> Applied as it results in better behavior, even if it still is incorrect,
> thanks.

Thanks Radim. :) In addition, I will think more about it and figure
out a final solution.

Regards,
Wanpeng Li


* Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
  2017-08-04  1:23   ` Wanpeng Li
@ 2019-10-18 21:12     ` Jim Mattson
  0 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2019-10-18 21:12 UTC
  To: Wanpeng Li
  Cc: Radim Krčmář,
	linux-kernel, kvm, Paolo Bonzini, Wanpeng Li, Dan Cross,
	Marc Orr

On Thu, Aug 3, 2017 at 6:23 PM Wanpeng Li <kernellwp@gmail.com> wrote:

> Thanks Radim. :) In addition, I will think more about it and figure
> out a final solution.

Have you had any thoughts on a final solution? We're seeing incorrect
behavior with an L1 hypervisor running under qemu with "-machine
q35,kernel-irqchip=split", and I believe this may be the cause.

In particular, VMCS12 has ACK_INTERRUPT_ON_EXIT set, but L1 is seeing
an L2 exit for "external interrupt" with the VMCS12 VM-exit
interruption information cleared to 0.
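
A minimal sketch of why a zeroed field is a problem for L1, assuming the
usual layout of the VM-exit interruption information: vector in bits 7:0,
type in bits 10:8, valid flag in bit 31. The mask values are written out
locally and are assumed to match the kernel's definitions; both function
names are hypothetical. An L1 that relies on "Acknowledge interrupt on
exit" would read the vector from this field, so a value of 0 gives it no
vector to dispatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's values; defined here to stay self-contained. */
#define INTR_INFO_VECTOR_MASK		0x000000ffu
#define INTR_INFO_INTR_TYPE_MASK	0x00000700u
#define INTR_INFO_VALID_MASK		0x80000000u
#define INTR_TYPE_EXT_INTR		(0u << 8)

/* Value a hypervisor would store when it acknowledges vector 'vector'. */
static uint32_t encode_ext_intr_info(unsigned int vector)
{
	assert(vector <= INTR_INFO_VECTOR_MASK);
	return vector | INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
}

/*
 * Hypothetical L1-side check: returns true and stores the vector only if
 * the field is marked valid and describes an external interrupt.
 */
static bool decode_ext_intr_info(uint32_t info, unsigned int *vector)
{
	if (!(info & INTR_INFO_VALID_MASK))
		return false;
	if ((info & INTR_INFO_INTR_TYPE_MASK) != INTR_TYPE_EXT_INTR)
		return false;

	*vector = info & INTR_INFO_VECTOR_MASK;
	return true;
}
```

With this model, decode_ext_intr_info(0, &vector) returns false, which
corresponds to the situation described above: an "external interrupt" exit
that carries no interrupt.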
