All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2
@ 2017-11-10  1:24 Liran Alon
  2017-11-10  1:24 ` [PATCH] " Liran Alon
  0 siblings, 1 reply; 6+ messages in thread
From: Liran Alon @ 2017-11-10  1:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: jmattson, wanpeng.li, idan.brown

Hi,

This series, of a single patch, aim to fix a bug caused by 
recalculation of IOAPIC handled vectors while
guest is in L2 (is_guest_mode(vcpu)==true).

This bug leads to wrongly overwriting vmcs02->eoi_exit_bitmap (which
should always be equal to vmcs12->eoi_exit_bitmap) instead of writing
new values to vmcs01->eoi_exit_bitmap.

The patch fixes this issue by delaying processing of
KVM_REQ_SCAN_IOAPIC to execute only when running L1
(is_guest_mode(vcpu)==false).

We have also written a kvm-unit-test that severs as a regression test
to verify that this bug is indeed fixed. We will commit that
kvm-unit-test in a seperate series of patches soon.

Thanks.
-Liran Alon

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2
  2017-11-10  1:24 [PATCH 0/1] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2 Liran Alon
@ 2017-11-10  1:24 ` Liran Alon
  2017-11-10  8:37   ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Liran Alon @ 2017-11-10  1:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Liran Alon, Krish Sadhukhan

When L1 IOAPIC redirection-table is written, a request of
KVM_REQ_SCAN_IOAPIC is set on all vCPUs. This is done such that
all vCPUs will now recalc their IOAPIC handled vectors.

However, it could be that one of the vCPUs is currently running
L2. In this case, vcpu_scan_ioapic() will be called while
is_guest_mode(vcpu) == true. In this case, load_eoi_exitmap()
will be called which would write to vmcs02->eoi_exit_bitmap,
which is wrong because vmcs02->eoi_exit_bitmap should always
be equal to vmcs12->eoi_exit_bitmap.
Furthermore, at this point KVM_REQ_SCAN_IOAPIC was already
consumed and therefore we will never update vmcs01->eoi_exit_bitmap.
Which could lead to remote_irr of some IOAPIC level-triggered entry
to remain set forever.

Fix this issue by delaying KVM_REQ_SCAN_IOAPIC processing to execute
only when running L1 (is_guest_mode(vcpu) == false).

Issue was reproduced with the following setup:
* L0 runs KVM with 64 CPUs
* L1 runs ESXi 6.0 with 8 CPUs
* ESXi runs 4 L2 VMs:
1. Windows 8.1 32bit with 4 CPUs
2. Ubuntu 17 Server with 4 CPUs
3. Ubuntu Desktop with 2 CPUs
4. CentOS 32bit with 1 CPU
A short while after booting all the L2 VMs, ESXi lost networking.
Examining the issue revealed that ESXi dynamically reconfigures
the IOAPIC redirection-table entry of the NIC. Shortly after
leading to that entry's remote_irr being set forever.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493adf07..ceb8beb1bfc9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -498,6 +498,7 @@ struct kvm_vcpu_arch {
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
 	bool apicv_active;
+	bool scan_ioapic_pending;
 	DECLARE_BITMAP(ioapic_handled_vectors, 256);
 	unsigned long apic_attention;
 	int32_t apic_arb_prio;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb7fcd6..ac1339148a9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6720,6 +6720,12 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	if (!kvm_apic_hw_enabled(vcpu->arch.apic))
 		return;
 
+	if (is_guest_mode(vcpu)) {
+		vcpu->arch.scan_ioapic_pending = true;
+		return;
+	}
+	vcpu->arch.scan_ioapic_pending = false;
+
 	bitmap_zero(vcpu->arch.ioapic_handled_vectors, 256);
 
 	if (irqchip_split(vcpu->kvm))
@@ -6833,7 +6839,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 		}
-		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
+		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu) ||
+		    (!is_guest_mode(vcpu) && vcpu->arch.scan_ioapic_pending))
 			vcpu_scan_ioapic(vcpu);
 		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
 			kvm_vcpu_reload_apic_access_page(vcpu);
@@ -7981,6 +7988,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	kvm = vcpu->kvm;
 
 	vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
+	vcpu->arch.scan_ioapic_pending = false;
 	vcpu->arch.pv.pv_unhalted = false;
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2
  2017-11-10  1:24 ` [PATCH] " Liran Alon
@ 2017-11-10  8:37   ` Paolo Bonzini
  2017-11-10 21:47     ` Liran Alon
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2017-11-10  8:37 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

On 10/11/2017 02:24, Liran Alon wrote:
> 
> @@ -6720,6 +6720,12 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>  	if (!kvm_apic_hw_enabled(vcpu->arch.apic))
>  		return;
>  
> +	if (is_guest_mode(vcpu)) {
> +		vcpu->arch.scan_ioapic_pending = true;
> +		return;
> +	}
> +	vcpu->arch.scan_ioapic_pending = false;
> +
>  	bitmap_zero(vcpu->arch.ioapic_handled_vectors, 256);
>  
>  	if (irqchip_split(vcpu->kvm))

I am not sure it is correct to exit immediately.  The bug is that you're
losing the EOI exit bitmap in the vmcs01; however, with your patch
you're losing it in the vmcs02.  If the L1 hypervisor passes the local
APIC registers to the L2 guest, you would have a similar bug.

However, the patch is generally in the right direction.  Another small
issue:

> @@ -6833,7 +6839,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  				goto out;
>  			}
>  		}
> -		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
> +		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu) ||
> +		    (!is_guest_mode(vcpu) && vcpu->arch.scan_ioapic_pending))
>  			vcpu_scan_ioapic(vcpu);

I think this is not reliable because kvm_request_pending(vcpu) might be
false here---and then you never reach the "if" in the first place.

Maybe you can add a

	if (vcpu->arch.scan_ioapic_pending)
		kvm_make_request(KVM_REQ_SCAN_IOAPIC, vcpu);

to leave_guest_mode in arch/x86/kvm/kvm_cache_regs.h?

Thanks,

Paolo

>  		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
>  			kvm_vcpu_reload_apic_access_page(vcpu);
> @@ -7981,6 +7988,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  	kvm = vcpu->kvm;
>  
>  	vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
> +	vcpu->arch.scan_ioapic_pending = false;
>  	vcpu->arch.pv.pv_unhalted = false;
>  	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
>  	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2
  2017-11-10  8:37   ` Paolo Bonzini
@ 2017-11-10 21:47     ` Liran Alon
  2017-11-10 22:26       ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Liran Alon @ 2017-11-10 21:47 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Krish Sadhukhan



On 10/11/17 10:37, Paolo Bonzini wrote:
> On 10/11/2017 02:24, Liran Alon wrote:
>>
>> @@ -6720,6 +6720,12 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>>   	if (!kvm_apic_hw_enabled(vcpu->arch.apic))
>>   		return;
>>
>> +	if (is_guest_mode(vcpu)) {
>> +		vcpu->arch.scan_ioapic_pending = true;
>> +		return;
>> +	}
>> +	vcpu->arch.scan_ioapic_pending = false;
>> +
>>   	bitmap_zero(vcpu->arch.ioapic_handled_vectors, 256);
>>
>>   	if (irqchip_split(vcpu->kvm))
>
> I am not sure it is correct to exit immediately.  The bug is that you're
> losing the EOI exit bitmap in the vmcs01; however, with your patch
> you're losing it in the vmcs02.  If the L1 hypervisor passes the local
> APIC registers to the L2 guest, you would have a similar bug.

I have indeed not thought about this case. However, I think it is also 
currently not supported.

Assume that at some point of time L1 IOAPIC redirection-table is already 
fully-configured and is not going to be changed by L1 anymore. Further 
assume that L1 wants to run L2 and give him full access to L1 LAPIC. In 
this case, vmcs12->eoi_exit_bitmap will be empty (all zeros) as L1 
assumes EOIs to L1 LAPIC are always broadcasted to L1 IOAPIC (as happens 
in real hardware). When L1 will exec VMRESUME, prepare_vmcs02() will 
just make vmcs02->eoi_exit_bitmap a copy of vmcs12->eoi_exit_bitmap and 
therefore empty as-well. Therefore, L2 EOIs will not cause EOI_INDUCED 
exits to L0 and therefore will not reach L1 IOAPIC emulation code at L0.

In order to fully fix the issue at hand here, you will basically need 
vmcs02->eoi_exit_bitmap to be a logical OR of vmcs12->eoi_exit_bitmap 
and vmcs01->eoi_exit_bitmap. Then on EOI_INDUCED exits to L0, 
nested_vmx_exit_reflected() will decide if it should be forwarded to L1 
or not. That will also fix the bug this patch aims to fix.
However, I am a bit worry about this approach as it can cause a 
significant performance overhead for the common case of not pass-through 
L1 LAPIC to L2 for supporting this esoteric case.
We will try to think if we can come up with a better approach to solving 
this without such a performance hit.

However, I do think that because this scenario is currently not 
supported by KVM, we should consider first apply the v2 of this patch 
(Because the other small issue you found) and only then try to find a 
good-enough fix for the case you just mentioned here.

Am I missing something here or does it make sense?

>
> However, the patch is generally in the right direction.  Another small
> issue:
>
>> @@ -6833,7 +6839,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   				goto out;
>>   			}
>>   		}
>> -		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
>> +		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu) ||
>> +		    (!is_guest_mode(vcpu) && vcpu->arch.scan_ioapic_pending))
>>   			vcpu_scan_ioapic(vcpu);
>
> I think this is not reliable because kvm_request_pending(vcpu) might be
> false here---and then you never reach the "if" in the first place.
>
> Maybe you can add a
>
> 	if (vcpu->arch.scan_ioapic_pending)
> 		kvm_make_request(KVM_REQ_SCAN_IOAPIC, vcpu);
>
> to leave_guest_mode in arch/x86/kvm/kvm_cache_regs.h?
>

Agree. Sorry I missed that. Our unit-test didn't caught this case :).
I will fix this in v2 of this patch and re-send.
Thanks for pointing this out.

-Liran

> Thanks,
>
> Paolo
>
>>   		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
>>   			kvm_vcpu_reload_apic_access_page(vcpu);
>> @@ -7981,6 +7988,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>   	kvm = vcpu->kvm;
>>
>>   	vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
>> +	vcpu->arch.scan_ioapic_pending = false;
>>   	vcpu->arch.pv.pv_unhalted = false;
>>   	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
>>   	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
>>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2
  2017-11-10 21:47     ` Liran Alon
@ 2017-11-10 22:26       ` Paolo Bonzini
  2017-11-10 23:20         ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2017-11-10 22:26 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm
  Cc: jmattson, wanpeng.li, idan.brown, Krish Sadhukhan

On 10/11/2017 22:47, Liran Alon wrote:
> In order to fully fix the issue at hand here, you will basically need
> vmcs02->eoi_exit_bitmap to be a logical OR of vmcs12->eoi_exit_bitmap
> and vmcs01->eoi_exit_bitmap. Then on EOI_INDUCED exits to L0,
> nested_vmx_exit_reflected() will decide if it should be forwarded to L1
> or not. That will also fix the bug this patch aims to fix.

No, it should be possible to detect whether the APIC is passed through:
if "virtualize APIC access" and "virtual-x2APIC mode" are both clear in
vmcs12, then any EOI_INDUCED exit from L2 must come from the L1 APIC.
Therefore if "virtualize APIC access" is clear, you should set the
vmcs02 EOI exit bitmap to the values in vmcs01, otherwise you should use
the vmcs12 bitmap.

This is more or less the

	else if (!(nested_cpu_has_virt_x2apic_mode(vmcs12)) &&
                   cpu_need_virtualize_apic_accesses(&vmx->vcpu)) {
                vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
                              SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
                kvm_vcpu_reload_apic_access_page(vcpu);
        }

case of nested_get_vmcs12_pages.

Thanks,

Paolo

> However, I am a bit worry about this approach as it can cause a
> significant performance overhead for the common case of not pass-through
> L1 LAPIC to L2 for supporting this esoteric case.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2
  2017-11-10 22:26       ` Paolo Bonzini
@ 2017-11-10 23:20         ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2017-11-10 23:20 UTC (permalink / raw)
  To: Liran Alon, rkrcmar, kvm

On 10/11/2017 23:26, Paolo Bonzini wrote:
> if "virtualize APIC access" and "virtual-x2APIC mode" are both clear in
> vmcs12, then any EOI_INDUCED exit from L2 must come from the L1 APIC.
> Therefore if "virtualize APIC access"

... and "virtual-x2APIC mode" too...

> is clear, you should set the
> vmcs02 EOI exit bitmap to the values in vmcs01, otherwise you should use
> the vmcs12 bitmap.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-11-10 23:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-10  1:24 [PATCH 0/1] KVM: nVMX: Do not recalc IOAPIC handled vectors while running L2 Liran Alon
2017-11-10  1:24 ` [PATCH] " Liran Alon
2017-11-10  8:37   ` Paolo Bonzini
2017-11-10 21:47     ` Liran Alon
2017-11-10 22:26       ` Paolo Bonzini
2017-11-10 23:20         ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.