* [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
@ 2014-07-02  6:54 Wanpeng Li
  2014-07-02  7:20 ` Hu, Robert
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Wanpeng Li @ 2014-07-02  6:54 UTC (permalink / raw)
  To: Paolo Bonzini, Jan Kiszka, Gleb Natapov
  Cc: Hu Robert, kvm, linux-kernel, Wanpeng Li

This patch fixes the bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=72381

If we did not inject a still-pending event to L1 because of nested_run_pending,
KVM_REQ_EVENT should be requested after the vmexit so that the event can be
injected into L1. However, the current logic blindly requests KVM_REQ_EVENT even
if there is no still-pending event for L1 blocked by nested_run_pending. This
opens a race window: if L0 sends an interrupt to L1 during it, the interrupt is
injected into L2 even though it belongs to L1.

               VCPU0                               another thread 

L1 intr not blocked on L2 first entry
vmx_vcpu_run req event 
kvm check request req event 
check_nested_events don't have any intr 
not nested exit 
                                            intr occur (8254, lapic timer etc)
inject_pending_event now have intr 
inject interrupt 

This patch fixes the race by introducing an l1_events_blocked field in nested_vmx
which indicates that a still-pending event for L1 was blocked by
nested_run_pending, and by requesting KVM_REQ_EVENT only when such a blocked
event exists.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
---
 arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f4e5aed..fe69c49 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -372,6 +372,7 @@ struct nested_vmx {
 	u64 vmcs01_tsc_offset;
 	/* L2 must run next, and mustn't decide to exit to L1. */
 	bool nested_run_pending;
+	bool l1_events_blocked;
 	/*
 	 * Guest pages referred to in vmcs02 with host-physical pointers, so
 	 * we must keep them pinned while L2 runs.
@@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 * we did not inject a still-pending event to L1 now because of
 	 * nested_run_pending, we need to re-enable this bit.
 	 */
-	if (vmx->nested.nested_run_pending)
+	if (to_vmx(vcpu)->nested.l1_events_blocked) {
+		to_vmx(vcpu)->nested.l1_events_blocked = false;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
 
 	vmx->nested.nested_run_pending = 0;
 
@@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 
 	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
 	    vmx->nested.preemption_timer_expired) {
-		if (vmx->nested.nested_run_pending)
+		if (vmx->nested.nested_run_pending) {
+			vmx->nested.l1_events_blocked = true;
 			return -EBUSY;
+		}
 		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
 		return 0;
 	}
 
 	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
-		if (vmx->nested.nested_run_pending ||
-		    vcpu->arch.interrupt.pending)
+		if (vmx->nested.nested_run_pending) {
+			vmx->nested.l1_events_blocked = true;
+			return -EBUSY;
+		}
+		if (vcpu->arch.interrupt.pending)
 			return -EBUSY;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
@@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 
 	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
 	    nested_exit_on_intr(vcpu)) {
-		if (vmx->nested.nested_run_pending)
+		if (vmx->nested.nested_run_pending) {
+			vmx->nested.l1_events_blocked = true;
 			return -EBUSY;
+		}
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
 	}
 
-- 
1.9.1



* RE: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  6:54 [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
@ 2014-07-02  7:20 ` Hu, Robert
  2014-07-02  9:03   ` Jan Kiszka
  2014-07-02  9:01 ` Jan Kiszka
  2014-07-02 16:27 ` Bandan Das
  2 siblings, 1 reply; 43+ messages in thread
From: Hu, Robert @ 2014-07-02  7:20 UTC (permalink / raw)
  To: Wanpeng Li, Paolo Bonzini, Jan Kiszka, Gleb Natapov; +Cc: kvm, linux-kernel

> -----Original Message-----
> From: Wanpeng Li [mailto:wanpeng.li@linux.intel.com]
> Sent: Wednesday, July 2, 2014 2:54 PM
> To: Paolo Bonzini; Jan Kiszka; Gleb Natapov
> Cc: Hu, Robert; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Wanpeng Li
> Subject: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
> 
> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381
> 
> If we didn't inject a still-pending event to L1 since nested_run_pending,
> KVM_REQ_EVENT should be requested after the vmexit in order to inject the
> event to L1. However, current log blindly request a KVM_REQ_EVENT even if
> there is no still-pending event to L1 which blocked by nested_run_pending.
> There is a race which lead to an interrupt will be injected to L2 which
> belong to L1 if L0 send an interrupt to L1 during this window.
> 
>                VCPU0                               another thread
> 
> L1 intr not blocked on L2 first entry
> vmx_vcpu_run req event
> kvm check request req event
> check_nested_events don't have any intr
> not nested exit
>                                             intr occur (8254, lapic timer
> etc)
> inject_pending_event now have intr
> inject interrupt
> 
> This patch fix this race by introduced a l1_events_blocked field in nested_vmx
> which indicates there is still-pending event which blocked by
> nested_run_pending,
> and smart request a KVM_REQ_EVENT if there is a still-pending event which
> blocked
> by nested_run_pending.
> 
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Robert Hu <robert.hu@intel.com>
> ---
>  arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f4e5aed..fe69c49 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -372,6 +372,7 @@ struct nested_vmx {
>  	u64 vmcs01_tsc_offset;
>  	/* L2 must run next, and mustn't decide to exit to L1. */
>  	bool nested_run_pending;
> +	bool l1_events_blocked;
>  	/*
>  	 * Guest pages referred to in vmcs02 with host-physical pointers, so
>  	 * we must keep them pinned while L2 runs.
> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu
> *vcpu)
>  	 * we did not inject a still-pending event to L1 now because of
>  	 * nested_run_pending, we need to re-enable this bit.
>  	 */
> -	if (vmx->nested.nested_run_pending)
> +	if (to_vmx(vcpu)->nested.l1_events_blocked) {
> +		to_vmx(vcpu)->nested.l1_events_blocked = false;
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}
> 
>  	vmx->nested.nested_run_pending = 0;
> 
> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct
> kvm_vcpu *vcpu, bool external_intr)
> 
>  	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>  	    vmx->nested.preemption_timer_expired) {
> -		if (vmx->nested.nested_run_pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
>  			return -EBUSY;
> +		}
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>  		return 0;
>  	}
> 
>  	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> -		if (vmx->nested.nested_run_pending ||
> -		    vcpu->arch.interrupt.pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
> +			return -EBUSY;
> +		}
> +		if (vcpu->arch.interrupt.pending)
>  			return -EBUSY;
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>  				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
> @@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu
> *vcpu, bool external_intr)
> 
>  	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>  	    nested_exit_on_intr(vcpu)) {
> -		if (vmx->nested.nested_run_pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
>  			return -EBUSY;
> +		}
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0,
> 0);
>  	}
> 
> --
> 1.9.1



* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  6:54 [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
  2014-07-02  7:20 ` Hu, Robert
@ 2014-07-02  9:01 ` Jan Kiszka
  2014-07-03  2:59   ` Wanpeng Li
  2014-07-03  5:15   ` Bandan Das
  2014-07-02 16:27 ` Bandan Das
  2 siblings, 2 replies; 43+ messages in thread
From: Jan Kiszka @ 2014-07-02  9:01 UTC (permalink / raw)
  To: Wanpeng Li, Paolo Bonzini, Gleb Natapov; +Cc: Hu Robert, kvm, linux-kernel

On 2014-07-02 08:54, Wanpeng Li wrote:
> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
> 
> If we didn't inject a still-pending event to L1 since nested_run_pending,
> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
> there is no still-pending event to L1 which blocked by nested_run_pending. 
> There is a race which lead to an interrupt will be injected to L2 which 
> belong to L1 if L0 send an interrupt to L1 during this window. 
> 
>                VCPU0                               another thread 
> 
> L1 intr not blocked on L2 first entry
> vmx_vcpu_run req event 
> kvm check request req event 
> check_nested_events don't have any intr 
> not nested exit 
>                                             intr occur (8254, lapic timer etc)
> inject_pending_event now have intr 
> inject interrupt 
> 
> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
> which indicates there is still-pending event which blocked by nested_run_pending, 
> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
> by nested_run_pending.

There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
aren't those able to trigger this scenario?

In any case, unconditionally setting KVM_REQ_EVENT seems strange and
should be changed.

Jan

> 
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
>  arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f4e5aed..fe69c49 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -372,6 +372,7 @@ struct nested_vmx {
>  	u64 vmcs01_tsc_offset;
>  	/* L2 must run next, and mustn't decide to exit to L1. */
>  	bool nested_run_pending;
> +	bool l1_events_blocked;
>  	/*
>  	 * Guest pages referred to in vmcs02 with host-physical pointers, so
>  	 * we must keep them pinned while L2 runs.
> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	 * we did not inject a still-pending event to L1 now because of
>  	 * nested_run_pending, we need to re-enable this bit.
>  	 */
> -	if (vmx->nested.nested_run_pending)
> +	if (to_vmx(vcpu)->nested.l1_events_blocked) {
> +		to_vmx(vcpu)->nested.l1_events_blocked = false;
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}
>  
>  	vmx->nested.nested_run_pending = 0;
>  
> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  
>  	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>  	    vmx->nested.preemption_timer_expired) {
> -		if (vmx->nested.nested_run_pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
>  			return -EBUSY;
> +		}
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>  		return 0;
>  	}
>  
>  	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> -		if (vmx->nested.nested_run_pending ||
> -		    vcpu->arch.interrupt.pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
> +			return -EBUSY;
> +		}
> +		if (vcpu->arch.interrupt.pending)
>  			return -EBUSY;
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>  				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
> @@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  
>  	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>  	    nested_exit_on_intr(vcpu)) {
> -		if (vmx->nested.nested_run_pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
>  			return -EBUSY;
> +		}
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>  	}
>  
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  7:20 ` Hu, Robert
@ 2014-07-02  9:03   ` Jan Kiszka
  2014-07-02  9:13     ` Hu, Robert
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Kiszka @ 2014-07-02  9:03 UTC (permalink / raw)
  To: Hu, Robert, Wanpeng Li, Paolo Bonzini, Gleb Natapov; +Cc: kvm, linux-kernel

On 2014-07-02 09:20, Hu, Robert wrote:
>> -----Original Message-----
>> From: Wanpeng Li [mailto:wanpeng.li@linux.intel.com]
>> Sent: Wednesday, July 2, 2014 2:54 PM
>> To: Paolo Bonzini; Jan Kiszka; Gleb Natapov
>> Cc: Hu, Robert; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Wanpeng Li
>> Subject: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
>>
>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381
>>
>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the
>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if
>> there is no still-pending event to L1 which blocked by nested_run_pending.
>> There is a race which lead to an interrupt will be injected to L2 which
>> belong to L1 if L0 send an interrupt to L1 during this window.
>>
>>                VCPU0                               another thread
>>
>> L1 intr not blocked on L2 first entry
>> vmx_vcpu_run req event
>> kvm check request req event
>> check_nested_events don't have any intr
>> not nested exit
>>                                             intr occur (8254, lapic timer
>> etc)
>> inject_pending_event now have intr
>> inject interrupt
>>
>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx
>> which indicates there is still-pending event which blocked by
>> nested_run_pending,
>> and smart request a KVM_REQ_EVENT if there is a still-pending event which
>> blocked
>> by nested_run_pending.
>>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> Tested-by: Robert Hu<robert.hu@intel.com>

Do you happen to have a kvm-unit-test for this race? Or how did you
test? Just curious, and if there is a test, it would be good to
integrate it.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* RE: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  9:03   ` Jan Kiszka
@ 2014-07-02  9:13     ` Hu, Robert
  2014-07-02  9:16       ` Jan Kiszka
  0 siblings, 1 reply; 43+ messages in thread
From: Hu, Robert @ 2014-07-02  9:13 UTC (permalink / raw)
  To: Jan Kiszka, Wanpeng Li, Paolo Bonzini, Gleb Natapov; +Cc: kvm, linux-kernel


> -----Original Message-----
> From: Jan Kiszka [mailto:jan.kiszka@siemens.com]
> Sent: Wednesday, July 2, 2014 5:03 PM
> To: Hu, Robert; Wanpeng Li; Paolo Bonzini; Gleb Natapov
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since
> race
> 
> On 2014-07-02 09:20, Hu, Robert wrote:
> >> -----Original Message-----
> >> From: Wanpeng Li [mailto:wanpeng.li@linux.intel.com]
> >> Sent: Wednesday, July 2, 2014 2:54 PM
> >> To: Paolo Bonzini; Jan Kiszka; Gleb Natapov
> >> Cc: Hu, Robert; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Wanpeng
> Li
> >> Subject: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since
> race
> >>
> >> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381
> >>
> >> If we didn't inject a still-pending event to L1 since nested_run_pending,
> >> KVM_REQ_EVENT should be requested after the vmexit in order to inject the
> >> event to L1. However, current log blindly request a KVM_REQ_EVENT even if
> >> there is no still-pending event to L1 which blocked by nested_run_pending.
> >> There is a race which lead to an interrupt will be injected to L2 which
> >> belong to L1 if L0 send an interrupt to L1 during this window.
> >>
> >>                VCPU0                               another thread
> >>
> >> L1 intr not blocked on L2 first entry
> >> vmx_vcpu_run req event
> >> kvm check request req event
> >> check_nested_events don't have any intr
> >> not nested exit
> >>                                             intr occur (8254, lapic timer
> >> etc)
> >> inject_pending_event now have intr
> >> inject interrupt
> >>
> >> This patch fix this race by introduced a l1_events_blocked field in nested_vmx
> >> which indicates there is still-pending event which blocked by
> >> nested_run_pending,
> >> and smart request a KVM_REQ_EVENT if there is a still-pending event which
> >> blocked
> >> by nested_run_pending.
> >>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> > Tested-by: Robert Hu<robert.hu@intel.com>
> 
> Do you happen to have a kvm-unit-test for this race? Or how did you
> test? Just curious, and if there is a test, it would be good to
> integrate it.
I just applied the patch and tested the reported scenario.
> 
> Jan
> 
> --
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  9:13     ` Hu, Robert
@ 2014-07-02  9:16       ` Jan Kiszka
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kiszka @ 2014-07-02  9:16 UTC (permalink / raw)
  To: Hu, Robert, Wanpeng Li, Paolo Bonzini, Gleb Natapov; +Cc: kvm, linux-kernel

On 2014-07-02 11:13, Hu, Robert wrote:
> 
>> -----Original Message-----
>> From: Jan Kiszka [mailto:jan.kiszka@siemens.com]
>> Sent: Wednesday, July 2, 2014 5:03 PM
>> To: Hu, Robert; Wanpeng Li; Paolo Bonzini; Gleb Natapov
>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since
>> race
>>
>> On 2014-07-02 09:20, Hu, Robert wrote:
>>>> -----Original Message-----
>>>> From: Wanpeng Li [mailto:wanpeng.li@linux.intel.com]
>>>> Sent: Wednesday, July 2, 2014 2:54 PM
>>>> To: Paolo Bonzini; Jan Kiszka; Gleb Natapov
>>>> Cc: Hu, Robert; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Wanpeng
>> Li
>>>> Subject: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since
>> race
>>>>
>>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381
>>>>
>>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the
>>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if
>>>> there is no still-pending event to L1 which blocked by nested_run_pending.
>>>> There is a race which lead to an interrupt will be injected to L2 which
>>>> belong to L1 if L0 send an interrupt to L1 during this window.
>>>>
>>>>                VCPU0                               another thread
>>>>
>>>> L1 intr not blocked on L2 first entry
>>>> vmx_vcpu_run req event
>>>> kvm check request req event
>>>> check_nested_events don't have any intr
>>>> not nested exit
>>>>                                             intr occur (8254, lapic timer
>>>> etc)
>>>> inject_pending_event now have intr
>>>> inject interrupt
>>>>
>>>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx
>>>> which indicates there is still-pending event which blocked by
>>>> nested_run_pending,
>>>> and smart request a KVM_REQ_EVENT if there is a still-pending event which
>>>> blocked
>>>> by nested_run_pending.
>>>>
>>>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>> Tested-by: Robert Hu<robert.hu@intel.com>
>>
>> Do you happen to have a kvm-unit-test for this race? Or how did you
>> test? Just curious, and if there is a test, it would be good to
>> integrate it.
> I just apply the patch and test the reported scenario.

Ah, sorry, missed the referenced bug report.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  6:54 [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
  2014-07-02  7:20 ` Hu, Robert
  2014-07-02  9:01 ` Jan Kiszka
@ 2014-07-02 16:27 ` Bandan Das
  2014-07-03  5:11   ` Wanpeng Li
  2 siblings, 1 reply; 43+ messages in thread
From: Bandan Das @ 2014-07-02 16:27 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

Wanpeng Li <wanpeng.li@linux.intel.com> writes:

> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
I can also reproduce this easily with Linux as L1 by "slowing it down",
e.g. running with ept=0.

I suggest changing the subject to -
KVM: nVMX: Fix race that incorrectly injects L1's irq to L2

> If we didn't inject a still-pending event to L1 since nested_run_pending,
> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 

What's current "log"? Do you mean current "code"?

> there is no still-pending event to L1 which blocked by nested_run_pending. 
> There is a race which lead to an interrupt will be injected to L2 which 
> belong to L1 if L0 send an interrupt to L1 during this window. 
>
>                VCPU0                               another thread 
>
> L1 intr not blocked on L2 first entry
> vmx_vcpu_run req event 
> kvm check request req event 
> check_nested_events don't have any intr 
> not nested exit 
>                                             intr occur (8254, lapic timer etc)
> inject_pending_event now have intr 
> inject interrupt 
>
> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
> which indicates there is still-pending event which blocked by nested_run_pending, 
> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
> by nested_run_pending.
>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
>  arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f4e5aed..fe69c49 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -372,6 +372,7 @@ struct nested_vmx {
>  	u64 vmcs01_tsc_offset;
>  	/* L2 must run next, and mustn't decide to exit to L1. */
>  	bool nested_run_pending;
> +	bool l1_events_blocked;
>  	/*
>  	 * Guest pages referred to in vmcs02 with host-physical pointers, so
>  	 * we must keep them pinned while L2 runs.
> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	 * we did not inject a still-pending event to L1 now because of
>  	 * nested_run_pending, we need to re-enable this bit.
>  	 */
> -	if (vmx->nested.nested_run_pending)
> +	if (to_vmx(vcpu)->nested.l1_events_blocked) {
Is to_vmx() necessary since we already have the vmx pointer?

> +		to_vmx(vcpu)->nested.l1_events_blocked = false;
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}
>  
>  	vmx->nested.nested_run_pending = 0;
>  
> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  
>  	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>  	    vmx->nested.preemption_timer_expired) {
> -		if (vmx->nested.nested_run_pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
>  			return -EBUSY;
> +		}
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>  		return 0;
>  	}
>  
>  	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> -		if (vmx->nested.nested_run_pending ||
> -		    vcpu->arch.interrupt.pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
> +			return -EBUSY;
> +		}
> +		if (vcpu->arch.interrupt.pending)
>  			return -EBUSY;
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>  				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
> @@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  
>  	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>  	    nested_exit_on_intr(vcpu)) {
> -		if (vmx->nested.nested_run_pending)
> +		if (vmx->nested.nested_run_pending) {
> +			vmx->nested.l1_events_blocked = true;
>  			return -EBUSY;
> +		}
>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>  	}
Also, I am wondering: isn't it enough to just do this to avoid the race?

 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-       return (!to_vmx(vcpu)->nested.nested_run_pending &&
+       return (!is_guest_mode(vcpu) &&
+               !to_vmx(vcpu)->nested.nested_run_pending &&
                vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
                !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &

Thanks,
Bandan


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  9:01 ` Jan Kiszka
@ 2014-07-03  2:59   ` Wanpeng Li
  2014-07-03  5:15   ` Bandan Das
  1 sibling, 0 replies; 43+ messages in thread
From: Wanpeng Li @ 2014-07-03  2:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

Hi Jan,
On Wed, Jul 02, 2014 at 11:01:30AM +0200, Jan Kiszka wrote:
>On 2014-07-02 08:54, Wanpeng Li wrote:
>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>> 
>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>> There is a race which lead to an interrupt will be injected to L2 which 
>> belong to L1 if L0 send an interrupt to L1 during this window. 
>> 
>>                VCPU0                               another thread 
>> 
>> L1 intr not blocked on L2 first entry
>> vmx_vcpu_run req event 
>> kvm check request req event 
>> check_nested_events don't have any intr 
>> not nested exit 
>>                                             intr occur (8254, lapic timer etc)
>> inject_pending_event now have intr 
>> inject interrupt 
>> 
>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>> which indicates there is still-pending event which blocked by nested_run_pending, 
>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>> by nested_run_pending.
>
>There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>aren't those able to trigger this scenario?
>

The other KVM_REQ_EVENT requests come with a specific interrupt or pending
event; however, the KVM_REQ_EVENT requested after a vmexit with
nested_run_pending may or may not have a specific interrupt or pending event
behind it.
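
To make this concrete, here is a condensed sketch of inject_pending_event() in
arch/x86/kvm/x86.c (paraphrased from memory, not verbatim kernel code; the
comments are my annotations):

	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
		r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
		if (r != 0)
			return r;	/* -EBUSY: an L1 event is held back */
	}

	/* ... NMI handling elided ... */

	if (kvm_cpu_has_injectable_intr(vcpu) &&
	    kvm_x86_ops->interrupt_allowed(vcpu)) {
		/*
		 * An interrupt for L1 (8254, lapic timer, ...) that arrives
		 * after check_nested_events() above found nothing is queued
		 * here while vmcs02 is still active, i.e. it goes into L2.
		 */
		kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu), false);
		kvm_x86_ops->set_irq(vcpu);
	}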

Regards,
Wanpeng Li 

>In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>should be changed.
>
>Jan
>
>> 
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>> ---
>>  arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
>>  1 file changed, 15 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f4e5aed..fe69c49 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -372,6 +372,7 @@ struct nested_vmx {
>>  	u64 vmcs01_tsc_offset;
>>  	/* L2 must run next, and mustn't decide to exit to L1. */
>>  	bool nested_run_pending;
>> +	bool l1_events_blocked;
>>  	/*
>>  	 * Guest pages referred to in vmcs02 with host-physical pointers, so
>>  	 * we must keep them pinned while L2 runs.
>> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>  	 * we did not inject a still-pending event to L1 now because of
>>  	 * nested_run_pending, we need to re-enable this bit.
>>  	 */
>> -	if (vmx->nested.nested_run_pending)
>> +	if (to_vmx(vcpu)->nested.l1_events_blocked) {
>> +		to_vmx(vcpu)->nested.l1_events_blocked = false;
>>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +	}
>>  
>>  	vmx->nested.nested_run_pending = 0;
>>  
>> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>  
>>  	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>>  	    vmx->nested.preemption_timer_expired) {
>> -		if (vmx->nested.nested_run_pending)
>> +		if (vmx->nested.nested_run_pending) {
>> +			vmx->nested.l1_events_blocked = true;
>>  			return -EBUSY;
>> +		}
>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>>  		return 0;
>>  	}
>>  
>>  	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
>> -		if (vmx->nested.nested_run_pending ||
>> -		    vcpu->arch.interrupt.pending)
>> +		if (vmx->nested.nested_run_pending) {
>> +			vmx->nested.l1_events_blocked = true;
>> +			return -EBUSY;
>> +		}
>> +		if (vcpu->arch.interrupt.pending)
>>  			return -EBUSY;
>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>>  				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
>> @@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>  
>>  	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>>  	    nested_exit_on_intr(vcpu)) {
>> -		if (vmx->nested.nested_run_pending)
>> +		if (vmx->nested.nested_run_pending) {
>> +			vmx->nested.l1_events_blocked = true;
>>  			return -EBUSY;
>> +		}
>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>>  	}
>>  
>> 
>
>-- 
>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>Corporate Competence Center Embedded Linux


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02 16:27 ` Bandan Das
@ 2014-07-03  5:11   ` Wanpeng Li
  2014-07-03  5:29     ` Bandan Das
  0 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-03  5:11 UTC (permalink / raw)
  To: Bandan Das
  Cc: Paolo Bonzini, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

Hi Bandan,
On Wed, Jul 02, 2014 at 12:27:59PM -0400, Bandan Das wrote:
>Wanpeng Li <wanpeng.li@linux.intel.com> writes:
>
>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>I can also reproduce this easily with Linux as L1 by "slowing it down"
>eg. running with ept = 0
>
>I suggest changing the subject to -
>KVM: nVMX: Fix race that incorrectly injects L1's irq to L2
>

Ok, I will fold this into the next version. ;-)

>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>
>What's current "log" ? Do you mean current "code" ?
>

Yeah, it's a typo. I mean "logic".

[...]
>Also, I am wondering isn't it enough to just do this to avoid this race ?
>
> static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
> {
>-       return (!to_vmx(vcpu)->nested.nested_run_pending &&
>+       return (!is_guest_mode(vcpu) &&
>+               !to_vmx(vcpu)->nested.nested_run_pending &&
>                vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>                !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>

I don't think this fixes the root cause of the race, and there are two cases
that concern me about your proposal:

- If there is a special L1 which doesn't ask to exit on external intrs, you will
  lose the intrs which L0 injects to L2.

- If inject_pending_event fails to inject an intr because of the change you made
  in vmx_interrupt_allowed, L0 will request an intr window for L2; however, the
  intr belongs to L1 (see the sketch below).
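
For the second case, this is roughly the fall-through I mean in
vcpu_enter_guest() in arch/x86/kvm/x86.c (sketched from memory, not verbatim;
the long comment is my annotation):

	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
		kvm_apic_accept_events(vcpu);
		if (inject_pending_event(vcpu, req_int_win) != 0)
			req_immediate_exit = true;
		/* enable NMI/IRQ window open exits if needed */
		else if (vcpu->arch.nmi_pending)
			kvm_x86_ops->enable_nmi_window(vcpu);
		else if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
			/*
			 * With vmx_interrupt_allowed() returning false in
			 * guest mode, the pending intr is not injected above
			 * and an interrupt window is opened for L2 here, even
			 * though the intr belongs to L1.
			 */
			kvm_x86_ops->enable_irq_window(vcpu);
	}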

Regards,
Wanpeng Li 

>Thanks,
>Bandan


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-02  9:01 ` Jan Kiszka
  2014-07-03  2:59   ` Wanpeng Li
@ 2014-07-03  5:15   ` Bandan Das
  2014-07-03  6:59     ` Wanpeng Li
  2014-07-04  6:17     ` [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
  1 sibling, 2 replies; 43+ messages in thread
From: Bandan Das @ 2014-07-03  5:15 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Wanpeng Li, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 2014-07-02 08:54, Wanpeng Li wrote:
>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>> 
>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>> There is a race which lead to an interrupt will be injected to L2 which 
>> belong to L1 if L0 send an interrupt to L1 during this window. 
>> 
>>                VCPU0                               another thread 
>> 
>> L1 intr not blocked on L2 first entry
>> vmx_vcpu_run req event 
>> kvm check request req event 
>> check_nested_events don't have any intr 
>> not nested exit 
>>                                             intr occur (8254, lapic timer etc)
>> inject_pending_event now have intr 
>> inject interrupt 
>> 
>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>> which indicates there is still-pending event which blocked by nested_run_pending, 
>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>> by nested_run_pending.
>
> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
> aren't those able to trigger this scenario?
>
> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
> should be changed.


Ugh! I think I am hitting another one, but this one's probably because 
we are not setting KVM_REQ_EVENT for something we should.

Before this patch, I was able to hit this bug every time with 
"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
L2. I can verify that I was indeed hitting the race in inject_pending_event.

After this patch, I believe I am hitting another bug - this happens 
after I boot L2, as above, and then start a Linux kernel compilation
and then wait and watch :) It's a pain to debug because it happens
about once in three runs; it never happens if I run with ept=1, but
I think that's only because the test completes sooner. I can confirm
that I don't see it if I always set KVM_REQ_EVENT when nested_run_pending
is set, instead of the approach this patch takes.
(Any debugging hints appreciated!)

So, I am not sure if this is the right fix. Rather, I think the safer thing
to do is to have the interrupt-pending check for injection into L1 at
the "same site" as the call to kvm_queue_interrupt(), just like we had before 
commit b6b8a1451fc40412c57d1 (a rough sketch of that older shape follows
below). Is there any advantage to having all the nested event checks together?
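
For reference, vmx_interrupt_allowed() had roughly this shape before that
commit (reconstructed from memory, so details may be off):

static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
{
	if (is_guest_mode(vcpu)) {
		if (to_vmx(vcpu)->nested.nested_run_pending)
			return 0;
		if (nested_exit_on_intr(vcpu)) {
			/*
			 * The L1 vmexit happened right at the site that
			 * gates injection, leaving no window for a stale
			 * KVM_REQ_EVENT to inject into L2.
			 */
			nested_vmx_vmexit(vcpu,
					  EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
			/* fall through: we are now in L1, not L2 */
		}
	}

	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
		  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
}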

PS - Actually, a much easier fix (or rather hack) is to return 1 in 
vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
!is_guest_mode(vcpu). That way, the pending interrupt 
can be taken care of correctly during the next vmexit.

Bandan

> Jan
>
>> 
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>> ---
>>  arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
>>  1 file changed, 15 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f4e5aed..fe69c49 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -372,6 +372,7 @@ struct nested_vmx {
>>  	u64 vmcs01_tsc_offset;
>>  	/* L2 must run next, and mustn't decide to exit to L1. */
>>  	bool nested_run_pending;
>> +	bool l1_events_blocked;
>>  	/*
>>  	 * Guest pages referred to in vmcs02 with host-physical pointers, so
>>  	 * we must keep them pinned while L2 runs.
>> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>  	 * we did not inject a still-pending event to L1 now because of
>>  	 * nested_run_pending, we need to re-enable this bit.
>>  	 */
>> -	if (vmx->nested.nested_run_pending)
>> +	if (to_vmx(vcpu)->nested.l1_events_blocked) {
>> +		to_vmx(vcpu)->nested.l1_events_blocked = false;
>>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +	}
>>  
>>  	vmx->nested.nested_run_pending = 0;
>>  
>> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>  
>>  	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>>  	    vmx->nested.preemption_timer_expired) {
>> -		if (vmx->nested.nested_run_pending)
>> +		if (vmx->nested.nested_run_pending) {
>> +			vmx->nested.l1_events_blocked = true;
>>  			return -EBUSY;
>> +		}
>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>>  		return 0;
>>  	}
>>  
>>  	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
>> -		if (vmx->nested.nested_run_pending ||
>> -		    vcpu->arch.interrupt.pending)
>> +		if (vmx->nested.nested_run_pending) {
>> +			vmx->nested.l1_events_blocked = true;
>> +			return -EBUSY;
>> +		}
>> +		if (vcpu->arch.interrupt.pending)
>>  			return -EBUSY;
>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>>  				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
>> @@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>  
>>  	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>>  	    nested_exit_on_intr(vcpu)) {
>> -		if (vmx->nested.nested_run_pending)
>> +		if (vmx->nested.nested_run_pending) {
>> +			vmx->nested.l1_events_blocked = true;
>>  			return -EBUSY;
>> +		}
>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>>  	}
>>  
>> 


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-03  5:11   ` Wanpeng Li
@ 2014-07-03  5:29     ` Bandan Das
  2014-07-03  7:33       ` Jan Kiszka
  0 siblings, 1 reply; 43+ messages in thread
From: Bandan Das @ 2014-07-03  5:29 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

Wanpeng Li <wanpeng.li@linux.intel.com> writes:

> Hi Bandan,
> On Wed, Jul 02, 2014 at 12:27:59PM -0400, Bandan Das wrote:
>>Wanpeng Li <wanpeng.li@linux.intel.com> writes:
>>
>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>I can also reproduce this easily with Linux as L1 by "slowing it down"
>>eg. running with ept = 0
>>
>>I suggest changing the subject to -
>>KVM: nVMX: Fix race that incorrectly injects L1's irq to L2
>>
>
> Ok, I will fold this to next version. ;-)
>
>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>
>>What's current "log" ? Do you mean current "code" ?
>>
>
> Yeah, it's a typo. I mean "logic".
>
> [...]
>>Also, I am wondering isn't it enough to just do this to avoid this race ?
>>
>> static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>> {
>>-       return (!to_vmx(vcpu)->nested.nested_run_pending &&
>>+       return (!is_guest_mode(vcpu) &&
>>+               !to_vmx(vcpu)->nested.nested_run_pending &&
>>                vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>>                !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>>
>
> I don't think you fix the root cause of the race, and there are two cases which 
> I concern about your proposal:
>
> - If there is a special L1 which don't ask to exit on external intrs, you will 
>   lose the intrs which L0 inject to L2.  

Oh, I didn't think about that case :), thanks for pointing this out.
It's easy to check this with Xen as L1, I suppose.

> - If inject_pending_event fail to inject an intr since the change that you made 
>   in vmx_interrupt_allowed, L0 will request an intr window for L2, however, the 
>   intr is belong to L1.
Oh, I thought inject_pending_event calls kvm_queue_interrupt only if vmx_interrupt_allowed()
returns true, so the interrupt will be injected correctly on the next vmexit.

Anyway, I am hitting another bug with your patch! Please see my other mail to the list.

Thanks!

> Regards,
> Wanpeng Li 
>
>>Thanks,
>>Bandan


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-03  5:15   ` Bandan Das
@ 2014-07-03  6:59     ` Wanpeng Li
  2014-07-03 17:27       ` Bandan Das
  2014-07-04  6:17     ` [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
  1 sibling, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-03  6:59 UTC (permalink / raw)
  To: Bandan Das
  Cc: Jan Kiszka, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>> 
>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>>> There is a race which lead to an interrupt will be injected to L2 which 
>>> belong to L1 if L0 send an interrupt to L1 during this window. 
>>> 
>>>                VCPU0                               another thread 
>>> 
>>> L1 intr not blocked on L2 first entry
>>> vmx_vcpu_run req event 
>>> kvm check request req event 
>>> check_nested_events don't have any intr 
>>> not nested exit 
>>>                                             intr occur (8254, lapic timer etc)
>>> inject_pending_event now have intr 
>>> inject interrupt 
>>> 
>>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>>> which indicates there is still-pending event which blocked by nested_run_pending, 
>>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>>> by nested_run_pending.
>>
>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>> aren't those able to trigger this scenario?
>>
>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>> should be changed.
>
>
>Ugh! I think I am hitting another one but this one's probably because 
>we are not setting KVM_REQ_EVENT for something we should.
>
>Before this patch, I was able to hit this bug everytime with 
>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>
>After this patch, I believe I am hitting another bug - this happens 
>after I boot L2, as above, and then start a Linux kernel compilation
>and then wait and watch :) It's a pain to debug because this happens

I have already tried several times with "modprobe kvm_intel ept=0 nested=1
enable_shadow_vmcs=0" and still can't reproduce the bug you mentioned.
Could you give me more details, e.g. which of L1 and L2 hangs or panics?
In addition, if you can post the call trace, it would be appreciated.

Regards,
Wanpeng Li 

>almost once in three times; it never happens if I run with ept=1, however,
>I think that's only because the test completes sooner. But I can confirm
>that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>the approach this patch takes.
>(Any debug hints help appreciated!)
>
>So, I am not sure if this is the right fix. Rather, I think the safer thing
>to do is to have the interrupt pending check for injection into L1 at
>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>nested events checks together ?
>
>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>!is_guest_mode(vcpu) That way, the pending interrupt interrupt 
>can be taken care of correctly during the next vmexit.
>
>Bandan
>
>> Jan
>>
[...]


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-03  5:29     ` Bandan Das
@ 2014-07-03  7:33       ` Jan Kiszka
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Kiszka @ 2014-07-03  7:33 UTC (permalink / raw)
  To: Bandan Das, Wanpeng Li
  Cc: Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-03 07:29, Bandan Das wrote:
> Wanpeng Li <wanpeng.li@linux.intel.com> writes:
> 
>> Hi Bandan,
>> On Wed, Jul 02, 2014 at 12:27:59PM -0400, Bandan Das wrote:
>>> Wanpeng Li <wanpeng.li@linux.intel.com> writes:
>>>
>>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>> I can also reproduce this easily with Linux as L1 by "slowing it down"
>>> eg. running with ept = 0
>>>
>>> I suggest changing the subject to -
>>> KVM: nVMX: Fix race that incorrectly injects L1's irq to L2
>>>
>>
>> Ok, I will fold this to next version. ;-)
>>
>>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>>
>>> What's current "log" ? Do you mean current "code" ?
>>>
>>
>> Yeah, it's a typo. I mean "logic".
>>
>> [...]
>>> Also, I am wondering isn't it enough to just do this to avoid this race ?
>>>
>>> static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>>> {
>>> -       return (!to_vmx(vcpu)->nested.nested_run_pending &&
>>> +       return (!is_guest_mode(vcpu) &&
>>> +               !to_vmx(vcpu)->nested.nested_run_pending &&
>>>                vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>>>                !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>>>
>>
>> I don't think you fix the root cause of the race, and there are two cases which 
>> I concern about your proposal:
>>
>> - If there is a special L1 which don't ask to exit on external intrs, you will 
>>   lose the intrs which L0 inject to L2.  
> 
> Oh didn't think about that case :), thanks for the pointing this out.
> It's easy to check this with Xen as L1, I suppose.

Xen most probably intercepts external interrupts, but Jailhouse
definitely does not. We also have a unit test for that, but it will
likely not expose the issue of lost events.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-03  6:59     ` Wanpeng Li
@ 2014-07-03 17:27       ` Bandan Das
  2014-07-04  2:52         ` Wanpeng Li
  0 siblings, 1 reply; 43+ messages in thread
From: Bandan Das @ 2014-07-03 17:27 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Jan Kiszka, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

Wanpeng Li <wanpeng.li@linux.intel.com> writes:

> On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>>Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>>> 
>>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>>>> There is a race which lead to an interrupt will be injected to L2 which 
>>>> belong to L1 if L0 send an interrupt to L1 during this window. 
>>>> 
>>>>                VCPU0                               another thread 
>>>> 
>>>> L1 intr not blocked on L2 first entry
>>>> vmx_vcpu_run req event 
>>>> kvm check request req event 
>>>> check_nested_events don't have any intr 
>>>> not nested exit 
>>>>                                             intr occur (8254, lapic timer etc)
>>>> inject_pending_event now have intr 
>>>> inject interrupt 
>>>> 
>>>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>>>> which indicates there is still-pending event which blocked by nested_run_pending, 
>>>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>>>> by nested_run_pending.
>>>
>>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>>> aren't those able to trigger this scenario?
>>>
>>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>>> should be changed.
>>
>>
>>Ugh! I think I am hitting another one but this one's probably because 
>>we are not setting KVM_REQ_EVENT for something we should.
>>
>>Before this patch, I was able to hit this bug everytime with 
>>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>>
>>After this patch, I believe I am hitting another bug - this happens 
>>after I boot L2, as above, and then start a Linux kernel compilation
>>and then wait and watch :) It's a pain to debug because this happens
>
> I have already try several times with "modprobe kvm_intel ept=0 nested=1
> enable_shadow_vmcs=0" and still can't reproduce the bug you mentioned.
> Could you give me more details such like L1 and L2 which one hang or panic? 
> In addition, if you can post the call trace is appreciated. 

# modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0

The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
qemu cmd to run L1 - 
# qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty

qemu cmd to run L2 -
# sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22

Additionally,
L0 is FC19 with 3.16-rc3
L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic

Then start a kernel compilation inside L2 with "make -j3"

There's no call trace on L0; both L0 and L1 are hung (or rather really slow), and
L1's serial console spews out CPU softlockup errors. Enabling panic on softlockup on
L1 will give a trace in smp_call_function_many(). I think the corresponding code in
kernel/smp.c that triggers this is
 
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
                      && !oops_in_progress && !early_boot_irqs_disabled);

I know in most cases this is usually harmless, but in this specific case,
it seems it's stuck here forever.

Sorry, I don't have a L1 call trace handy atm, I can post that if you are interested.

Note that this can take as much as 30 to 40 minutes to appear, but once it does,
you will know because both L1 and L2 will be stuck, with the serial messages as I mentioned
before. From my side, let me try this on another system to rule out any machine-specific
weirdness going on.

Please let me know if you need any further information.

Thanks
Bandan

> Regards,
> Wanpeng Li 
>
>>almost once in three times; it never happens if I run with ept=1, however,
>>I think that's only because the test completes sooner. But I can confirm
>>that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>>the approach this patch takes.
>>(Any debug hints help appreciated!)
>>
>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>to do is to have the interrupt pending check for injection into L1 at
>>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>>nested events checks together ?
>>
>>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>>!is_guest_mode(vcpu) That way, the pending interrupt interrupt 
>>can be taken care of correctly during the next vmexit.
>>
>>Bandan
>>
>>> Jan
>>>
> [...]


* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-03 17:27       ` Bandan Das
@ 2014-07-04  2:52         ` Wanpeng Li
  2014-07-04  5:43           ` Jan Kiszka
  0 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-04  2:52 UTC (permalink / raw)
  To: Bandan Das
  Cc: Jan Kiszka, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
[...]
># modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>
>The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>qemu cmd to run L1 - 
># qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>
>qemu cmd to run L2 -
># sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>
>Additionally,
>L0 is FC19 with 3.16-rc3
>L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>
>Then start a kernel compilation inside L2 with "make -j3"
>
>There's no call trace on L0, both L0 and L1 are hung (or rather really slow) and
>L1 serial spews out CPU softlock up errors. Enabling panic on softlockup on L1 will give
>a trace with smp_call_function_many() I think the corresponding code in kernel/smp.c that
>triggers this is
> 
>WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>                      && !oops_in_progress && !early_boot_irqs_disabled);
>
>I know in most cases this is usually harmless, but in this specific case,
>it seems it's stuck here forever.
>
>Sorry, I don't have a L1 call trace handy atm, I can post that if you are interested.
>
>Note that this can take as much as 30 to 40 minutes to appear but once it does,
>you will know because both L1 and L2 will be stuck with the serial messages as I mentioned
>before. From my side, let me try this on another system to rule out any machine specific
>weirdness going on..
>

Thanks for pointing that out. 

>Please let me know if you need any further information.
>

I just ran kvm-unit-tests with vmx.flat and eventinj.flat.
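
For reference, this is how I run them - the runner script path is the one in my
kvm-unit-tests checkout, so adjust if your tree differs:

# ./configure && make
# ./x86/run x86/vmx.flat
# ./x86/run x86/eventinj.flat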


w/ vmx.flat and w/o my patch applied 

[...]

Test suite : interrupt
FAIL: direct interrupt while running guest
PASS: intercepted interrupt while running guest
FAIL: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
FAIL: direct interrupt + activity state hlt
FAIL: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set
SUMMARY: 69 tests, 6 failures

w/ vmx.flat and w/ my patch applied 

[...]

Test suite : interrupt
PASS: direct interrupt while running guest
PASS: intercepted interrupt while running guest
PASS: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
PASS: direct interrupt + activity state hlt
PASS: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set

SUMMARY: 69 tests, 2 failures 


w/ eventinj.flat and w/o my patch applied 

SUMMARY: 13 tests, 0 failures

w/ eventinj.flat and w/ my patch applied 

SUMMARY: 13 tests, 0 failures


I'm not sure whether the bug you mentioned is related to "FAIL:
intercepted interrupt + hlt", which was already present before my patch.

Regards,
Wanpeng Li 

>Thanks
>Bandan
>
>> Regards,
>> Wanpeng Li 
>>
>>>almost once in three times; it never happens if I run with ept=1, however,
>>>I think that's only because the test completes sooner. But I can confirm
>>>that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>>>the approach this patch takes.
>>>(Any debug hints help appreciated!)
>>>
>>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>>to do is to have the interrupt pending check for injection into L1 at
>>>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>>>nested events checks together?
>>>
>>>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>>>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>>>!is_guest_mode(vcpu). That way, the pending interrupt
>>>can be taken care of correctly during the next vmexit.
>>>
>>>Bandan
>>>
>>>> Jan
>>>>
>> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  2:52         ` Wanpeng Li
@ 2014-07-04  5:43           ` Jan Kiszka
  2014-07-04  6:08             ` Wanpeng Li
                               ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04  5:43 UTC (permalink / raw)
  To: Wanpeng Li, Bandan Das
  Cc: Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-04 04:52, Wanpeng Li wrote:
> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
> [...]
>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>
>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>> qemu cmd to run L1 - 
>> # qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>>
>> qemu cmd to run L2 -
>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>>
>> Additionally,
>> L0 is FC19 with 3.16-rc3
>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>
>> Then start a kernel compilation inside L2 with "make -j3"
>>
>> There's no call trace on L0; both L0 and L1 are hung (or rather really slow), and
>> L1 serial spews out CPU softlockup errors. Enabling panic on softlockup on L1 will give
>> a trace with smp_call_function_many(). I think the corresponding code in kernel/smp.c that
>> triggers this is
>>
>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>                      && !oops_in_progress && !early_boot_irqs_disabled);
>>
>> I know in most cases this is usually harmless, but in this specific case,
>> it seems it's stuck here forever.
>>
>> Sorry, I don't have an L1 call trace handy atm; I can post that if you are interested.
>>
>> Note that this can take as much as 30 to 40 minutes to appear but once it does,
>> you will know because both L1 and L2 will be stuck with the serial messages as I mentioned
>> before. From my side, let me try this on another system to rule out any machine specific
>> weirdness going on..
>>
> 
> Thanks for pointing this out. 
> 
>> Please let me know if you need any further information.
>>
> 
> I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.
> 
> 
> w/ vmx.flat and w/o my patch applied 
> 
> [...]
> 
> Test suite : interrupt
> FAIL: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> FAIL: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> FAIL: direct interrupt + activity state hlt
> FAIL: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
> SUMMARY: 69 tests, 6 failures
> 
> w/ vmx.flat and w/ my patch applied 
> 
> [...]
> 
> Test suite : interrupt
> PASS: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> PASS: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> PASS: direct interrupt + activity state hlt
> PASS: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
> 
> SUMMARY: 69 tests, 2 failures 

Which version (hash) of kvm-unit-tests are you using? All tests up to
307621765a are running fine here, but since a0e30e712d not much is
completing successfully anymore:

enabling apic
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
PASS: test vmxon with FEATURE_CONTROL cleared
PASS: test vmxon without FEATURE_CONTROL lock
PASS: test enable VMX in FEATURE_CONTROL
PASS: test FEATURE_CONTROL lock bit
PASS: test vmxon
FAIL: test vmptrld
PASS: test vmclear
init_vmcs : make_vmcs_current error
FAIL: test vmptrst
init_vmcs : make_vmcs_current error
vmx_run : vmlaunch failed.
FAIL: test vmlaunch
FAIL: test vmlaunch

SUMMARY: 10 tests, 4 unexpected failures


Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  5:43           ` Jan Kiszka
@ 2014-07-04  6:08             ` Wanpeng Li
  2014-07-04  7:19               ` Jan Kiszka
  2014-07-04  7:42             ` Paolo Bonzini
  2014-07-04  9:33             ` Jan Kiszka
  2 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-04  6:08 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Bandan Das, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Fri, Jul 04, 2014 at 07:43:14AM +0200, Jan Kiszka wrote:
>On 2014-07-04 04:52, Wanpeng Li wrote:
>> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
>> [...]
>>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>>
>>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>>> qemu cmd to run L1 - 
>>> # qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>>>
>>> qemu cmd to run L2 -
>>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>>>
>>> Additionally,
>>> L0 is FC19 with 3.16-rc3
>>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>>
>>> Then start a kernel compilation inside L2 with "make -j3"
>>>
>>> There's no call trace on L0; both L0 and L1 are hung (or rather really slow), and
>>> L1 serial spews out CPU softlockup errors. Enabling panic on softlockup on L1 will give
>>> a trace with smp_call_function_many(). I think the corresponding code in kernel/smp.c that
>>> triggers this is
>>>
>>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>>                      && !oops_in_progress && !early_boot_irqs_disabled);
>>>
>>> I know in most cases this is usually harmless, but in this specific case,
>>> it seems it's stuck here forever.
>>>
>>> Sorry, I don't have an L1 call trace handy atm; I can post that if you are interested.
>>>
>>> Note that this can take as much as 30 to 40 minutes to appear but once it does,
>>> you will know because both L1 and L2 will be stuck with the serial messages as I mentioned
>>> before. From my side, let me try this on another system to rule out any machine specific
>>> weirdness going on..
>>>
>> 
>> Thanks for pointing this out. 
>> 
>>> Please let me know if you need any further information.
>>>
>> 
>> I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.
>> 
>> 
>> w/ vmx.flat and w/o my patch applied 
>> 
>> [...]
>> 
>> Test suite : interrupt
>> FAIL: direct interrupt while running guest
>> PASS: intercepted interrupt while running guest
>> FAIL: direct interrupt + hlt
>> FAIL: intercepted interrupt + hlt
>> FAIL: direct interrupt + activity state hlt
>> FAIL: intercepted interrupt + activity state hlt
>> PASS: running a guest with interrupt acknowledgement set
>> SUMMARY: 69 tests, 6 failures
>> 
>> w/ vmx.flat and w/ my patch applied 
>> 
>> [...]
>> 
>> Test suite : interrupt
>> PASS: direct interrupt while running guest
>> PASS: intercepted interrupt while running guest
>> PASS: direct interrupt + hlt
>> FAIL: intercepted interrupt + hlt
>> PASS: direct interrupt + activity state hlt
>> PASS: intercepted interrupt + activity state hlt
>> PASS: running a guest with interrupt acknowledgement set
>> 
>> SUMMARY: 69 tests, 2 failures 
>
>Which version (hash) of kvm-unit-tests are you using? All tests up to
>307621765a are running fine here, but since a0e30e712d not much is
>completing successfully anymore:
>

I just pulled my kvm-unit-tests tree to the latest; the last commit is daeec9795d.

>enabling apic
>paging enabled
>cr0 = 80010011
>cr3 = 7fff000
>cr4 = 20
>PASS: test vmxon with FEATURE_CONTROL cleared
>PASS: test vmxon without FEATURE_CONTROL lock
>PASS: test enable VMX in FEATURE_CONTROL
>PASS: test FEATURE_CONTROL lock bit
>PASS: test vmxon
>FAIL: test vmptrld
>PASS: test vmclear
>init_vmcs : make_vmcs_current error
>FAIL: test vmptrst
>init_vmcs : make_vmcs_current error
>vmx_run : vmlaunch failed.
>FAIL: test vmlaunch
>FAIL: test vmlaunch
>
>SUMMARY: 10 tests, 4 unexpected failures


/opt/qemu/bin/qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio 
-device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/vmx.flat -cpu host

Test suite : interrupt
PASS: direct interrupt while running guest
PASS: intercepted interrupt while running guest
PASS: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
PASS: direct interrupt + activity state hlt
PASS: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set

SUMMARY: 69 tests, 2 failures

However, "intercepted interrupt + hlt" always fails, both w/ and w/o my patch.

Regards,
Wanpeng Li 

>
>
>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-03  5:15   ` Bandan Das
  2014-07-03  6:59     ` Wanpeng Li
@ 2014-07-04  6:17     ` Wanpeng Li
  2014-07-04  7:21       ` Jan Kiszka
  2014-07-07  0:56       ` Bandan Das
  1 sibling, 2 replies; 43+ messages in thread
From: Wanpeng Li @ 2014-07-04  6:17 UTC (permalink / raw)
  To: Bandan Das
  Cc: Jan Kiszka, Wanpeng Li, Paolo Bonzini, Gleb Natapov, Hu Robert,
	kvm, linux-kernel

On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>Jan Kiszka <jan.kiszka@siemens.com> writes:
>
>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>> 
>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>>> There is a race which lead to an interrupt will be injected to L2 which 
>>> belong to L1 if L0 send an interrupt to L1 during this window. 
>>> 
>>>                VCPU0                               another thread 
>>> 
>>> L1 intr not blocked on L2 first entry
>>> vmx_vcpu_run req event 
>>> kvm check request req event 
>>> check_nested_events don't have any intr 
>>> not nested exit 
>>>                                             intr occur (8254, lapic timer etc)
>>> inject_pending_event now have intr 
>>> inject interrupt 
>>> 
>>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>>> which indicates there is still-pending event which blocked by nested_run_pending, 
>>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>>> by nested_run_pending.
>>
>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>> aren't those able to trigger this scenario?
>>
>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>> should be changed.
>
>
>Ugh! I think I am hitting another one but this one's probably because 
>we are not setting KVM_REQ_EVENT for something we should.
>
>Before this patch, I was able to hit this bug every time with 
>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>
>After this patch, I believe I am hitting another bug - this happens 
>after I boot L2, as above, and then start a Linux kernel compilation
>and then wait and watch :) It's a pain to debug because this happens
>almost once in three times; it never happens if I run with ept=1, however,
>I think that's only because the test completes sooner. But I can confirm
>that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>the approach this patch takes.
>(Any debug hints help appreciated!)
>
>So, I am not sure if this is the right fix. Rather, I think the safer thing
>to do is to have the interrupt pending check for injection into L1 at
>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>nested events checks together?
>

How about reverting commit b6b8a1451 and checking whether the bug you
mentioned is still there?

Regards,
Wanpeng Li 

>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>!is_guest_mode(vcpu). That way, the pending interrupt
>can be taken care of correctly during the next vmexit.
>
>Bandan
>
>> Jan
>>
>>> 
>>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>> ---
>>>  arch/x86/kvm/vmx.c | 20 +++++++++++++++-----
>>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index f4e5aed..fe69c49 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -372,6 +372,7 @@ struct nested_vmx {
>>>  	u64 vmcs01_tsc_offset;
>>>  	/* L2 must run next, and mustn't decide to exit to L1. */
>>>  	bool nested_run_pending;
>>> +	bool l1_events_blocked;
>>>  	/*
>>>  	 * Guest pages referred to in vmcs02 with host-physical pointers, so
>>>  	 * we must keep them pinned while L2 runs.
>>> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>  	 * we did not inject a still-pending event to L1 now because of
>>>  	 * nested_run_pending, we need to re-enable this bit.
>>>  	 */
>>> -	if (vmx->nested.nested_run_pending)
>>> +	if (to_vmx(vcpu)->nested.l1_events_blocked) {
>>> +		to_vmx(vcpu)->nested.l1_events_blocked = false;
>>>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>>> +	}
>>>  
>>>  	vmx->nested.nested_run_pending = 0;
>>>  
>>> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>>  
>>>  	if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>>>  	    vmx->nested.preemption_timer_expired) {
>>> -		if (vmx->nested.nested_run_pending)
>>> +		if (vmx->nested.nested_run_pending) {
>>> +			vmx->nested.l1_events_blocked = true;
>>>  			return -EBUSY;
>>> +		}
>>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>>>  		return 0;
>>>  	}
>>>  
>>>  	if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
>>> -		if (vmx->nested.nested_run_pending ||
>>> -		    vcpu->arch.interrupt.pending)
>>> +		if (vmx->nested.nested_run_pending) {
>>> +			vmx->nested.l1_events_blocked = true;
>>> +			return -EBUSY;
>>> +		}
>>> +		if (vcpu->arch.interrupt.pending)
>>>  			return -EBUSY;
>>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>>>  				  NMI_VECTOR | INTR_TYPE_NMI_INTR |
>>> @@ -8221,8 +8229,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>>  
>>>  	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>>>  	    nested_exit_on_intr(vcpu)) {
>>> -		if (vmx->nested.nested_run_pending)
>>> +		if (vmx->nested.nested_run_pending) {
>>> +			vmx->nested.l1_events_blocked = true;
>>>  			return -EBUSY;
>>> +		}
>>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>>>  	}
>>>  
>>> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  6:08             ` Wanpeng Li
@ 2014-07-04  7:19               ` Jan Kiszka
  2014-07-04  7:39                 ` Wanpeng Li
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04  7:19 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Bandan Das, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-04 08:08, Wanpeng Li wrote:
> On Fri, Jul 04, 2014 at 07:43:14AM +0200, Jan Kiszka wrote:
>> On 2014-07-04 04:52, Wanpeng Li wrote:
>>> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
>>> [...]
>>>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>>>
>>>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>>>> qemu cmd to run L1 - 
>>>> # qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>>>>
>>>> qemu cmd to run L2 -
>>>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>>>>
>>>> Additionally,
>>>> L0 is FC19 with 3.16-rc3
>>>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>>>
>>>> Then start a kernel compilation inside L2 with "make -j3"
>>>>
>>>> There's no call trace on L0; both L0 and L1 are hung (or rather really slow), and
>>>> L1 serial spews out CPU softlockup errors. Enabling panic on softlockup on L1 will give
>>>> a trace with smp_call_function_many(). I think the corresponding code in kernel/smp.c that
>>>> triggers this is
>>>>
>>>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>>>                      && !oops_in_progress && !early_boot_irqs_disabled);
>>>>
>>>> I know in most cases this is usually harmless, but in this specific case,
>>>> it seems it's stuck here forever.
>>>>
>>>> Sorry, I don't have an L1 call trace handy atm; I can post that if you are interested.
>>>>
>>>> Note that this can take as much as 30 to 40 minutes to appear but once it does,
>>>> you will know because both L1 and L2 will be stuck with the serial messages as I mentioned
>>>> before. From my side, let me try this on another system to rule out any machine specific
>>>> weirdness going on..
>>>>
>>>
>>> Thanks for pointing this out. 
>>>
>>>> Please let me know if you need any further information.
>>>>
>>>
>>> I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.
>>>
>>>
>>> w/ vmx.flat and w/o my patch applied 
>>>
>>> [...]
>>>
>>> Test suite : interrupt
>>> FAIL: direct interrupt while running guest
>>> PASS: intercepted interrupt while running guest
>>> FAIL: direct interrupt + hlt
>>> FAIL: intercepted interrupt + hlt
>>> FAIL: direct interrupt + activity state hlt
>>> FAIL: intercepted interrupt + activity state hlt
>>> PASS: running a guest with interrupt acknowledgement set
>>> SUMMARY: 69 tests, 6 failures
>>>
>>> w/ vmx.flat and w/ my patch applied 
>>>
>>> [...]
>>>
>>> Test suite : interrupt
>>> PASS: direct interrupt while running guest
>>> PASS: intercepted interrupt while running guest
>>> PASS: direct interrupt + hlt
>>> FAIL: intercepted interrupt + hlt
>>> PASS: direct interrupt + activity state hlt
>>> PASS: intercepted interrupt + activity state hlt
>>> PASS: running a guest with interrupt acknowledgement set
>>>
>>> SUMMARY: 69 tests, 2 failures 
>>
>> Which version (hash) of kvm-unit-tests are you using? All tests up to
>> 307621765a are running fine here, but since a0e30e712d not much is
>> completing successfully anymore:
>>
> 
> I just pulled my kvm-unit-tests tree to the latest; the last commit is daeec9795d.
> 
>> enabling apic
>> paging enabled
>> cr0 = 80010011
>> cr3 = 7fff000
>> cr4 = 20
>> PASS: test vmxon with FEATURE_CONTROL cleared
>> PASS: test vmxon without FEATURE_CONTROL lock
>> PASS: test enable VMX in FEATURE_CONTROL
>> PASS: test FEATURE_CONTROL lock bit
>> PASS: test vmxon
>> FAIL: test vmptrld
>> PASS: test vmclear
>> init_vmcs : make_vmcs_current error
>> FAIL: test vmptrst
>> init_vmcs : make_vmcs_current error
>> vmx_run : vmlaunch failed.
>> FAIL: test vmlaunch
>> FAIL: test vmlaunch
>>
>> SUMMARY: 10 tests, 4 unexpected failures
> 
> 
> /opt/qemu/bin/qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio 
> -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/vmx.flat -cpu host
> 
> Test suite : interrupt
> PASS: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> PASS: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> PASS: direct interrupt + activity state hlt
> PASS: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
> 
> SUMMARY: 69 tests, 2 failures

Somehow I'm missing the other 31 vmx tests we have now... Could you post
the full log? Please also post the output of qemu/scripts/kvm/vmxcap on
your test host to compare with what I have here.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  6:17     ` [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
@ 2014-07-04  7:21       ` Jan Kiszka
  2014-07-07  0:56       ` Bandan Das
  1 sibling, 0 replies; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04  7:21 UTC (permalink / raw)
  To: Wanpeng Li, Bandan Das
  Cc: Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-04 08:17, Wanpeng Li wrote:
> On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>>>
>>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>>>> There is a race which lead to an interrupt will be injected to L2 which 
>>>> belong to L1 if L0 send an interrupt to L1 during this window. 
>>>>
>>>>                VCPU0                               another thread 
>>>>
>>>> L1 intr not blocked on L2 first entry
>>>> vmx_vcpu_run req event 
>>>> kvm check request req event 
>>>> check_nested_events don't have any intr 
>>>> not nested exit 
>>>>                                             intr occur (8254, lapic timer etc)
>>>> inject_pending_event now have intr 
>>>> inject interrupt 
>>>>
>>>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>>>> which indicates there is still-pending event which blocked by nested_run_pending, 
>>>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>>>> by nested_run_pending.
>>>
>>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>>> aren't those able to trigger this scenario?
>>>
>>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>>> should be changed.
>>
>>
>> Ugh! I think I am hitting another one but this one's probably because 
>> we are not setting KVM_REQ_EVENT for something we should.
>>
>> Before this patch, I was able to hit this bug every time with 
>> "modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>> L2. I can verify that I was indeed hitting the race in inject_pending_event.
>>
>> After this patch, I believe I am hitting another bug - this happens 
>> after I boot L2, as above, and then start a Linux kernel compilation
>> and then wait and watch :) It's a pain to debug because this happens
>> almost once in three times; it never happens if I run with ept=1, however,
>> I think that's only because the test completes sooner. But I can confirm
>> that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>> the approach this patch takes.
>> (Any debug hints help appreciated!)
>>
>> So, I am not sure if this is the right fix. Rather, I think the safer thing
>> to do is to have the interrupt pending check for injection into L1 at
>> the "same site" as the call to kvm_queue_interrupt() just like we had before 
>> commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>> nested events checks together?
>>
> 
> How about reverting commit b6b8a1451 and checking whether the bug you
> mentioned is still there?

I suspect you will have to reset back to b6b8a1451^ for this, as other
changes now depend on that commit.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  7:19               ` Jan Kiszka
@ 2014-07-04  7:39                 ` Wanpeng Li
  2014-07-04  7:46                   ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-04  7:39 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Bandan Das, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5090 bytes --]

On Fri, Jul 04, 2014 at 09:19:54AM +0200, Jan Kiszka wrote:
>On 2014-07-04 08:08, Wanpeng Li wrote:
>> On Fri, Jul 04, 2014 at 07:43:14AM +0200, Jan Kiszka wrote:
>>> On 2014-07-04 04:52, Wanpeng Li wrote:
>>>> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
>>>> [...]
>>>>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>>>>
>>>>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>>>>> qemu cmd to run L1 - 
>>>>> # qemu-system-x86_64 -drive file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -drive file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 -net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx -serial pty
>>>>>
>>>>> qemu cmd to run L2 -
>>>>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:5555::22
>>>>>
>>>>> Additionally,
>>>>> L0 is FC19 with 3.16-rc3
>>>>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>>>>
>>>>> Then start a kernel compilation inside L2 with "make -j3"
>>>>>
>>>>> There's no call trace on L0; both L0 and L1 are hung (or rather really slow), and
>>>>> L1 serial spews out CPU softlockup errors. Enabling panic on softlockup on L1 will give
>>>>> a trace with smp_call_function_many(). I think the corresponding code in kernel/smp.c that
>>>>> triggers this is
>>>>>
>>>>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>>>>                      && !oops_in_progress && !early_boot_irqs_disabled);
>>>>>
>>>>> I know in most cases this is usually harmless, but in this specific case,
>>>>> it seems it's stuck here forever.
>>>>>
>>>>> Sorry, I don't have an L1 call trace handy atm; I can post that if you are interested.
>>>>>
>>>>> Note that this can take as much as 30 to 40 minutes to appear but once it does,
>>>>> you will know because both L1 and L2 will be stuck with the serial messages as I mentioned
>>>>> before. From my side, let me try this on another system to rule out any machine specific
>>>>> weirdness going on..
>>>>>
>>>>
>>>> Thanks for pointing this out. 
>>>>
>>>>> Please let me know if you need any further information.
>>>>>
>>>>
>>>> I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.
>>>>
>>>>
>>>> w/ vmx.flat and w/o my patch applied 
>>>>
>>>> [...]
>>>>
>>>> Test suite : interrupt
>>>> FAIL: direct interrupt while running guest
>>>> PASS: intercepted interrupt while running guest
>>>> FAIL: direct interrupt + hlt
>>>> FAIL: intercepted interrupt + hlt
>>>> FAIL: direct interrupt + activity state hlt
>>>> FAIL: intercepted interrupt + activity state hlt
>>>> PASS: running a guest with interrupt acknowledgement set
>>>> SUMMARY: 69 tests, 6 failures
>>>>
>>>> w/ vmx.flat and w/ my patch applied 
>>>>
>>>> [...]
>>>>
>>>> Test suite : interrupt
>>>> PASS: direct interrupt while running guest
>>>> PASS: intercepted interrupt while running guest
>>>> PASS: direct interrupt + hlt
>>>> FAIL: intercepted interrupt + hlt
>>>> PASS: direct interrupt + activity state hlt
>>>> PASS: intercepted interrupt + activity state hlt
>>>> PASS: running a guest with interrupt acknowledgement set
>>>>
>>>> SUMMARY: 69 tests, 2 failures 
>>>
>>> Which version (hash) of kvm-unit-tests are you using? All tests up to
>>> 307621765a are running fine here, but since a0e30e712d not much is
>>> completing successfully anymore:
>>>
>> 
>> I just pulled my kvm-unit-tests tree to the latest; the last commit is daeec9795d.
>> 
>>> enabling apic
>>> paging enabled
>>> cr0 = 80010011
>>> cr3 = 7fff000
>>> cr4 = 20
>>> PASS: test vmxon with FEATURE_CONTROL cleared
>>> PASS: test vmxon without FEATURE_CONTROL lock
>>> PASS: test enable VMX in FEATURE_CONTROL
>>> PASS: test FEATURE_CONTROL lock bit
>>> PASS: test vmxon
>>> FAIL: test vmptrld
>>> PASS: test vmclear
>>> init_vmcs : make_vmcs_current error
>>> FAIL: test vmptrst
>>> init_vmcs : make_vmcs_current error
>>> vmx_run : vmlaunch failed.
>>> FAIL: test vmlaunch
>>> FAIL: test vmlaunch
>>>
>>> SUMMARY: 10 tests, 4 unexpected failures
>> 
>> 
>> /opt/qemu/bin/qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio 
>> -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/vmx.flat -cpu host
>> 
>> Test suite : interrupt
>> PASS: direct interrupt while running guest
>> PASS: intercepted interrupt while running guest
>> PASS: direct interrupt + hlt
>> FAIL: intercepted interrupt + hlt
>> PASS: direct interrupt + activity state hlt
>> PASS: intercepted interrupt + activity state hlt
>> PASS: running a guest with interrupt acknowledgement set
>> 
>> SUMMARY: 69 tests, 2 failures
>
>Somehow I'm missing the other 31 vmx tests we have now... Could you post
>the full log? Please also post the output of qemu/scripts/kvm/vmxcap on
>your test host to compare with what I have here.

They are attached. 

Regards,
Wanpeng Li 

>
>Thanks,
>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>Corporate Competence Center Embedded Linux

[-- Attachment #2: vmxcap --]
[-- Type: text/plain, Size: 4533 bytes --]

Basic VMX Information
  Revision                                 18
  VMCS size                                1024
  VMCS restricted to 32 bit addresses      no
  Dual-monitor support                     yes
  VMCS memory type                         6
  INS/OUTS instruction information         yes
  IA32_VMX_TRUE_*_CTLS support             yes
pin-based controls
  External interrupt exiting               yes
  NMI exiting                              yes
  Virtual NMIs                             yes
  Activate VMX-preemption timer            yes
  Process posted interrupts                no
primary processor-based controls
  Interrupt window exiting                 yes
  Use TSC offsetting                       yes
  HLT exiting                              yes
  INVLPG exiting                           yes
  MWAIT exiting                            yes
  RDPMC exiting                            yes
  RDTSC exiting                            yes
  CR3-load exiting                         default
  CR3-store exiting                        default
  CR8-load exiting                         yes
  CR8-store exiting                        yes
  Use TPR shadow                           yes
  NMI-window exiting                       yes
  MOV-DR exiting                           yes
  Unconditional I/O exiting                yes
  Use I/O bitmaps                          yes
  Monitor trap flag                        yes
  Use MSR bitmaps                          yes
  MONITOR exiting                          yes
  PAUSE exiting                            yes
  Activate secondary control               yes
secondary processor-based controls
  Virtualize APIC accesses                 yes
  Enable EPT                               yes
  Descriptor-table exiting                 yes
  Enable RDTSCP                            yes
  Virtualize x2APIC mode                   yes
  Enable VPID                              yes
  WBINVD exiting                           yes
  Unrestricted guest                       yes
  APIC register emulation                  no
  Virtual interrupt delivery               no
  PAUSE-loop exiting                       yes
  RDRAND exiting                           yes
  Enable INVPCID                           yes
  Enable VM functions                      yes
  VMCS shadowing                           yes
  EPT-violation #VE                        no
VM-Exit controls
  Save debug controls                      default
  Host address-space size                  yes
  Load IA32_PERF_GLOBAL_CTRL               yes
  Acknowledge interrupt on exit            yes
  Save IA32_PAT                            yes
  Load IA32_PAT                            yes
  Save IA32_EFER                           yes
  Load IA32_EFER                           yes
  Save VMX-preemption timer value          yes
VM-Entry controls
  Load debug controls                      default
  IA-64 mode guest                         yes
  Entry to SMM                             yes
  Deactivate dual-monitor treatment        yes
  Load IA32_PERF_GLOBAL_CTRL               yes
  Load IA32_PAT                            yes
  Load IA32_EFER                           yes
Miscellaneous data
  VMX-preemption timer scale (log2)        5
  Store EFER.LMA into IA-32e mode guest control yes
  HLT activity state                       yes
  Shutdown activity state                  yes
  Wait-for-SIPI activity state             yes
  IA32_SMBASE support                      yes
  Number of CR3-target values              4
  MSR-load/store count recommendation      0
  IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
  VMWRITE to VM-exit information fields    yes
  MSEG revision identifier                 0
VPID and EPT capabilities
  Execute-only EPT translations            yes
  Page-walk length 4                       yes
  Paging-structure memory type UC          yes
  Paging-structure memory type WB          yes
  2MB EPT pages                            yes
  1GB EPT pages                            yes
  INVEPT supported                         yes
  EPT accessed and dirty flags             yes
  Single-context INVEPT                    yes
  All-context INVEPT                       yes
  INVVPID supported                        yes
  Individual-address INVVPID               yes
  Single-context INVVPID                   yes
  All-context INVVPID                      yes
  Single-context-retaining-globals INVVPID yes
VM Functions
  EPTP Switching                           yes


[-- Attachment #3: vmx.flat.dump --]
[-- Type: text/plain, Size: 2385 bytes --]

enabling apic
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
PASS: test vmxon with FEATURE_CONTROL cleared
PASS: test vmxon without FEATURE_CONTROL lock
PASS: test enable VMX in FEATURE_CONTROL
PASS: test FEATURE_CONTROL lock bit
PASS: test vmxon
PASS: test vmptrld
PASS: test vmclear
PASS: test vmptrst
PASS: test vmxoff

Test suite : vmenter
PASS: test vmlaunch
PASS: test vmresume

Test suite : preemption timer
PASS: Keep preemption value
PASS: Save preemption value
PASS: busy-wait for preemption timer
PASS: preemption timer during hlt
PASS: preemption timer with 0 value

Test suite : control field PAT
PASS: Exit save PAT
PASS: Exit load PAT
PASS: Entry load PAT

Test suite : control field EFER
PASS: Exit save EFER
PASS: Exit load EFER
PASS: Entry load EFER

Test suite : CR shadowing
PASS: Read through CR0
PASS: Read through CR4
PASS: Write through CR0
PASS: Write through CR4
PASS: Read shadowing CR0
PASS: Read shadowing CR4
PASS: Write shadowing CR0 (same value)
PASS: Write shadowing CR4 (same value)
PASS: Write shadowing different X86_CR0_TS
PASS: Write shadowing different X86_CR0_MP
PASS: Write shadowing different X86_CR4_TSD
PASS: Write shadowing different X86_CR4_DE

Test suite : I/O bitmap
PASS: I/O bitmap - I/O pass
PASS: I/O bitmap - I/O width, byte
PASS: I/O bitmap - I/O direction, in
PASS: I/O bitmap - trap in
PASS: I/O bitmap - I/O width, word
PASS: I/O bitmap - I/O direction, out
PASS: I/O bitmap - trap out
PASS: I/O bitmap - I/O width, long
PASS: I/O bitmap - I/O port, low part
PASS: I/O bitmap - I/O port, high part
PASS: I/O bitmap - partial pass
PASS: I/O bitmap - overrun
PASS: I/O bitmap - ignore unconditional exiting
PASS: I/O bitmap - unconditional exiting

Test suite : instruction intercept
PASS: HLT
PASS: INVLPG
PASS: MWAIT
PASS: RDPMC
PASS: RDTSC
PASS: MONITOR
PASS: PAUSE
PASS: WBINVD
PASS: CPUID
PASS: INVD

Test suite : EPT framework
PASS: EPT basic framework
PASS: EPT misconfigurations
PASS: EPT violation - page permission
FAIL: EPT violation - paging structure

Test suite : interrupt
PASS: direct interrupt while running guest
PASS: intercepted interrupt while running guest
PASS: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
PASS: direct interrupt + activity state hlt
PASS: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set

SUMMARY: 69 tests, 2 failures


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  5:43           ` Jan Kiszka
  2014-07-04  6:08             ` Wanpeng Li
@ 2014-07-04  7:42             ` Paolo Bonzini
  2014-07-04  9:33             ` Jan Kiszka
  2 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-04  7:42 UTC (permalink / raw)
  To: Jan Kiszka, Wanpeng Li, Bandan Das
  Cc: Gleb Natapov, Hu Robert, kvm, linux-kernel

Il 04/07/2014 07:43, Jan Kiszka ha scritto:
> Which version (hash) of kvm-unit-tests are you using? All tests up to
> 307621765a are running fine here, but since a0e30e712d not much is
> completing successfully anymore:

Which test failed to init and triggered the change in the patch?

The patch was needed here to fix failures when KVM is loaded with ept=0.

BTW, "FAIL: intercepted interrupt + hlt" is always failing here too 
(Xeon E5 Sandy Bridge).

Paolo

> enabling apic
> paging enabled
> cr0 = 80010011
> cr3 = 7fff000
> cr4 = 20
> PASS: test vmxon with FEATURE_CONTROL cleared
> PASS: test vmxon without FEATURE_CONTROL lock
> PASS: test enable VMX in FEATURE_CONTROL
> PASS: test FEATURE_CONTROL lock bit
> PASS: test vmxon
> FAIL: test vmptrld
> PASS: test vmclear
> init_vmcs : make_vmcs_current error
> FAIL: test vmptrst
> init_vmcs : make_vmcs_current error
> vmx_run : vmlaunch failed.
> FAIL: test vmlaunch
> FAIL: test vmlaunch
>
> SUMMARY: 10 tests, 4 unexpected failures
>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  7:39                 ` Wanpeng Li
@ 2014-07-04  7:46                   ` Paolo Bonzini
  2014-07-04  7:59                     ` Wanpeng Li
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-04  7:46 UTC (permalink / raw)
  To: Wanpeng Li, Jan Kiszka
  Cc: Bandan Das, Gleb Natapov, Hu Robert, kvm, linux-kernel

Il 04/07/2014 09:39, Wanpeng Li ha scritto:
> PASS: test vmxon with FEATURE_CONTROL cleared
> PASS: test vmxon without FEATURE_CONTROL lock
> PASS: test enable VMX in FEATURE_CONTROL
> PASS: test FEATURE_CONTROL lock bit
> PASS: test vmxon
> PASS: test vmptrld
> PASS: test vmclear
> PASS: test vmptrst
> PASS: test vmxoff

You are not running the latest versions of the tests.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  7:46                   ` Paolo Bonzini
@ 2014-07-04  7:59                     ` Wanpeng Li
  2014-07-04  8:14                       ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-04  7:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Bandan Das, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Fri, Jul 04, 2014 at 09:46:38AM +0200, Paolo Bonzini wrote:
>Il 04/07/2014 09:39, Wanpeng Li ha scritto:
>>PASS: test vmxon with FEATURE_CONTROL cleared
>>PASS: test vmxon without FEATURE_CONTROL lock
>>PASS: test enable VMX in FEATURE_CONTROL
>>PASS: test FEATURE_CONTROL lock bit
>>PASS: test vmxon
>>PASS: test vmptrld
>>PASS: test vmclear
>>PASS: test vmptrst
>>PASS: test vmxoff
>
>You are not running the latest versions of the tests.
>

The last commit in my tree is 

commit daeec9795d3e6d4e9636588b6cb5fcd6e00d6d46
Author: Bandan Das <bsd@redhat.com>
Date:   Mon Jun 9 17:04:54 2014 -0400

    VMX: Updated test_vmclear and test_vmptrld


>Paolo
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  7:59                     ` Wanpeng Li
@ 2014-07-04  8:14                       ` Paolo Bonzini
  2014-07-04  8:24                         ` Wanpeng Li
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-04  8:14 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Jan Kiszka, Bandan Das, Gleb Natapov, Hu Robert, kvm, linux-kernel

Il 04/07/2014 09:59, Wanpeng Li ha scritto:
>> >You are not running the latest versions of the tests.
>> >
> The last commit in my tree is
>
> commit daeec9795d3e6d4e9636588b6cb5fcd6e00d6d46
> Author: Bandan Das <bsd@redhat.com>
> Date:   Mon Jun 9 17:04:54 2014 -0400
>
>     VMX: Updated test_vmclear and test_vmptrld

Can you try recompiling x86/vmx.*?

Paolo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  8:14                       ` Paolo Bonzini
@ 2014-07-04  8:24                         ` Wanpeng Li
  0 siblings, 0 replies; 43+ messages in thread
From: Wanpeng Li @ 2014-07-04  8:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jan Kiszka, Bandan Das, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Fri, Jul 04, 2014 at 10:14:34AM +0200, Paolo Bonzini wrote:
>Il 04/07/2014 09:59, Wanpeng Li ha scritto:
>>>>You are not running the latest versions of the tests.
>>>>
>>The last commit in my tree is
>>
>>commit daeec9795d3e6d4e9636588b6cb5fcd6e00d6d46
>>Author: Bandan Das <bsd@redhat.com>
>>Date:   Mon Jun 9 17:04:54 2014 -0400
>>
>>    VMX: Updated test_vmclear and test_vmptrld
>
>Can you try recompiling x86/vmx.*?
>

My fault; you are right.

Regards,
Wanpeng Li 

>Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  5:43           ` Jan Kiszka
  2014-07-04  6:08             ` Wanpeng Li
  2014-07-04  7:42             ` Paolo Bonzini
@ 2014-07-04  9:33             ` Jan Kiszka
  2014-07-04  9:38               ` Paolo Bonzini
  2 siblings, 1 reply; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04  9:33 UTC (permalink / raw)
  To: Wanpeng Li, Bandan Das
  Cc: Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-04 07:43, Jan Kiszka wrote:
> All tests up to
> 307621765a are running fine here, but since a0e30e712d not much is
> completing successfully anymore:
> 
> enabling apic
> paging enabled
> cr0 = 80010011
> cr3 = 7fff000
> cr4 = 20
> PASS: test vmxon with FEATURE_CONTROL cleared
> PASS: test vmxon without FEATURE_CONTROL lock
> PASS: test enable VMX in FEATURE_CONTROL
> PASS: test FEATURE_CONTROL lock bit
> PASS: test vmxon
> FAIL: test vmptrld
> PASS: test vmclear
> init_vmcs : make_vmcs_current error
> FAIL: test vmptrst
> init_vmcs : make_vmcs_current error
> vmx_run : vmlaunch failed.
> FAIL: test vmlaunch
> FAIL: test vmlaunch
> 
> SUMMARY: 10 tests, 4 unexpected failures

Here is the reason for my failures:

000000000000010f <make_vmcs_current>:
     10f:       48 89 7c 24 f8          mov    %rdi,-0x8(%rsp)
     114:       9c                      pushfq
     115:       58                      pop    %rax
     116:       48 83 c8 41             or     $0x41,%rax
     11a:       50                      push   %rax
     11b:       9d                      popfq
     11c:       0f c7 74 24 f8          vmptrld -0x8(%rsp)
     121:       0f 96 c0                setbe  %al
     124:       0f b6 c0                movzbl %al,%eax
     127:       c3                      retq

The compiler is not aware that the inline assembly pushes to and pops
from the stack in this function and thus places the vmcs parameter on
the stack without reserving space for it. So the pushfq overwrites the
vmcs pointer and makes the function fail.
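
For illustration, the C behind that disassembly looks roughly like
this. It is a sketch reconstructed from the object code above, not the
exact kvm-unit-tests source; the function signature and the struct name
are assumptions:

#include <stdbool.h>

struct vmcs;

static bool make_vmcs_current(struct vmcs *vmcs)
{
	bool ret;

	/* Preset CF and ZF (0x41 = CF | ZF) so that setbe reports failure
	 * unless vmptrld succeeds and clears them. The pushfq/pushq pair
	 * is what clobbers the red zone where 'vmcs' was spilled. */
	asm volatile ("pushfq\n\t"
		      "popq %%rax\n\t"
		      "orq $0x41, %%rax\n\t"
		      "pushq %%rax\n\t"
		      "popfq\n\t"
		      "vmptrld %1\n\t"
		      "setbe %0"
		      : "=q" (ret) : "m" (vmcs) : "rax", "cc", "memory");
	return ret;	/* true means vmptrld failed */
}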

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  9:33             ` Jan Kiszka
@ 2014-07-04  9:38               ` Paolo Bonzini
  2014-07-04 10:52                 ` Jan Kiszka
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-04  9:38 UTC (permalink / raw)
  To: Jan Kiszka, Wanpeng Li, Bandan Das
  Cc: Gleb Natapov, Hu Robert, kvm, linux-kernel

Il 04/07/2014 11:33, Jan Kiszka ha scritto:
>
> The compiler is not aware of the fact that push/pop exists in this
> function and, thus, places the vmcs parameter on the stack without
> reserving the space. So the pushfq will overwrite the vmcs pointer and
> let the function fail.

Is that just a missing "memory" clobber?  push/pop clobbers memory.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  9:38               ` Paolo Bonzini
@ 2014-07-04 10:52                 ` Jan Kiszka
  2014-07-04 11:07                   ` Jan Kiszka
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04 10:52 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li, Bandan Das
  Cc: Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-04 11:38, Paolo Bonzini wrote:
> Il 04/07/2014 11:33, Jan Kiszka ha scritto:
>>
>> The compiler is not aware that the inline assembly pushes to and pops
>> from the stack in this function and thus places the vmcs parameter on
>> the stack without reserving space for it. So the pushfq overwrites the
>> vmcs pointer and makes the function fail.
> 
> Is that just a missing "memory" clobber?  push/pop clobbers memory.

Nope, we would need some clobber like "stack". I wonder what is
required to use push in inline assembly safely?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04 10:52                 ` Jan Kiszka
@ 2014-07-04 11:07                   ` Jan Kiszka
  2014-07-04 11:28                     ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04 11:07 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li, Bandan Das
  Cc: Gleb Natapov, Hu Robert, kvm, linux-kernel

On 2014-07-04 12:52, Jan Kiszka wrote:
> On 2014-07-04 11:38, Paolo Bonzini wrote:
>> Il 04/07/2014 11:33, Jan Kiszka ha scritto:
>>>
>>> The compiler is not aware that the inline assembly pushes to and pops
>>> from the stack in this function and thus places the vmcs parameter on
>>> the stack without reserving space for it. So the pushfq overwrites the
>>> vmcs pointer and makes the function fail.
>>
>> Is that just a missing "memory" clobber?  push/pop clobbers memory.
> 
> Nope, we would need some clobber like "stack". I wonder what is
> required to use push in inline assembly safely?

My colleague just found the answer: -mno-red-zone is required for 64-bit
in order to play freely with the stack (or you need to stay off that
zone, apparently some 128 bytes below the stack pointer). The kernel
sets that switch; our unit tests do not.
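
For reference, a red-zone-safe variant is also possible without that
compiler switch. This is a minimal sketch, assuming GCC's 128-byte red
zone; the helper name is made up:

static inline unsigned long read_flags_red_zone_safe(void)
{
	unsigned long flags;

	/* Step %rsp past the 128-byte red zone before pushing, so the
	 * push cannot clobber compiler-owned spill slots below %rsp,
	 * then undo the adjustment afterwards. */
	asm volatile ("leaq -128(%%rsp), %%rsp\n\t"
		      "pushfq\n\t"
		      "popq %0\n\t"
		      "leaq 128(%%rsp), %%rsp"
		      : "=r" (flags) : : "memory");
	return flags;
}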

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04 11:07                   ` Jan Kiszka
@ 2014-07-04 11:28                     ` Paolo Bonzini
  2014-07-04 11:46                       ` [PATCH] Add -mno-red-zone to CFLAGS for x86-64 Jan Kiszka
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-04 11:28 UTC (permalink / raw)
  To: Jan Kiszka, Wanpeng Li, Bandan Das
  Cc: Gleb Natapov, Hu Robert, kvm, linux-kernel

Il 04/07/2014 13:07, Jan Kiszka ha scritto:
> On 2014-07-04 12:52, Jan Kiszka wrote:
>> On 2014-07-04 11:38, Paolo Bonzini wrote:
>>> Il 04/07/2014 11:33, Jan Kiszka ha scritto:
>>>>
>>>> The compiler is not aware that the inline assembly pushes to and pops
>>>> from the stack in this function and thus places the vmcs parameter on
>>>> the stack without reserving space for it. So the pushfq overwrites the
>>>> vmcs pointer and makes the function fail.
>>>
>>> Is that just a missing "memory" clobber?  push/pop clobbers memory.
>>
>> Nope, we would need some clobber like "stack". I wonder what is
>> required to use push in inline assembly safely?
>
> My colleague just found the answer: -mno-red-zone is required for 64-bit
> in order to play freely with the stack (or you need to stay off that
> zone, apparently some 128 bytes below the stack pointer). The kernel
> sets that switch; our unit tests do not.

Are you posting a patch?  (Also, what compiler is that?)

Paolo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] Add -mno-red-zone to CFLAGS for x86-64
  2014-07-04 11:28                     ` Paolo Bonzini
@ 2014-07-04 11:46                       ` Jan Kiszka
  2014-07-04 11:49                         ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Kiszka @ 2014-07-04 11:46 UTC (permalink / raw)
  To: Paolo Bonzini, Bandan Das; +Cc: kvm

This is required in order to use the stack in inline assembly (like
pushf; pop reg) without clashing with the compiler's stack assignment.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Seen with old gcc 4.5.1.

 config/config-x86_64.mak | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config/config-x86_64.mak b/config/config-x86_64.mak
index d69252f..06b2581 100644
--- a/config/config-x86_64.mak
+++ b/config/config-x86_64.mak
@@ -1,7 +1,7 @@
 cstart.o = $(TEST_DIR)/cstart64.o
 bits = 64
 ldarch = elf64-x86-64
-CFLAGS += -D__x86_64__
+CFLAGS += -D__x86_64__ -mno-red-zone
 
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
 	  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
-- 
1.8.1.1.298.ge7eed54

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH] Add -mno-red-zone to CFLAGS for x86-64
  2014-07-04 11:46                       ` [PATCH] Add -mno-red-zone to CFLAGS for x86-64 Jan Kiszka
@ 2014-07-04 11:49                         ` Paolo Bonzini
  0 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-04 11:49 UTC (permalink / raw)
  To: Jan Kiszka, Bandan Das; +Cc: kvm

Il 04/07/2014 13:46, Jan Kiszka ha scritto:
> This is required in order to use the stack in inline assembly (like
> pushf; pop reg) without clashing with the compiler's stack assignment.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Thanks, applying this.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-04  6:17     ` [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
  2014-07-04  7:21       ` Jan Kiszka
@ 2014-07-07  0:56       ` Bandan Das
  2014-07-07  8:46         ` Wanpeng Li
  1 sibling, 1 reply; 43+ messages in thread
From: Bandan Das @ 2014-07-07  0:56 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Jan Kiszka, Paolo Bonzini, Gleb Natapov, Hu Robert, kvm, linux-kernel

Wanpeng Li <wanpeng.li@linux.intel.com> writes:

> On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>>Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>>> 
>>>> If we didn't inject a still-pending event to L1 since nested_run_pending,
>>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>>> event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>>> there is no still-pending event to L1 which blocked by nested_run_pending. 
>>>> There is a race which lead to an interrupt will be injected to L2 which 
>>>> belong to L1 if L0 send an interrupt to L1 during this window. 
>>>> 
>>>>                VCPU0                               another thread 
>>>> 
>>>> L1 intr not blocked on L2 first entry
>>>> vmx_vcpu_run req event 
>>>> kvm check request req event 
>>>> check_nested_events don't have any intr 
>>>> not nested exit 
>>>>                                             intr occur (8254, lapic timer etc)
>>>> inject_pending_event now have intr 
>>>> inject interrupt 
>>>> 
>>>> This patch fix this race by introduced a l1_events_blocked field in nested_vmx 
>>>> which indicates there is still-pending event which blocked by nested_run_pending, 
>>>> and smart request a KVM_REQ_EVENT if there is a still-pending event which blocked 
>>>> by nested_run_pending.
>>>
>>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>>> aren't those able to trigger this scenario?
>>>
>>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>>> should be changed.
>>
>>
>>Ugh! I think I am hitting another one but this one's probably because 
>>we are not setting KVM_REQ_EVENT for something we should.
>>
>>Before this patch, I was able to hit this bug every time with 
>>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>>
>>After this patch, I believe I am hitting another bug - this happens 
>>after I boot L2, as above, and then start a Linux kernel compilation
>>and then wait and watch :) It's a pain to debug because this happens
>>almost once in three times; it never happens if I run with ept=1, however,
>>I think that's only because the test completes sooner. But I can confirm
>>that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>>the approach this patch takes.
>>(Any debug hints help appreciated!)
>>
>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>to do is to have the interrupt pending check for injection into L1 at
>>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>>nested events checks together?
>>
>
> How about reverting commit b6b8a1451 and checking whether the bug you
> mentioned is still there?

Sorry, I didn't get any time to look at this over the weekend, but I
thought I'd put down what I have so far.

So, as mentioned in http://www.spinics.net/linux/lists/kvm/msg105316.html,
I have two tests - one is just booting up L2 with enable_shadow_vmcs=0 and 
ept=0 and the other is compiling the Linux kernel in L2.

Starting *without* your patch, let's apply this change -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..c28730d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5887,6 +5887,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_x86_ops->set_nmi(vcpu);
 		}
 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
+		WARN_ON(is_guest_mode(vcpu));
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);

This will trigger a warning if we encounter a race (IIUC). Now, when booting L2,
sure enough, I encounter the following in L0. Also, L2 hangs, so the next test
(compiling the kernel) is not applicable anymore.
[139132.361063] Call Trace:
[139132.361070]  [<ffffffff816c0d31>] dump_stack+0x45/0x56
[139132.361075]  [<ffffffff81084a7d>] warn_slowpath_common+0x7d/0xa0
[139132.361077]  [<ffffffff81084b5a>] warn_slowpath_null+0x1a/0x20
[139132.361093]  [<ffffffffa0437697>] kvm_arch_vcpu_ioctl_run+0xf77/0x1130 [kvm]
[139132.361100]  [<ffffffffa04331ee>] ? kvm_arch_vcpu_load+0x4e/0x1e0 [kvm]
[139132.361106]  [<ffffffffa0421bf2>] kvm_vcpu_ioctl+0x2b2/0x590 [kvm]
[139132.361109]  [<ffffffff811eca08>] do_vfs_ioctl+0x2d8/0x4b0
[139132.361111]  [<ffffffff811ecc61>] SyS_ioctl+0x81/0xa0
[139132.361115]  [<ffffffff81114fd6>] ? __audit_syscall_exit+0x1f6/0x2a0
[139132.361118]  [<ffffffff816c7ee9>] system_call_fastpath+0x16/0x1b

The next step is to apply this change -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..432aa25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5887,6 +5887,12 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_x86_ops->set_nmi(vcpu);
 		}
 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
+		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
+			r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
+			if (r != 0)
+				return r;
+		}
+		WARN_ON(is_guest_mode(vcpu));
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);

This will presumably avoid the race (assuming only interrupts) in all 
cases.
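
To make the window concrete, here is a tiny user-space model of this class 
of check-then-inject race. This is purely an analogy - nothing below is real 
KVM code, and the names (intr_pending, timer_thread, exit_to_l1) are made up 
for illustration:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int intr_pending;	/* "interrupt for L1", raised asynchronously */

static void *timer_thread(void *arg)	/* stands in for the 8254/lapic timer */
{
	(void)arg;
	usleep(100);
	atomic_store(&intr_pending, 1);
	return NULL;
}

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, timer_thread, NULL);

	/* "check_nested_events()": no interrupt yet, so decide to stay in L2 */
	int exit_to_l1 = atomic_load(&intr_pending);

	usleep(200);	/* the race window */

	/* acting on the stale decision injects L1's interrupt into L2 */
	if (!exit_to_l1 && atomic_load(&intr_pending))
		printf("racy: L1's interrupt goes to L2\n");

	/* re-checking at the injection site, as the hunk above does,
	 * routes it to L1 via a nested vmexit instead */
	if (atomic_load(&intr_pending))
		printf("re-checked: exit to L1 first\n");

	pthread_join(t, NULL);
	return 0;
}

Compile with "cc -pthread"; with the sleeps as written, the stale-decision 
branch should fire on every run, whereas the real window is only a few 
instructions wide.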

And, sure enough, L2 boots up fine. The next test, compiling the 
kernel, goes fine too.

Finally, let's apply your patch on top of these changes. With your change, L2
boots up fine, but when compiling the kernel in L2, I eventually encounter a 
hang after some time. (In my last test it took around 22 minutes, and I was 
compiling a kernel with everything enabled.) The WARN() that we added doesn't
get hit, so it doesn't seem to be the same race.

The only thing I can think of at this point is that since this patch 
sets REQ_EVENT only under certain conditions, it's exposing a bug for a 
certain event which, apparently, setting REQ_EVENT in all cases hides. 
Note that I do think this patch is doing the right thing; it's just 
exposing another bug somewhere else :)
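
If it helps to chase that theory, one cheap way to test it is to warn 
whenever an event is still pending at guest entry while no KVM_REQ_EVENT is 
outstanding. The hunk below is only a sketch of the idea (untested, and the 
exact placement would need thought), assuming the hang is a missed 
re-injection:

	/* illustration only -- not a tested patch */
	if ((vcpu->arch.nmi_pending || kvm_cpu_has_injectable_intr(vcpu)) &&
	    !test_bit(KVM_REQ_EVENT, &vcpu->requests))
		WARN_ONCE(1, "pending event but no KVM_REQ_EVENT\n");

If that fires during the kernel-compile test, it should point at the event 
whose REQ_EVENT we are missing.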

Bandan

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07  0:56       ` Bandan Das
@ 2014-07-07  8:46         ` Wanpeng Li
  2014-07-07 13:03           ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-07  8:46 UTC (permalink / raw)
  To: Bandan Das, Paolo Bonzini
  Cc: Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Sun, Jul 06, 2014 at 08:56:07PM -0400, Bandan Das wrote:
[...]
>>
>> How about reverting commit b6b8a1451 and checking whether the bug you
>> mentioned is still there?
>
>Sorry, I didn't get time at all to look at this over the weekend, but I thought 
>I'd put down what I have so far...
>
>So, as mentioned in http://www.spinics.net/linux/lists/kvm/msg105316.html,
>I have two tests - one is just booting up L2 with enable_shadow_vmcs=0 and 
>ept=0 and the other is compiling the Linux kernel in L2.
>
>Starting *without* your patch, let's apply this change -
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index f32a025..c28730d 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -5887,6 +5887,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
> 			kvm_x86_ops->set_nmi(vcpu);
> 		}
> 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
>+		WARN_ON(is_guest_mode(vcpu));
> 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> 					    false);
>
>This will trigger a warning if we encounter a race (IIUC). Now, when booting L2,
>sure enough, I encounter the following in L0. Also, L2 hangs, so the next test
>(compiling the kernel) is not applicable anymore.
>[139132.361063] Call Trace:
>[139132.361070]  [<ffffffff816c0d31>] dump_stack+0x45/0x56
>[139132.361075]  [<ffffffff81084a7d>] warn_slowpath_common+0x7d/0xa0
>[139132.361077]  [<ffffffff81084b5a>] warn_slowpath_null+0x1a/0x20
>[139132.361093]  [<ffffffffa0437697>] kvm_arch_vcpu_ioctl_run+0xf77/0x1130 [kvm]
>[139132.361100]  [<ffffffffa04331ee>] ? kvm_arch_vcpu_load+0x4e/0x1e0 [kvm]
>[139132.361106]  [<ffffffffa0421bf2>] kvm_vcpu_ioctl+0x2b2/0x590 [kvm]
>[139132.361109]  [<ffffffff811eca08>] do_vfs_ioctl+0x2d8/0x4b0
>[139132.361111]  [<ffffffff811ecc61>] SyS_ioctl+0x81/0xa0
>[139132.361115]  [<ffffffff81114fd6>] ? __audit_syscall_exit+0x1f6/0x2a0
>[139132.361118]  [<ffffffff816c7ee9>] system_call_fastpath+0x16/0x1b
>
>The next step is to apply this change -
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index f32a025..432aa25 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -5887,6 +5887,12 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
> 			kvm_x86_ops->set_nmi(vcpu);
> 		}
> 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
>+		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
>+			r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
>+			if (r != 0)
>+				return r;
>+		}
>+		WARN_ON(is_guest_mode(vcpu));
> 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> 					    false);
>
>This will presumably avoid the race (assuming only interrupts) in all 
>cases.
>
>And, sure enough, L2 boots up fine. The next test, compiling the 
>kernel, goes fine too.
>
>Finally, let's apply your patch on top of these changes. With your change, L2
>boots up fine, but when compiling the kernel in L2, I eventually encounter a 
>hang after some time. (In my last test it took around 22 minutes, and I was 
>compiling a kernel with everything enabled.) The WARN() that we added doesn't
>get hit, so it doesn't seem to be the same race.

Agreed.

>
>The only thing I can think of at this point is that since this patch 
>sets REQ_EVENT only under certain conditions, it's exposing a bug for a 
>certain event which, apparently, setting REQ_EVENT in all cases hides. 
>Note that I do think this patch is doing the right thing; it's just 
>exposing another bug somewhere else :)

Agreed. 

Hi Paolo, 

Is it OK with you to apply this patch now, and then put more effort into
figuring out the other bug, which has no relationship to the race that
this patch fixes?

Regards,
Wanpeng Li 


>
>Bandan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07  8:46         ` Wanpeng Li
@ 2014-07-07 13:03           ` Paolo Bonzini
  2014-07-07 17:31             ` Bandan Das
  2014-07-07 23:38             ` Wanpeng Li
  0 siblings, 2 replies; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-07 13:03 UTC (permalink / raw)
  To: Wanpeng Li, Bandan Das
  Cc: Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 07/07/2014 10:46, Wanpeng Li wrote:
> Hi Paolo,
>
> Is it OK with you to apply this patch now, and then put more effort into
> figuring out the other bug, which has no relationship to the race that
> this patch fixes?

Which patch?  Yours or Bandan's?

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 13:03           ` Paolo Bonzini
@ 2014-07-07 17:31             ` Bandan Das
  2014-07-07 17:34               ` Paolo Bonzini
  2014-07-07 23:38             ` Wanpeng Li
  1 sibling, 1 reply; 43+ messages in thread
From: Bandan Das @ 2014-07-07 17:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 07/07/2014 10:46, Wanpeng Li wrote:
>> Hi Paolo,
>>
>> Is it OK with you to apply this patch now, and then put more effort into
>> figuring out the other bug, which has no relationship to the race that
>> this patch fixes?
>
> Which patch?  Yours or Bandan's?
Why don't we hold off on Wanpeng's patch and instead apply the one I proposed,
which calls check_nested_events() when checking for an interrupt in
inject_pending_event()?

I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381 
too. Once we figure out what's causing the hangs under certain conditions with his 
patch, we can apply that and revert this change.


> Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 17:31             ` Bandan Das
@ 2014-07-07 17:34               ` Paolo Bonzini
  2014-07-07 17:38                 ` Bandan Das
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-07 17:34 UTC (permalink / raw)
  To: Bandan Das
  Cc: Wanpeng Li, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 07/07/2014 19:31, Bandan Das wrote:
>> >
>> > Which patch?  Yours or Bandan's?
> Why don't we hold off on Wanpeng's patch and instead apply the one I proposed,
> which calls check_nested_events() when checking for an interrupt in
> inject_pending_event()?

Exactly, yours seemed better to apply as a quick regression fix.

Can you post it as a toplevel patch, so that the commit message explains 
what's happening?  Perhaps add a comment in the code as well.

Paolo

> I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381
> too. Once we figure out what's causing the hangs under certain conditions with his
> patch, we can apply that and revert this change.
>
>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 17:34               ` Paolo Bonzini
@ 2014-07-07 17:38                 ` Bandan Das
  2014-07-07 23:14                   ` Wanpeng Li
  0 siblings, 1 reply; 43+ messages in thread
From: Bandan Das @ 2014-07-07 17:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 07/07/2014 19:31, Bandan Das wrote:
>>> >
>>> > Which patch?  Yours or Bandan's?
>> Why don't we hold off on Wanpeng's patch and instead apply the one I proposed,
>> which calls check_nested_events() when checking for an interrupt in
>> inject_pending_event()?
>
> Exactly, yours seemed better to apply as a quick regression fix.
>
> Can you post it as a toplevel patch, so that the commit message
> explains what's happening?  Perhaps add a comment in the code as well.

Ok, will do, thanks!

> Paolo
>
>> I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381
>> too. Once we figure out what's causing the hangs under certain conditions with his
>> patch, we can apply that and revert this change.
>>
>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 17:38                 ` Bandan Das
@ 2014-07-07 23:14                   ` Wanpeng Li
  2014-07-08  4:35                     ` Bandan Das
  0 siblings, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-07 23:14 UTC (permalink / raw)
  To: Bandan Das
  Cc: Paolo Bonzini, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Mon, Jul 07, 2014 at 01:38:37PM -0400, Bandan Das wrote:
>Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 07/07/2014 19:31, Bandan Das wrote:
>>>> >
>>>> > Which patch?  Yours or Bandan's?
>>> Why don't we hold off on Wanpeng's patch and instead apply the one I proposed,
>>> which calls check_nested_events() when checking for an interrupt in
>>> inject_pending_event()?
>>
>> Exactly, yours seemed better to apply as a quick regression fix.
>>
>> Can you post it as a toplevel patch, so that the commit message
>> explains what's happening?  Perhaps add a comment in the code as well.
>
>Ok, will do, thanks!

As Jan mentioned in http://www.spinics.net/lists/kvm/msg105238.html, "In any case, 
unconditionally setting KVM_REQ_EVENT seems strange and should be changed." Your 
trick still keeps the unconditional setting of KVM_REQ_EVENT, which is the root 
cause of the race there. Anyway, I am focusing on fixing the hang for now, and a 
patch will be submitted soon. 

Regards,
Wanpeng Li 

>
>> Paolo
>>
>>> I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381
>>> too. Once we figure out what's causing the hangs under certain conditions with his
>>> patch, we can apply that and revert this change.
>>>
>>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 13:03           ` Paolo Bonzini
  2014-07-07 17:31             ` Bandan Das
@ 2014-07-07 23:38             ` Wanpeng Li
  2014-07-08  5:49               ` Paolo Bonzini
  1 sibling, 1 reply; 43+ messages in thread
From: Wanpeng Li @ 2014-07-07 23:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Bandan Das, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

On Mon, Jul 07, 2014 at 03:03:13PM +0200, Paolo Bonzini wrote:
>On 07/07/2014 10:46, Wanpeng Li wrote:
>>Hi Paolo,
>>
>>Is it OK with you to apply this patch now, and then put more effort into
>>figuring out the other bug, which has no relationship to the race that
>>this patch fixes?
>
>Which patch?  Yours or Bandan's?

Please wait; a patch that fixes the hang will be submitted soon.

Regards,
Wanpeng Li 

>
>Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 23:14                   ` Wanpeng Li
@ 2014-07-08  4:35                     ` Bandan Das
  0 siblings, 0 replies; 43+ messages in thread
From: Bandan Das @ 2014-07-08  4:35 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

Wanpeng Li <wanpeng.li@linux.intel.com> writes:
...
>
> As Jan mentioned in http://www.spinics.net/lists/kvm/msg105238.html, "In any case, 
> unconditionally setting KVM_REQ_EVENT seems strange and should be changed." Your 
> trick still keeps the unconditional setting of KVM_REQ_EVENT, which is the root 
> cause of the race there. Anyway, I am focusing on fixing the hang for now, and a 
> patch will be submitted soon. 

Right, that's the plan. Once you submit an updated fix, we can always revert
this change :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race
  2014-07-07 23:38             ` Wanpeng Li
@ 2014-07-08  5:49               ` Paolo Bonzini
  0 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2014-07-08  5:49 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Bandan Das, Jan Kiszka, Gleb Natapov, Hu Robert, kvm, linux-kernel

On 08/07/2014 01:38, Wanpeng Li wrote:
> On Mon, Jul 07, 2014 at 03:03:13PM +0200, Paolo Bonzini wrote:
>> On 07/07/2014 10:46, Wanpeng Li wrote:
>>> Hi Paolo,
>>>
>>> Is it OK with you to apply this patch now, and then put more effort into
>>> figuring out the other bug, which has no relationship to the race that
>>> this patch fixes?
>>
>> Which patch?  Yours or Bandan's?
>
> Please wait; a patch that fixes the hang will be submitted soon.

This is a regression, so I think the right thing to do is to apply 
Bandan's patch to 3.16 and yours to 3.17.

Paolo


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2014-07-08  5:49 UTC | newest]

Thread overview: 43+ messages
2014-07-02  6:54 [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
2014-07-02  7:20 ` Hu, Robert
2014-07-02  9:03   ` Jan Kiszka
2014-07-02  9:13     ` Hu, Robert
2014-07-02  9:16       ` Jan Kiszka
2014-07-02  9:01 ` Jan Kiszka
2014-07-03  2:59   ` Wanpeng Li
2014-07-03  5:15   ` Bandan Das
2014-07-03  6:59     ` Wanpeng Li
2014-07-03 17:27       ` Bandan Das
2014-07-04  2:52         ` Wanpeng Li
2014-07-04  5:43           ` Jan Kiszka
2014-07-04  6:08             ` Wanpeng Li
2014-07-04  7:19               ` Jan Kiszka
2014-07-04  7:39                 ` Wanpeng Li
2014-07-04  7:46                   ` Paolo Bonzini
2014-07-04  7:59                     ` Wanpeng Li
2014-07-04  8:14                       ` Paolo Bonzini
2014-07-04  8:24                         ` Wanpeng Li
2014-07-04  7:42             ` Paolo Bonzini
2014-07-04  9:33             ` Jan Kiszka
2014-07-04  9:38               ` Paolo Bonzini
2014-07-04 10:52                 ` Jan Kiszka
2014-07-04 11:07                   ` Jan Kiszka
2014-07-04 11:28                     ` Paolo Bonzini
2014-07-04 11:46                       ` [PATCH] Add -mno-red-zone to CFLAGS for x86-64 Jan Kiszka
2014-07-04 11:49                         ` Paolo Bonzini
2014-07-04  6:17     ` [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race Wanpeng Li
2014-07-04  7:21       ` Jan Kiszka
2014-07-07  0:56       ` Bandan Das
2014-07-07  8:46         ` Wanpeng Li
2014-07-07 13:03           ` Paolo Bonzini
2014-07-07 17:31             ` Bandan Das
2014-07-07 17:34               ` Paolo Bonzini
2014-07-07 17:38                 ` Bandan Das
2014-07-07 23:14                   ` Wanpeng Li
2014-07-08  4:35                     ` Bandan Das
2014-07-07 23:38             ` Wanpeng Li
2014-07-08  5:49               ` Paolo Bonzini
2014-07-02 16:27 ` Bandan Das
2014-07-03  5:11   ` Wanpeng Li
2014-07-03  5:29     ` Bandan Das
2014-07-03  7:33       ` Jan Kiszka
