* PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
@ 2017-02-07 17:26 anshul makkar
       [not found] ` <9fb45906-bfef-d28a-7a3a-de20b4a77aa7@citrix.com>
  2017-02-08 15:32 ` Jan Beulich
  0 siblings, 2 replies; 10+ messages in thread
From: anshul makkar @ 2017-02-07 17:26 UTC (permalink / raw)
  To: xen-devel; +Cc: kai.huang

Hi,

Facing an issue where a bootstorm of guests leads to a host crash. I
debugged it and found that enabling PML introduces a race condition
during the guest teardown stage, between disabling PML on one vcpu and
a context switch happening for another vcpu.

The crash happens only on Broadwell processors, as PML was introduced
in that generation.

Here is my analysis of the race condition:

vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter();
vm_write(disable_PML); vmx_vmcs_exit(); }

In between, I have a call path from another pcpu executing context
switch -> vmx_fpu_leave(), which crashes on the vmwrite:

   if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
   {
       v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
       __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
   }
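
To make the suspected interleaving explicit, here is my reconstruction
as a schematic (the exact ordering across the two pcpus is an
assumption on my part; the function names are taken from the traces
below):

/*
 * pCPU0: guest teardown                 pCPU1: context switch
 * ---------------------                 ---------------------
 * vmx_vcpu_disable_pml(v)
 *   vmx_vmcs_enter(v)
 *     -> vmclear (via IPI) removes
 *        v's VMCS from pCPU1
 *                                       vmx_ctxt_switch_from(v)
 *                                         vmx_fpu_leave(v)
 *                                           __vmwrite(HOST_CR0, ...)
 *                                             -> no VMCS loaded:
 *                                                VMWRITE fails, crash
 */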

Debug logs:

(XEN) [221256.749928] VMWRITE VMCS Invalid !!!!!
(XEN) [221256.754870] **[00] { now 0000c93b4341df1d, hw 00000035fffea000, op 00000035fffea000 } vmclear
(XEN) [221256.765052] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]

(XEN) [221256.773969] **[01] { now 0000c93b4341e099, hw ffffffffffffffff, op 00000035fffea000 } vmptrld
(XEN) [221256.784150] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]

(XEN) [221256.792197] **[02] { now 0000c93b4341e1f1, hw 00000035fffea000, op 00000035fffea000 } vmclear
(XEN) [221256.802378] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]

(XEN) [221256.811298] **[03] { now 0000c93b5784dd0a, hw ffffffffffffffff, op 00000039d7074000 } vmptrld
(XEN) [221256.821478] ** frames [ ffff82d0801f4c31 vmx_do_resume+0x51/0x150 ]

(XEN) [221256.829139] **[04] { now 0000c93b59d67b5b, hw 00000039d7074000, op 0000002b9a575000 } vmptrld
(XEN) [221256.839320] ** frames [ ffff82d0801f4c31 vmx_do_resume+0x51/0x150 ]

(XEN) [221256.882850] **[07] { now 0000c93b59e71e48, hw 0000002b9a575000, op 00000039d7074000 } vmptrld
(XEN) [221256.893034] ** frames [ ffff82d0801f4d13 vmx_do_resume+0x133/0x150 ]

(XEN) [221256.900790] **[08] { now 0000c93b59e78675, hw 00000039d7074000, op 00000040077ae000 } vmptrld
(XEN) [221256.910968] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]

(XEN) [221256.919015] **[09] { now 0000c93b59e78ac8, hw 00000040077ae000, op 00000040077ae000 } vmclear
(XEN) [221256.929196] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]

(XEN) [221256.938117] **[10] { now 0000c93b59e78d72, hw ffffffffffffffff, op 00000040077ae000 } vmptrld
(XEN) [221256.948297] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]

(XEN) [221256.956345] **[11] { now 0000c93b59e78ff0, hw 00000040077ae000, op 00000040077ae000 } vmclear
(XEN) [221256.966525] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]

(XEN) [221256.975445] **[12] { now 0000c93b59e7deda, hw ffffffffffffffff, op 00000040077b3000 } vmptrld
(XEN) [221256.985626] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]

(XEN) [221256.993672] **[13] { now 0000c93b59e9fe00, hw 00000040077b3000, op 00000040077b3000 } vmclear
(XEN) [221257.003852] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]

(XEN) [221257.012772] **[14] { now 0000c93b59ea007e, hw ffffffffffffffff, op 00000040077b3000 } vmptrld
(XEN) [221257.022952] ** frames [ ffff82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]

(XEN) [221257.031000] **[15] { now 0000c93b59ea02ba, hw 00000040077b3000, op 00000040077b3000 } vmclear
(XEN) [221257.041180] ** frames [ ffff82d080134652 smp_call_function_interrupt+0x92/0xa0 ]

(XEN) [221257.050101]  ....
(XEN) [221257.053008] vmcs_ptr:0xffffffffffffffff, vcpu->vmcs:0x2b9a575000


The VMCS is loaded, and before the next call to vm_write there is a
clear of the VMCS caused by vmx_vcpu_disable_pml. (In the trace above,
"hw" is the VMCS pointer currently loaded in hardware and "op" the
operand of the vmclear/vmptrld; hw == ffffffffffffffff means no VMCS
is loaded.)

The log above shows that an IPI (smp_call_function_interrupt) is
clearing the VMCS in between the vmptrld and the vmwrite. However, I
also verified that interrupts are disabled during the context switch
and during the execution of vm_write in vmx_fpu_leave, which has me
confused.

Also, I am not sure that I understand the handling of foreign_vmcs
correctly; that could also be the cause of the race.

Please share any suggestions you may have.


Thanks

Anshul Makkar

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
       [not found] ` <9fb45906-bfef-d28a-7a3a-de20b4a77aa7@citrix.com>
@ 2017-02-08 14:58   ` anshul makkar
  2017-02-08 15:37     ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: anshul makkar @ 2017-02-08 14:58 UTC (permalink / raw)
  To: xen-devel; +Cc: kai.huang, Tian, Kevin

Hi,

The code is destroying the vcpus held by the domain:
complete_domain_destroy() -> hvm_vcpu_destroy() for each vcpu.

In vmx_vcpu_destroy(), we have a call to vmx_vcpu_disable_pml(), which
leads to a race while destroying foreign vcpus when other domains are
rebooting on the same pcpus.

With a single domain reboot, no race is observed.

Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race condition.

Anshul


On 07/02/17 17:58, anshul makkar wrote:
> Hi, Sorry, forgot to include you in cc list.
>
> Anshul
>
> On 07/02/17 17:26, anshul makkar wrote:
>> [full report and debug log trimmed -- quoted in full in the first
>> message of this thread]

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-07 17:26 PML causing race condition during guest bootstorm and host crash on Broadwell cpu anshul makkar
       [not found] ` <9fb45906-bfef-d28a-7a3a-de20b4a77aa7@citrix.com>
@ 2017-02-08 15:32 ` Jan Beulich
  2017-02-09 15:08   ` Jan Beulich
  2017-02-09 16:22   ` Jan Beulich
  1 sibling, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2017-02-08 15:32 UTC (permalink / raw)
  To: anshul makkar; +Cc: kai.huang, Sergey Dyasli, xen-devel

>>> On 07.02.17 at 18:26, <anshul.makkar@citrix.com> wrote:
> Facing an issue where a bootstorm of guests leads to a host crash. I
> debugged it and found that enabling PML introduces a race condition
> during the guest teardown stage, between disabling PML on one vcpu and
> a context switch happening for another vcpu.
> 
> The crash happens only on Broadwell processors, as PML was introduced
> in that generation.
> 
> Here is my analysis of the race condition:
> 
> vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter();
> vm_write(disable_PML); vmx_vmcs_exit(); }
> 
> In between, I have a call path from another pcpu executing context
> switch -> vmx_fpu_leave(), which crashes on the vmwrite:
> 
>    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>    {
>        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
>    }

So that's after current has changed already, so it's effectively
dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
however, assumes that any user of a VMCS either owns the lock
or has current as the owner of the VMCS. Of course such a call
also can't be added here, as a vcpu on the context-switch-from
path can't vcpu_pause() itself.

That taken together with the bypassing of __context_switch()
when the incoming vCPU is the idle one (which means that via
context_saved() ->is_running will be cleared before running
->ctxt_switch_from()), the vcpu_pause() invocation in
vmx_vmcs_try_enter() may not have to wait at all if the call
happens between the clearing of ->is_running and the
eventual invocation of vmx_ctxt_switch_from().

If the above makes sense (which I'm not sure at all), the
question is whether using this_cpu(curr_vcpu) instead of
current in the VMCS enter/exit functions would help.
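
For reference, here is the locking logic in question, condensed from
vmx_vmcs_try_enter() in xen/arch/x86/hvm/vmx/vmcs.c (paraphrased
rather than quoted verbatim, so treat the details as approximate):

bool_t vmx_vmcs_try_enter(struct vcpu *v)
{
    struct foreign_vmcs *fv;

    /* Fast path: assumes v == current implies nobody can take the
     * VMCS away from us. */
    if ( likely(v == current) )
        return v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs);

    fv = &this_cpu(foreign_vmcs);
    if ( fv->v != v )
    {
        /* Slow path: pause the remote vCPU, then take over its VMCS.
         * vcpu_pause() may return immediately if ->is_running was
         * already cleared via context_saved() -- the window described
         * above. */
        vcpu_pause(v);
        spin_lock(&v->arch.hvm_vmx.vmcs_lock);
        vmx_clear_vmcs(v);      /* IPIs the pCPU still holding it */
        vmx_load_vmcs(v);
        fv->v = v;
    }
    fv->count++;

    return 1;
}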

Jan

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-08 14:58   ` anshul makkar
@ 2017-02-08 15:37     ` Jan Beulich
  2017-02-08 15:46       ` anshul makkar
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-08 15:37 UTC (permalink / raw)
  To: anshul makkar; +Cc: kai.huang, xen-devel, Kevin Tian

>>> On 08.02.17 at 15:58, <anshul.makkar@citrix.com> wrote:
> The code is destroying the vcpus held by the domain:
> complete_domain_destroy() -> hvm_vcpu_destroy() for each vcpu.
> 
> In vmx_vcpu_destroy(), we have a call to vmx_vcpu_disable_pml(), which
> leads to a race while destroying foreign vcpus when other domains are
> rebooting on the same pcpus.
> 
> With a single domain reboot, no race is observed.
> 
> Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race condition.

Causes or exposes? I ask because the similar reports we have are
all on 4.4.x, i.e. no PML code there at all.

Jan

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-08 15:37     ` Jan Beulich
@ 2017-02-08 15:46       ` anshul makkar
  0 siblings, 0 replies; 10+ messages in thread
From: anshul makkar @ 2017-02-08 15:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: kai.huang, xen-devel, Kevin Tian

Maybe "causes". But I'm not sure, as disabling this feature solves the
issue (no hypervisor crash), and we have reports that the same code base
works fine on Haswell (Haswell has no hardware support for PML, so with
the same code base the PML code path is simply never executed, and hence
no crash).

Anshul


On 08/02/17 15:37, Jan Beulich wrote:
>>>> On 08.02.17 at 15:58, <anshul.makkar@citrix.com> wrote:
>> The code is destroying the vcpus held by the domain:
>> complete_domain_destroy() -> hvm_vcpu_destroy() for each vcpu.
>>
>> In vmx_vcpu_destroy(), we have a call to vmx_vcpu_disable_pml(), which
>> leads to a race while destroying foreign vcpus when other domains are
>> rebooting on the same pcpus.
>>
>> With a single domain reboot, no race is observed.
>>
>> Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race condition.
> Causes or exposes? I ask because the similar reports we have are
> all on 4.4.x, i.e. no PML code there at all.
>
> Jan
>

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-08 15:32 ` Jan Beulich
@ 2017-02-09 15:08   ` Jan Beulich
  2017-02-09 16:22   ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2017-02-09 15:08 UTC (permalink / raw)
  To: anshul makkar; +Cc: kai.huang, Sergey Dyasli, xen-devel

>>> On 08.02.17 at 16:32, <JBeulich@suse.com> wrote:
>>>> On 07.02.17 at 18:26, <anshul.makkar@citrix.com> wrote:
>> [original report trimmed -- see the first message in this thread]
> 
> So that's after current has changed already, so it's effectively
> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
> however, assumes that any user of a VMCS either owns the lock
> or has current as the owner of the VMCS. Of course such a call
> also can't be added here, as a vcpu on the context-switch-from
> path can't vcpu_pause() itself.
> 
> That taken together with the bypassing of __context_switch()
> when the incoming vCPU is the idle one (which means that via
> context_saved() ->is_running will be cleared before running
> ->ctxt_switch_from()), the vcpu_pause() invocation in
> vmx_vmcs_try_enter() may not have to wait at all if the call
> happens between the clearing of ->is_running and the
> eventual invocation of vmx_ctxt_switch_from().
> 
> If the above makes sense (which I'm not sure at all), the
> question is whether using this_cpu(curr_vcpu) instead of
> current in the VMCS enter/exit functions would help.

This won't help, as it won't make vcpu_pause() wait (which is the
core of the problem).

Jan

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-08 15:32 ` Jan Beulich
  2017-02-09 15:08   ` Jan Beulich
@ 2017-02-09 16:22   ` Jan Beulich
  2017-02-13 16:32     ` anshul makkar
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-09 16:22 UTC (permalink / raw)
  To: anshul makkar, Sergey Dyasli; +Cc: kai.huang, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2013 bytes --]

>>> On 08.02.17 at 16:32, <JBeulich@suse.com> wrote:
>>>> On 07.02.17 at 18:26, <anshul.makkar@citrix.com> wrote:
>> [original report trimmed -- see the first message in this thread]
> 
> So that's after current has changed already, so it's effectively
> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
> however, assumes that any user of a VMCS either owns the lock
> or has current as the owner of the VMCS. Of course such a call
> also can't be added here, as a vcpu on the context-switch-from
> path can't vcpu_pause() itself.
> 
> That taken together with the bypassing of __context_switch()
> when the incoming vCPU is the idle one (which means that via
> context_saved() ->is_running will be cleared before running
> ->ctxt_switch_from()), the vcpu_pause() invocation in
> vmx_vmcs_try_enter() may not have to wait at all if the call
> happens between the clearing of ->is_running and the
> eventual invocation of vmx_ctxt_switch_from().

Mind giving the attached patch a try (which admittedly was only
lightly tested so far - in particular I haven't seen the second of
the debug messages being logged, yet)?
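
In short, what the patch below does: vmx_ctxt_switch_from() calls a
new vmx_vmcs_reload() helper when the vCPU is no longer marked as
running, and that helper waits for any remote user of the VMCS to be
done before loading it back with vmx_load_vmcs().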

Jan


[-- Attachment #2: VMX-enter-VMCS-race.patch --]
[-- Type: text/plain, Size: 2681 bytes --]


TODO: remove //temp-s

--- unstable.orig/xen/arch/x86/hvm/vmx/vmcs.c	2016-12-20 11:21:35.000000000 +0100
+++ unstable/xen/arch/x86/hvm/vmx/vmcs.c	2017-02-09 16:56:51.000000000 +0100
@@ -553,6 +553,36 @@ static void vmx_load_vmcs(struct vcpu *v
     local_irq_restore(flags);
 }
 
+void vmx_vmcs_reload(struct vcpu *v)
+{
+static unsigned long cnt1, cnt2, thr1, thr2;//temp
+if(++cnt1 > thr1) {//temp
+ thr1 |= cnt1;
+ printk("%pv/%lx (%pv/%lx)\n", v, PFN_DOWN(v->arch.hvm_vmx.vmcs_pa), current, PFN_DOWN(this_cpu(current_vmcs)));
+}
+    /*
+     * As we're running with interrupts disabled, we can't acquire
+     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+     * the VMCS can't be taken away from us anymore if we still own it.
+     */
+    ASSERT(!local_irq_is_enabled());
+    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+        return;
+if(++cnt2 > thr2) {//temp
+ thr2 |= cnt2;
+ printk("%pv reload (on %pv)\n", v, current);
+}
+    ASSERT(!this_cpu(current_vmcs));
+
+    /*
+     * Wait for the remote side to be done with the VMCS before loading
+     * it here.
+     */
+    while ( v->arch.hvm_vmx.active_cpu != -1 )
+        cpu_relax();
+    vmx_load_vmcs(v);
+}
+
 int vmx_cpu_up_prepare(unsigned int cpu)
 {
     /*
--- unstable.orig/xen/arch/x86/hvm/vmx/vmx.c	2017-01-13 16:00:19.000000000 +0100
+++ unstable/xen/arch/x86/hvm/vmx/vmx.c	2017-02-09 16:56:19.000000000 +0100
@@ -936,6 +936,18 @@ static void vmx_ctxt_switch_from(struct
     if ( unlikely(!this_cpu(vmxon)) )
         return;
 
+    if ( !v->is_running )
+    {
+        /*
+         * When this vCPU isn't marked as running anymore, a remote pCPU's
+         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+         * away the VMCS from us. As we're running with interrupts disabled,
+         * we also can't call vmx_vmcs_enter().
+         */
+        vmx_vmcs_reload(v);
+    }
+
     vmx_fpu_leave(v);
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
--- unstable.orig/xen/include/asm-x86/hvm/vmx/vmcs.h	2016-12-20 11:21:35.000000000 +0100
+++ unstable/xen/include/asm-x86/hvm/vmx/vmcs.h	2017-02-09 16:50:53.000000000 +0100
@@ -179,6 +179,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
 void vmx_vmcs_enter(struct vcpu *v);
 bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
 void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
 
 #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
 #define CPU_BASED_USE_TSC_OFFSETING           0x00000008


* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-09 16:22   ` Jan Beulich
@ 2017-02-13 16:32     ` anshul makkar
  2017-02-13 16:56       ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: anshul makkar @ 2017-02-13 16:32 UTC (permalink / raw)
  To: Jan Beulich, Sergey Dyasli; +Cc: kai.huang, xen-devel

Hi Jan,


On 09/02/17 16:22, Jan Beulich wrote:
>>>> On 08.02.17 at 16:32, <JBeulich@suse.com> wrote:
>> [analysis trimmed -- quoted in full earlier in this thread]
> Mind giving the attached patch a try (which admittedly was only
> lightly tested so far - in particular I haven't seen the second of
> the debug messages being logged, yet)?
The patch looks promising. I couldn't do very thorough testing, but the
initial reboot cycles (around 20 reboots of 32 VMs) went fine.
>
> Jan
>
Anshul


* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-13 16:32     ` anshul makkar
@ 2017-02-13 16:56       ` Jan Beulich
  2017-02-13 17:17         ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-13 16:56 UTC (permalink / raw)
  To: anshul makkar, Sergey Dyasli; +Cc: kai.huang, xen-devel

>>> On 13.02.17 at 17:32, <anshul.makkar@citrix.com> wrote:
> On 09/02/17 16:22, Jan Beulich wrote:
>> Mind giving the attached patch a try (which admittedly was only
>> lightly tested so far - in particular I haven't seen the second of
>> the debug messages being logged, yet)?
> Patch looks promising. I couldn't do much thorough testing, but initial 
> reboot cycles (around 20 reboots of 32 VMS) went fine.

Thanks. The most interesting part, though, is whether the 2nd of the
two log messages ever showed up. In any event I'll submit the cleaned-up
patch, for more formal discussion of the approach to happen there.

Jan

* Re: PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
  2017-02-13 16:56       ` Jan Beulich
@ 2017-02-13 17:17         ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2017-02-13 17:17 UTC (permalink / raw)
  To: anshul makkar, Sergey Dyasli; +Cc: kai.huang, xen-devel

>>> On 13.02.17 at 17:56, <JBeulich@suse.com> wrote:
>>>> On 13.02.17 at 17:32, <anshul.makkar@citrix.com> wrote:
>> On 09/02/17 16:22, Jan Beulich wrote:
>>> Mind giving the attached patch a try (which admittedly was only
>>> lightly tested so far - in particular I haven't seen the second of
>>> the debug messages being logged, yet)?
>> Patch looks promising. I couldn't do much thorough testing, but initial 
>> reboot cycles (around 20 reboots of 32 VMS) went fine.
> 
> Thanks. The most interesting part though is whether the 2nd of the
> two log messages ever showed up. I any event I'll submit the cleaned
> up patch, for more formal discussion of the approach to happen there.

Actually, no. While putting together the commit message I thought
of a second situation which likely also needs addressing: if we
bypass __context_switch() on the context-switch-in path as well
(because the same vCPU is being scheduled again), the CPU may also
have lost control of the VMCS. There's no context switch hook,
though, in which to place the reload invocation, so I'll first have
to think about what the best model here would be (of course I'd be
more than happy about suggestions).
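
A schematic of that second scenario, as I picture it (the exact
interleaving is an assumption on my part):

/*
 * pCPU0                                 remote pCPU
 * -----                                 -----------
 * runs vCPU A
 * switches A -> idle lazily:
 *   __context_switch() bypassed
 *                                       vmx_vmcs_enter(A):
 *                                         vcpu_pause(A) returns at
 *                                         once (A isn't marked as
 *                                         running), vmclear IPI takes
 *                                         A's VMCS away from pCPU0
 * schedules A again:
 *   __context_switch() bypassed again,
 *   so no hook runs that could reload
 *   A's VMCS -> the next VMX operation
 *   finds no (or a foreign) VMCS loaded
 */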

Jan
