* [PATCH 0/2] x86: context switch handling adjustments
@ 2017-02-14 10:23 Jan Beulich
  2017-02-14 10:28 ` [PATCH 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
  2017-02-14 10:29 ` [PATCH 2/2] x86: package up context switch hook pointers Jan Beulich
  0 siblings, 2 replies; 27+ messages in thread

From: Jan Beulich @ 2017-02-14 10:23 UTC
To: xen-devel; +Cc: Andrew Cooper, Kevin Tian, Jun Nakajima

1: VMX: fix VMCS race on context-switch paths
2: x86: package up context switch hook pointers

Signed-off-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* [PATCH 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-14 10:23 [PATCH 0/2] x86: context switch handling adjustments Jan Beulich
@ 2017-02-14 10:28 ` Jan Beulich
  2017-02-14 15:16   ` Andrew Cooper
  2017-02-15 10:27   ` Sergey Dyasli
  2017-02-14 10:29 ` [PATCH 2/2] x86: package up context switch hook pointers Jan Beulich
  1 sibling, 2 replies; 27+ messages in thread

From: Jan Beulich @ 2017-02-14 10:28 UTC
To: xen-devel
Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar,
    Jun Nakajima

[-- Attachment #1: Type: text/plain, Size: 4355 bytes --]

When __context_switch() is being bypassed during original context
switch handling, the vCPU "owning" the VMCS partially loses control of
it: It will appear non-running to remote CPUs, and hence their attempt
to pause the owning vCPU will have no effect on it (as it already
looks to be paused). At the same time the "owning" CPU will re-enable
interrupts eventually (at the latest when entering the idle loop) and
hence becomes subject to IPIs from other CPUs requesting access to the
VMCS. As a result, when __context_switch() finally gets run, the CPU
may no longer have the VMCS loaded, and hence any accesses to it would
fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().

Similarly, when __context_switch() is being bypassed also on the second
(switch-in) path, VMCS ownership may have been lost and hence needs
re-establishing. Since there's no existing hook to put this in, add a
new one.

Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2098,11 +2098,14 @@ void context_switch(struct vcpu *prev, s
 
     set_current(next);
 
-    if ( (per_cpu(curr_vcpu, cpu) == next) ||
-         (is_idle_domain(nextd) && cpu_online(cpu)) )
+    if ( (per_cpu(curr_vcpu, cpu) == next) )
     {
+        if ( next->arch.ctxt_switch_same )
+            next->arch.ctxt_switch_same(next);
         local_irq_enable();
     }
+    else if ( is_idle_domain(nextd) && cpu_online(cpu) )
+        local_irq_enable();
     else
     {
         __context_switch();
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -552,6 +552,27 @@ static void vmx_load_vmcs(struct vcpu *v
     local_irq_restore(flags);
 }
 
+void vmx_vmcs_reload(struct vcpu *v)
+{
+    /*
+     * As we're running with interrupts disabled, we can't acquire
+     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+     * the VMCS can't be taken away from us anymore if we still own it.
+     */
+    ASSERT(!local_irq_is_enabled());
+    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+        return;
+    ASSERT(!this_cpu(current_vmcs));
+
+    /*
+     * Wait for the remote side to be done with the VMCS before loading
+     * it here.
+     */
+    while ( v->arch.hvm_vmx.active_cpu != -1 )
+        cpu_relax();
+    vmx_load_vmcs(v);
+}
+
 int vmx_cpu_up_prepare(unsigned int cpu)
 {
     /*
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -298,6 +298,7 @@ static int vmx_vcpu_initialise(struct vc
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
+    v->arch.ctxt_switch_same = vmx_vmcs_reload;
 
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
@@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
     if ( unlikely(!this_cpu(vmxon)) )
         return;
 
+    if ( !v->is_running )
+    {
+        /*
+         * When this vCPU isn't marked as running anymore, a remote pCPU's
+         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+         * away the VMCS from us. As we're running with interrupts disabled,
+         * we also can't call vmx_vmcs_enter().
+         */
+        vmx_vmcs_reload(v);
+    }
+
     vmx_fpu_leave(v);
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -514,6 +514,7 @@ struct arch_vcpu
 
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
+    void (*ctxt_switch_same) (struct vcpu *);
 
     struct vpmu_struct vpmu;
 
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
 void vmx_vmcs_enter(struct vcpu *v);
 bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
 void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
 
 #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
 #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-14 10:28 ` [PATCH 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
@ 2017-02-14 15:16   ` Andrew Cooper
  2017-02-15  8:37     ` Jan Beulich
  2017-02-15 10:27   ` Sergey Dyasli
  1 sibling, 1 reply; 27+ messages in thread

From: Andrew Cooper @ 2017-02-14 15:16 UTC
To: Jan Beulich, xen-devel
Cc: Sergey Dyasli, Anshul Makkar, Kevin Tian, Jun Nakajima, Kevin.Mayer

On 14/02/17 10:28, Jan Beulich wrote:
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -552,6 +552,27 @@ static void vmx_load_vmcs(struct vcpu *v
>      local_irq_restore(flags);
>  }
>  
> +void vmx_vmcs_reload(struct vcpu *v)
> +{
> +    /*
> +     * As we're running with interrupts disabled, we can't acquire
> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
> +     * the VMCS can't be taken away from us anymore if we still own it.
> +     */
> +    ASSERT(!local_irq_is_enabled());
> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
> +        return;
> +    ASSERT(!this_cpu(current_vmcs));
> +
> +    /*
> +     * Wait for the remote side to be done with the VMCS before loading
> +     * it here.
> +     */
> +    while ( v->arch.hvm_vmx.active_cpu != -1 )
> +        cpu_relax();

Doesn't this need an ACCESS_ONCE() read?

While the compiled code (using GCC 4.9) isn't an infinite loop, I am not
aware of anything which prevents a compiler hoisting the comparison out
as being constant.

Otherwise, everything else LGTM.

~Andrew
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-14 15:16   ` Andrew Cooper
@ 2017-02-15  8:37     ` Jan Beulich
  2017-02-15 11:29       ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread

From: Jan Beulich @ 2017-02-15 8:37 UTC
To: Andrew Cooper
Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Anshul Makkar, Jun Nakajima,
    xen-devel

>>> On 14.02.17 at 16:16, <andrew.cooper3@citrix.com> wrote:
> On 14/02/17 10:28, Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -552,6 +552,27 @@ static void vmx_load_vmcs(struct vcpu *v
>>      local_irq_restore(flags);
>>  }
>>  
>> +void vmx_vmcs_reload(struct vcpu *v)
>> +{
>> +    /*
>> +     * As we're running with interrupts disabled, we can't acquire
>> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
>> +     * the VMCS can't be taken away from us anymore if we still own it.
>> +     */
>> +    ASSERT(!local_irq_is_enabled());
>> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
>> +        return;
>> +    ASSERT(!this_cpu(current_vmcs));
>> +
>> +    /*
>> +     * Wait for the remote side to be done with the VMCS before loading
>> +     * it here.
>> +     */
>> +    while ( v->arch.hvm_vmx.active_cpu != -1 )
>> +        cpu_relax();
>
> Doesn't this need an ACCESS_ONCE() read?
>
> While the compiled code (using GCC 4.9) isn't an infinite loop, I am not
> aware of anything which prevents a compiler hoisting the comparison out
> as being constant.

That's the (intended) side effect of cpu_relax() having a memory
clobber.

Jan
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-15  8:37     ` Jan Beulich
@ 2017-02-15 11:29       ` Andrew Cooper
  0 siblings, 0 replies; 27+ messages in thread

From: Andrew Cooper @ 2017-02-15 11:29 UTC
To: Jan Beulich
Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Anshul Makkar, Jun Nakajima,
    xen-devel

On 15/02/17 08:37, Jan Beulich wrote:
>>>> On 14.02.17 at 16:16, <andrew.cooper3@citrix.com> wrote:
>> On 14/02/17 10:28, Jan Beulich wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -552,6 +552,27 @@ static void vmx_load_vmcs(struct vcpu *v
>>>      local_irq_restore(flags);
>>>  }
>>>  
>>> +void vmx_vmcs_reload(struct vcpu *v)
>>> +{
>>> +    /*
>>> +     * As we're running with interrupts disabled, we can't acquire
>>> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
>>> +     * the VMCS can't be taken away from us anymore if we still own it.
>>> +     */
>>> +    ASSERT(!local_irq_is_enabled());
>>> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
>>> +        return;
>>> +    ASSERT(!this_cpu(current_vmcs));
>>> +
>>> +    /*
>>> +     * Wait for the remote side to be done with the VMCS before loading
>>> +     * it here.
>>> +     */
>>> +    while ( v->arch.hvm_vmx.active_cpu != -1 )
>>> +        cpu_relax();
>> Doesn't this need an ACCESS_ONCE() read?
>>
>> While the compiled code (using GCC 4.9) isn't an infinite loop, I am not
>> aware of anything which prevents a compiler hoisting the comparison out
>> as being constant.
> That's the (intended) side effect of cpu_relax() having a memory
> clobber.

Ah ok. In which case that should be fine.

~Andrew
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-14 10:28 ` [PATCH 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich 2017-02-14 15:16 ` Andrew Cooper @ 2017-02-15 10:27 ` Sergey Dyasli 2017-02-15 11:00 ` Jan Beulich ` (2 more replies) 1 sibling, 3 replies; 27+ messages in thread From: Sergey Dyasli @ 2017-02-15 10:27 UTC (permalink / raw) To: JBeulich, xen-devel Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima On Tue, 2017-02-14 at 03:28 -0700, Jan Beulich wrote: > When __context_switch() is being bypassed during original context > switch handling, the vCPU "owning" the VMCS partially loses control of > it: It will appear non-running to remote CPUs, and hence their attempt > to pause the owning vCPU will have no effect on it (as it already > looks to be paused). At the same time the "owning" CPU will re-enable > interrupts eventually (the lastest when entering the idle loop) and > hence becomes subject to IPIs from other CPUs requesting access to the > VMCS. As a result, when __context_switch() finally gets run, the CPU > may no longer have the VMCS loaded, and hence any accesses to it would > fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from(). > > Similarly, when __context_switch() is being bypassed also on the second > (switch-in) path, VMCS ownership may have been lost and hence needs > re-establishing. Since there's no existing hook to put this in, add a > new one. 
> > Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de> > Reported-by: Anshul Makkar <anshul.makkar@citrix.com> > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -2098,11 +2098,14 @@ void context_switch(struct vcpu *prev, s > > set_current(next); > > - if ( (per_cpu(curr_vcpu, cpu) == next) || > - (is_idle_domain(nextd) && cpu_online(cpu)) ) > + if ( (per_cpu(curr_vcpu, cpu) == next) ) > { > + if ( next->arch.ctxt_switch_same ) > + next->arch.ctxt_switch_same(next); > local_irq_enable(); > } > + else if ( is_idle_domain(nextd) && cpu_online(cpu) ) > + local_irq_enable(); > else > { > __context_switch(); > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -552,6 +552,27 @@ static void vmx_load_vmcs(struct vcpu *v > local_irq_restore(flags); > } > > +void vmx_vmcs_reload(struct vcpu *v) > +{ > + /* > + * As we're running with interrupts disabled, we can't acquire > + * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled > + * the VMCS can't be taken away from us anymore if we still own it. > + */ > + ASSERT(!local_irq_is_enabled()); > + if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) ) > + return; > + ASSERT(!this_cpu(current_vmcs)); > + > + /* > + * Wait for the remote side to be done with the VMCS before loading > + * it here. 
> + */ > + while ( v->arch.hvm_vmx.active_cpu != -1 ) > + cpu_relax(); > + vmx_load_vmcs(v); > +} > + > int vmx_cpu_up_prepare(unsigned int cpu) > { > /* > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -298,6 +298,7 @@ static int vmx_vcpu_initialise(struct vc > v->arch.schedule_tail = vmx_do_resume; > v->arch.ctxt_switch_from = vmx_ctxt_switch_from; > v->arch.ctxt_switch_to = vmx_ctxt_switch_to; > + v->arch.ctxt_switch_same = vmx_vmcs_reload; > > if ( (rc = vmx_create_vmcs(v)) != 0 ) > { > @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct > if ( unlikely(!this_cpu(vmxon)) ) > return; > > + if ( !v->is_running ) > + { > + /* > + * When this vCPU isn't marked as running anymore, a remote pCPU's > + * attempt to pause us (from vmx_vmcs_enter()) won't have a reason > + * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken > + * away the VMCS from us. As we're running with interrupts disabled, > + * we also can't call vmx_vmcs_enter(). > + */ > + vmx_vmcs_reload(v); > + } > + > vmx_fpu_leave(v); > vmx_save_guest_msrs(v); > vmx_restore_host_msrs(); > --- a/xen/include/asm-x86/domain.h > +++ b/xen/include/asm-x86/domain.h > @@ -514,6 +514,7 @@ struct arch_vcpu > > void (*ctxt_switch_from) (struct vcpu *); > void (*ctxt_switch_to) (struct vcpu *); > + void (*ctxt_switch_same) (struct vcpu *); > > struct vpmu_struct vpmu; > > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h > @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v); > void vmx_vmcs_enter(struct vcpu *v); > bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v); > void vmx_vmcs_exit(struct vcpu *v); > +void vmx_vmcs_reload(struct vcpu *v); > > #define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004 > #define CPU_BASED_USE_TSC_OFFSETING 0x00000008 > > Using the above patch with the following change for Xen-4.6.1: - if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) ) + if ( v->arch.hvm_vmx.vmcs == this_cpu(current_vmcs) ) 
This is what I'm getting during the original test case (32 VMs reboot):

(XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck!
(XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local  x86_64  debug=n  Not tainted ]----
(XEN) [ 1407.803774] CPU:    12
(XEN) [ 1407.806975] RIP:    e008:[<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50
(XEN) [ 1407.814926] RFLAGS: 0000000000000013   CONTEXT: hypervisor (d230v0)
(XEN) [ 1407.822486] rax: 0000000000000000   rbx: ffff830079ee7000   rcx: 0000000000000000
(XEN) [ 1407.831407] rdx: 0000006f8f72ce00   rsi: ffff8329b3efbfe8   rdi: ffff830079ee7000
(XEN) [ 1407.840326] rbp: ffff83007bab7000   rsp: ffff83400fab7dc8   r8:  000001468e9e3ccc
(XEN) [ 1407.849246] r9:  ffff83403ffe7000   r10: 00000146c91c1737   r11: ffff833a9558c310
(XEN) [ 1407.858166] r12: ffff833a9558c000   r13: 000000000000000c   r14: ffff83403ffe7000
(XEN) [ 1407.867085] r15: ffff82d080364640   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) [ 1407.876005] cr3: 000000294b074000   cr2: 000007fefd7ce150
(XEN) [ 1407.882599] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [ 1407.890938] Xen code around <ffff82d0801ea2a2> (vmx_vmcs_reload+0x32/0x50):
(XEN) [ 1407.899277]  84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3
(XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8:
(XEN) [ 1407.914982]    ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000
(XEN) [ 1407.923998]    0000000000000206 0000000000000086 0000000000000286 000000000000000c
(XEN) [ 1407.933017]    ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495
(XEN) [ 1407.942032]    ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640
(XEN) [ 1407.951048]    ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000
(XEN) [ 1407.960067]    ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c
(XEN) [ 1407.969083]    ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d
(XEN) [ 1407.978101]    00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00
(XEN) [ 1407.987116]    ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000
(XEN) [ 1407.996134]    ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff
(XEN) [ 1408.005151]    ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700
(XEN) [ 1408.014167]    fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000
(XEN) [ 1408.023184]    fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000
(XEN) [ 1408.032202]    00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80
(XEN) [ 1408.041220]    0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000
(XEN) [ 1408.050235]    0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0
(XEN) [ 1408.059253]    00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c
(XEN) [ 1408.068268]    ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0
(XEN) [ 1408.075638] Xen call trace:
(XEN) [ 1408.079322]    [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50
(XEN) [ 1408.086303]    [<ffff82d08016c58d>] context_switch+0x85d/0xeb0
(XEN) [ 1408.093380]    [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0
(XEN) [ 1408.100942]    [<ffff82d080164305>] reprogram_timer+0x75/0xe0
(XEN) [ 1408.107925]    [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210
(XEN) [ 1408.116263]    [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90
(XEN) [ 1408.123921]    [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60

Currently I'm testing the following patch that fixes at least one
possible scenario:

commit f76dc832c2de4950872fc27962c56c609cff1160
Author: Sergey Dyasli <sergey.dyasli@citrix.com>
Date:   Tue Feb 14 15:23:54 2017 +0000

    x86/vmx: use curr_vcpu in vmx_vmcs_exit()

    During context_switch() from an HVM vCPU to the idle vCPU, current is
    updated but curr_vcpu retains the HVM vCPU's value.
    If, after that, a vmx_vmcs_enter() + vmx_vmcs_exit() pair is executed
    for some reason, the test for has_hvm_container_vcpu(current) will fail
    and the VMCS for curr_vcpu will not be loaded.

    This will lead to a BUG() during the next __context_switch(), when
    vmx_ctxt_switch_from() is executed and __vmwrite() fails.

    Fix this by testing has_hvm_container_vcpu() against curr_vcpu.

    Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 88db7ee..f0d71ae 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -742,6 +742,8 @@ void vmx_vmcs_enter(struct vcpu *v)
 void vmx_vmcs_exit(struct vcpu *v)
 {
     struct foreign_vmcs *fv;
+    unsigned int cpu = smp_processor_id();
+    struct vcpu *p = per_cpu(curr_vcpu, cpu);
 
     if ( likely(v == current) )
         return;
@@ -754,8 +756,8 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( has_hvm_container_vcpu(current) )
-            vmx_load_vmcs(current);
+        if ( has_hvm_container_vcpu(p) )
+            vmx_load_vmcs(p);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
         vcpu_unpause(v);

So far no crashes have been observed, but testing continues.

--
Thanks,
Sergey
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 10:27 ` Sergey Dyasli @ 2017-02-15 11:00 ` Jan Beulich 2017-02-15 11:13 ` Sergey Dyasli 2017-02-15 11:39 ` Jan Beulich 2017-02-15 13:20 ` Jan Beulich 2 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 11:00 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > This is what I'm getting during the original test case (32 VMs reboot): > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- > (XEN) [ 1407.803774] CPU: 12 > (XEN) [ 1407.806975] RIP: e008:[<ffff82d0801ea2a2>] > vmx_vmcs_reload+0x32/0x50 > (XEN) [ 1407.814926] RFLAGS: 0000000000000013 CONTEXT: hypervisor (d230v0) > (XEN) [ 1407.822486] rax: 0000000000000000 rbx: ffff830079ee7000 rcx: 0000000000000000 > (XEN) [ 1407.831407] rdx: 0000006f8f72ce00 rsi: ffff8329b3efbfe8 rdi: ffff830079ee7000 > (XEN) [ 1407.840326] rbp: ffff83007bab7000 rsp: ffff83400fab7dc8 r8: 000001468e9e3ccc > (XEN) [ 1407.849246] r9: ffff83403ffe7000 r10: 00000146c91c1737 r11: ffff833a9558c310 > (XEN) [ 1407.858166] r12: ffff833a9558c000 r13: 000000000000000c r14: ffff83403ffe7000 > (XEN) [ 1407.867085] r15: ffff82d080364640 cr0: 0000000080050033 cr4: 00000000003526e0 > (XEN) [ 1407.876005] cr3: 000000294b074000 cr2: 000007fefd7ce150 > (XEN) [ 1407.882599] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) [ 1407.890938] Xen code around <ffff82d0801ea2a2> (vmx_vmcs_reload+0x32/0x50): > (XEN) [ 1407.899277] 84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3 > (XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8: > (XEN) [ 1407.914982] ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000 > (XEN) [ 1407.923998] 0000000000000206 0000000000000086 0000000000000286 
000000000000000c > (XEN) [ 1407.933017] ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495 > (XEN) [ 1407.942032] ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640 > (XEN) [ 1407.951048] ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000 > (XEN) [ 1407.960067] ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c > (XEN) [ 1407.969083] ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d > (XEN) [ 1407.978101] 00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00 > (XEN) [ 1407.987116] ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000 > (XEN) [ 1407.996134] ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff > (XEN) [ 1408.005151] ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700 > (XEN) [ 1408.014167] fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000 > (XEN) [ 1408.023184] fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000 > (XEN) [ 1408.032202] 00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80 > (XEN) [ 1408.041220] 0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000 > (XEN) [ 1408.050235] 0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0 > (XEN) [ 1408.059253] 00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c > (XEN) [ 1408.068268] ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0 > (XEN) [ 1408.075638] Xen call trace: > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 > (XEN) [ 1408.107925] [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210 > (XEN) [ 1408.116263] [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 > (XEN) [ 1408.123921] [<ffff82d080167d35>] 
domain.c#idle_loop+0x25/0x60 > > Currently I'm testing the following patch that fixes at least one possible > scenario: > > commit f76dc832c2de4950872fc27962c56c609cff1160 > Author: Sergey Dyasli <sergey.dyasli@citrix.com> > Date: Tue Feb 14 15:23:54 2017 +0000 > > x86/vmx: use curr_vcpu in vmx_vmcs_exit() > > During context_switch() from a HVM vCPU to the idle vCPU, current is > updated but curr_vcpu retains the HMV vCPU's value. If after that, > for some reason, vmx_vmcs_enter() + vmx_vmcs_exit() pair will be > executed, the test for has_hvm_container_vcpu(current) will fail and > vmcs for curr_vcpu will not be loaded. I'm not seeing the connection to the watchdog invocation above: I take it that vmx_vmcs_reload() was caught while waiting for active_cpu to become -1. Yet that's what vmx_clear_vmcs() achieves, not vmx_load_vmcs(). > This will lead to BUG() during the next __context_switch() when > vmx_ctxt_switch_from() will be executed and __vmwrite() will fail. But that case is specifically (supposed to be) taken care of by calling vmx_vmcs_reload() from vmx_ctxt_switch_from(). > Fix this by testing has_hvm_container_vcpu() against curr_vcpu. If at all possible, I like to avoid having to use curr_vcpu in the fixing of this (whole) issue. > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -742,6 +742,8 @@ void vmx_vmcs_enter(struct vcpu *v) > void vmx_vmcs_exit(struct vcpu *v) > { > struct foreign_vmcs *fv; > + unsigned int cpu = smp_processor_id(); > + struct vcpu *p = per_cpu(curr_vcpu, cpu); > > if ( likely(v == current) ) > return; > @@ -754,8 +756,8 @@ void vmx_vmcs_exit(struct vcpu *v) > { > /* Don't confuse vmx_do_resume (for @v or @current!) 
*/ > vmx_clear_vmcs(v); > - if ( has_hvm_container_vcpu(current) ) > - vmx_load_vmcs(current); > + if ( has_hvm_container_vcpu(p) ) > + vmx_load_vmcs(p); If this change (or something similar) really turned out to be necessary, the variable declarations should be moved into this more narrow scope (and I'd prefer the "cpu" one to be dropped altogether). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 11:00 ` Jan Beulich @ 2017-02-15 11:13 ` Sergey Dyasli 2017-02-15 11:24 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Sergey Dyasli @ 2017-02-15 11:13 UTC (permalink / raw) To: JBeulich Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel On Wed, 2017-02-15 at 04:00 -0700, Jan Beulich wrote: > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > > > > This is what I'm getting during the original test case (32 VMs reboot): > > > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- > > (XEN) [ 1407.803774] CPU: 12 > > (XEN) [ 1407.806975] RIP: e008:[<ffff82d0801ea2a2>] > > vmx_vmcs_reload+0x32/0x50 > > (XEN) [ 1407.814926] RFLAGS: 0000000000000013 CONTEXT: hypervisor (d230v0) > > (XEN) [ 1407.822486] rax: 0000000000000000 rbx: ffff830079ee7000 rcx: 0000000000000000 > > (XEN) [ 1407.831407] rdx: 0000006f8f72ce00 rsi: ffff8329b3efbfe8 rdi: ffff830079ee7000 > > (XEN) [ 1407.840326] rbp: ffff83007bab7000 rsp: ffff83400fab7dc8 r8: 000001468e9e3ccc > > (XEN) [ 1407.849246] r9: ffff83403ffe7000 r10: 00000146c91c1737 r11: ffff833a9558c310 > > (XEN) [ 1407.858166] r12: ffff833a9558c000 r13: 000000000000000c r14: ffff83403ffe7000 > > (XEN) [ 1407.867085] r15: ffff82d080364640 cr0: 0000000080050033 cr4: 00000000003526e0 > > (XEN) [ 1407.876005] cr3: 000000294b074000 cr2: 000007fefd7ce150 > > (XEN) [ 1407.882599] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > > (XEN) [ 1407.890938] Xen code around <ffff82d0801ea2a2> (vmx_vmcs_reload+0x32/0x50): > > (XEN) [ 1407.899277] 84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3 > > (XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8: > > (XEN) [ 1407.914982] ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000 > > (XEN) [ 1407.923998] 
0000000000000206 0000000000000086 0000000000000286 000000000000000c > > (XEN) [ 1407.933017] ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495 > > (XEN) [ 1407.942032] ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640 > > (XEN) [ 1407.951048] ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000 > > (XEN) [ 1407.960067] ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c > > (XEN) [ 1407.969083] ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d > > (XEN) [ 1407.978101] 00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00 > > (XEN) [ 1407.987116] ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000 > > (XEN) [ 1407.996134] ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff > > (XEN) [ 1408.005151] ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700 > > (XEN) [ 1408.014167] fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000 > > (XEN) [ 1408.023184] fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000 > > (XEN) [ 1408.032202] 00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80 > > (XEN) [ 1408.041220] 0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000 > > (XEN) [ 1408.050235] 0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0 > > (XEN) [ 1408.059253] 00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c > > (XEN) [ 1408.068268] ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0 > > (XEN) [ 1408.075638] Xen call trace: > > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 > > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 > > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 > > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 > > (XEN) [ 1408.107925] [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210 > > (XEN) [ 1408.116263] 
[<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 > > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 > > > > Currently I'm testing the following patch that fixes at least one possible > > scenario: > > > > commit f76dc832c2de4950872fc27962c56c609cff1160 > > Author: Sergey Dyasli <sergey.dyasli@citrix.com> > > Date: Tue Feb 14 15:23:54 2017 +0000 > > > > x86/vmx: use curr_vcpu in vmx_vmcs_exit() > > > > During context_switch() from a HVM vCPU to the idle vCPU, current is > > updated but curr_vcpu retains the HVM vCPU's value. If after that, > > for some reason, a vmx_vmcs_enter() + vmx_vmcs_exit() pair is > > executed, the test for has_hvm_container_vcpu(current) will fail and > > the vmcs for curr_vcpu will not be loaded. > > I'm not seeing the connection to the watchdog invocation above: > I take it that vmx_vmcs_reload() was caught while waiting for > active_cpu to become -1. Yet that's what vmx_clear_vmcs() > achieves, not vmx_load_vmcs(). Sorry for the misunderstanding, but these 2 patches are being tested separately, not together. My patch is an alternative approach to the issue. It fixes at least one observed BUG() scenario with PML and I'm currently looking for others. > Similarly, when __context_switch() is being bypassed also on the second > (switch-in) path, VMCS ownership may have been lost and hence needs > re-establishing. Since there's no existing hook to put this in, add a > new one. BTW, what about vmx_do_resume()? It does vmx_load_vmcs() in cases when the VMCS was cleared by someone. > > > This will lead to a BUG() during the next __context_switch() when > > vmx_ctxt_switch_from() is executed and __vmwrite() fails. > > But that case is specifically (supposed to be) taken care of by > calling vmx_vmcs_reload() from vmx_ctxt_switch_from(). > > > Fix this by testing has_hvm_container_vcpu() against curr_vcpu. > > If at all possible, I'd like to avoid having to use curr_vcpu in the fixing > of this (whole) issue.
> > > --- a/xen/arch/x86/hvm/vmx/vmcs.c > > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > > @@ -742,6 +742,8 @@ void vmx_vmcs_enter(struct vcpu *v) > > void vmx_vmcs_exit(struct vcpu *v) > > { > > struct foreign_vmcs *fv; > > + unsigned int cpu = smp_processor_id(); > > + struct vcpu *p = per_cpu(curr_vcpu, cpu); > > > > if ( likely(v == current) ) > > return; > > @@ -754,8 +756,8 @@ void vmx_vmcs_exit(struct vcpu *v) > > { > > /* Don't confuse vmx_do_resume (for @v or @current!) */ > > vmx_clear_vmcs(v); > > - if ( has_hvm_container_vcpu(current) ) > > - vmx_load_vmcs(current); > > + if ( has_hvm_container_vcpu(p) ) > > + vmx_load_vmcs(p); > > If this change (or something similar) really turned out to be > necessary, the variable declarations should be moved into this > more narrow scope (and I'd prefer the "cpu" one to be dropped > altogether). > > Jan -- Thanks, Sergey _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 11:13 ` Sergey Dyasli @ 2017-02-15 11:24 ` Jan Beulich 0 siblings, 0 replies; 27+ messages in thread From: Jan Beulich @ 2017-02-15 11:24 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 12:13, <sergey.dyasli@citrix.com> wrote: > On Wed, 2017-02-15 at 04:00 -0700, Jan Beulich wrote: >> Similarly, when __context_switch() is being bypassed also on the second >> (switch-in) path, VMCS ownership may have been lost and hence needs >> re-establishing. Since there's no existing hook to put this in, add a >> new one. > > BTW, and what about vmx_do_resume()? It does vmx_load_vmcs() in cases > when VMCS was cleared by someone. Not sure I understand the question correctly - it would seem to me that this was a half-hearted attempt to deal with the situation we now find needs more changes, i.e. I would assume that call could now be dropped from the function. Or if it was to stay, it should likely become a vmx_vmcs_reload() call (with the inner conditional dropped). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 10:27 ` Sergey Dyasli 2017-02-15 11:00 ` Jan Beulich @ 2017-02-15 11:39 ` Jan Beulich 2017-02-15 11:48 ` Sergey Dyasli 2017-02-15 13:20 ` Jan Beulich 2 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 11:39 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > This is what I'm getting during the original test case (32 VMs reboot): > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- > (XEN) [ 1407.803774] CPU: 12 > (XEN) [ 1407.806975] RIP: e008:[<ffff82d0801ea2a2>] > vmx_vmcs_reload+0x32/0x50 > (XEN) [ 1407.814926] RFLAGS: 0000000000000013 CONTEXT: hypervisor (d230v0) > (XEN) [ 1407.822486] rax: 0000000000000000 rbx: ffff830079ee7000 rcx: 0000000000000000 > (XEN) [ 1407.831407] rdx: 0000006f8f72ce00 rsi: ffff8329b3efbfe8 rdi: ffff830079ee7000 > (XEN) [ 1407.840326] rbp: ffff83007bab7000 rsp: ffff83400fab7dc8 r8: 000001468e9e3ccc > (XEN) [ 1407.849246] r9: ffff83403ffe7000 r10: 00000146c91c1737 r11: ffff833a9558c310 > (XEN) [ 1407.858166] r12: ffff833a9558c000 r13: 000000000000000c r14: ffff83403ffe7000 > (XEN) [ 1407.867085] r15: ffff82d080364640 cr0: 0000000080050033 cr4: 00000000003526e0 > (XEN) [ 1407.876005] cr3: 000000294b074000 cr2: 000007fefd7ce150 > (XEN) [ 1407.882599] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) [ 1407.890938] Xen code around <ffff82d0801ea2a2> > (vmx_vmcs_reload+0x32/0x50): > (XEN) [ 1407.899277] 84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3 > (XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8: > (XEN) [ 1407.914982] ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000 > (XEN) [ 1407.923998] 0000000000000206 0000000000000086 0000000000000286 
000000000000000c > (XEN) [ 1407.933017] ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495 > (XEN) [ 1407.942032] ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640 > (XEN) [ 1407.951048] ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000 > (XEN) [ 1407.960067] ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c > (XEN) [ 1407.969083] ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d > (XEN) [ 1407.978101] 00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00 > (XEN) [ 1407.987116] ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000 > (XEN) [ 1407.996134] ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff > (XEN) [ 1408.005151] ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700 > (XEN) [ 1408.014167] fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000 > (XEN) [ 1408.023184] fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000 > (XEN) [ 1408.032202] 00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80 > (XEN) [ 1408.041220] 0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000 > (XEN) [ 1408.050235] 0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0 > (XEN) [ 1408.059253] 00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c > (XEN) [ 1408.068268] ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0 > (XEN) [ 1408.075638] Xen call trace: > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 > (XEN) [ 1408.107925] [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210 > (XEN) [ 1408.116263] [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 > (XEN) [ 1408.123921] [<ffff82d080167d35>] 
domain.c#idle_loop+0x25/0x60 Taking your later reply into account - were you able to figure out what other party held onto the VMCS being waited for here? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 11:39 ` Jan Beulich @ 2017-02-15 11:48 ` Sergey Dyasli 2017-02-15 11:55 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Sergey Dyasli @ 2017-02-15 11:48 UTC (permalink / raw) To: JBeulich Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > > > > This is what I'm getting during the original test case (32 VMs reboot): > > > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- > > (XEN) [ 1407.803774] CPU: 12 > > (XEN) [ 1407.806975] RIP: e008:[<ffff82d0801ea2a2>] > > vmx_vmcs_reload+0x32/0x50 > > (XEN) [ 1407.814926] RFLAGS: 0000000000000013 CONTEXT: hypervisor (d230v0) > > (XEN) [ 1407.822486] rax: 0000000000000000 rbx: ffff830079ee7000 rcx: 0000000000000000 > > (XEN) [ 1407.831407] rdx: 0000006f8f72ce00 rsi: ffff8329b3efbfe8 rdi: ffff830079ee7000 > > (XEN) [ 1407.840326] rbp: ffff83007bab7000 rsp: ffff83400fab7dc8 r8: 000001468e9e3ccc > > (XEN) [ 1407.849246] r9: ffff83403ffe7000 r10: 00000146c91c1737 r11: ffff833a9558c310 > > (XEN) [ 1407.858166] r12: ffff833a9558c000 r13: 000000000000000c r14: ffff83403ffe7000 > > (XEN) [ 1407.867085] r15: ffff82d080364640 cr0: 0000000080050033 cr4: 00000000003526e0 > > (XEN) [ 1407.876005] cr3: 000000294b074000 cr2: 000007fefd7ce150 > > (XEN) [ 1407.882599] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > > (XEN) [ 1407.890938] Xen code around <ffff82d0801ea2a2> > > (vmx_vmcs_reload+0x32/0x50): > > (XEN) [ 1407.899277] 84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3 > > (XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8: > > (XEN) [ 1407.914982] ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000 > > (XEN) [ 1407.923998] 
0000000000000206 0000000000000086 0000000000000286 000000000000000c > > (XEN) [ 1407.933017] ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495 > > (XEN) [ 1407.942032] ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640 > > (XEN) [ 1407.951048] ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000 > > (XEN) [ 1407.960067] ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c > > (XEN) [ 1407.969083] ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d > > (XEN) [ 1407.978101] 00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00 > > (XEN) [ 1407.987116] ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000 > > (XEN) [ 1407.996134] ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff > > (XEN) [ 1408.005151] ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700 > > (XEN) [ 1408.014167] fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000 > > (XEN) [ 1408.023184] fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000 > > (XEN) [ 1408.032202] 00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80 > > (XEN) [ 1408.041220] 0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000 > > (XEN) [ 1408.050235] 0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0 > > (XEN) [ 1408.059253] 00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c > > (XEN) [ 1408.068268] ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0 > > (XEN) [ 1408.075638] Xen call trace: > > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 > > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 > > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 > > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 > > (XEN) [ 1408.107925] [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210 > > (XEN) [ 1408.116263] 
[<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 > > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 > > Taking your later reply into account - were you able to figure out > what other party held onto the VMCS being waited for here? Unfortunately, no. It was unclear from debug logs. But judging from the following vmx_do_resume() code: if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) { if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) vmx_load_vmcs(v); } If both of the above conditions are true then vmx_vmcs_reload() will probably hang. -- Thanks, Sergey _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 11:48 ` Sergey Dyasli @ 2017-02-15 11:55 ` Jan Beulich 2017-02-15 13:03 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 11:55 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 12:48, <sergey.dyasli@citrix.com> wrote: > On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: >> > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: >> > (XEN) [ 1408.075638] Xen call trace: >> > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 >> > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 >> > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 >> > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 >> > (XEN) [ 1408.107925] [<ffff82d080136400>] timer.c#timer_softirq_action+0x90/0x210 >> > (XEN) [ 1408.116263] [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 >> > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 >> >> Taking your later reply into account - were you able to figure out >> what other party held onto the VMCS being waited for here? > > Unfortunately, no. It was unclear from debug logs. But judging from > the following vmx_do_resume() code: > > if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) > { > if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) > vmx_load_vmcs(v); > } > > If both of the above conditions are true then vmx_vmcs_reload() will > probably hang. I don't follow (reload should run before this, not after), but I must be missing something more general anyway, as I'm seeing the code above being needed despite the reload additions. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 11:55 ` Jan Beulich @ 2017-02-15 13:03 ` Jan Beulich 2017-02-15 13:40 ` Sergey Dyasli 0 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 13:03 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 12:55, <JBeulich@suse.com> wrote: >>>> On 15.02.17 at 12:48, <sergey.dyasli@citrix.com> wrote: >> On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: >>> > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: >>> > (XEN) [ 1408.075638] Xen call trace: >>> > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 >>> > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 >>> > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 >>> > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 >>> > (XEN) [ 1408.107925] [<ffff82d080136400>] > timer.c#timer_softirq_action+0x90/0x210 >>> > (XEN) [ 1408.116263] [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 >>> > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 >>> >>> Taking your later reply into account - were you able to figure out >>> what other party held onto the VMCS being waited for here? >> >> Unfortunately, no. It was unclear from debug logs. But judging from >> the following vmx_do_resume() code: >> >> if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) >> { >> if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) >> vmx_load_vmcs(v); >> } >> >> If both of the above conditions are true then vmx_vmcs_reload() will >> probably hang. > > I don't follow (reload should run before this, not after), but I must > be missing something more general anyway, as I'm seeing the code > above being needed despite the reload additions. 
I think I've understood part of it over lunch: Surprisingly enough vmx_ctxt_switch_to() doesn't re-establish the VMCS, so it needs to be done here. Which I think means we don't need the new hook at all, as that way the state is no different between going through ->to() or bypassing it. What I continue to not understand is why vmcs_pa would ever not match current_vmcs when active_cpu is smp_processor_id(). So far I thought both are always updated together. Looking further ... Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 13:03 ` Jan Beulich @ 2017-02-15 13:40 ` Sergey Dyasli 2017-02-15 14:29 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Sergey Dyasli @ 2017-02-15 13:40 UTC (permalink / raw) To: JBeulich Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel On Wed, 2017-02-15 at 06:03 -0700, Jan Beulich wrote: > > > > On 15.02.17 at 12:55, <JBeulich@suse.com> wrote: > > > > > On 15.02.17 at 12:48, <sergey.dyasli@citrix.com> wrote: > > > > > > On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: > > > > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > > > > > > > > > > (XEN) [ 1408.075638] Xen call trace: > > > > > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 > > > > > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 > > > > > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] schedule.c#schedule+0x46e/0x7d0 > > > > > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 > > > > > (XEN) [ 1408.107925] [<ffff82d080136400>] > > > > timer.c#timer_softirq_action+0x90/0x210 > > > > > (XEN) [ 1408.116263] [<ffff82d08013311c>] softirq.c#__do_softirq+0x5c/0x90 > > > > > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 > > > > > > > > Taking your later reply into account - were you able to figure out > > > > what other party held onto the VMCS being waited for here? > > > > > > Unfortunately, no. It was unclear from debug logs. But judging from > > > the following vmx_do_resume() code: > > > > > > if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) > > > { > > > if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) > > > vmx_load_vmcs(v); > > > } > > > > > > If both of the above conditions are true then vmx_vmcs_reload() will > > > probably hang. 
> > > > I don't follow (reload should run before this, not after), but I must > > be missing something more general anyway, as I'm seeing the code > > above being needed despite the reload additions. > > I think I've understood part of it over lunch: Surprisingly enough > vmx_ctxt_switch_to() doesn't re-establish the VMCS, so it needs > to be done here. Which I think means we don't need the new > hook at all, as that way the state is no different between going > through ->to() or bypassing it. > > What I continue to not understand is why vmcs_pa would ever > not match current_vmcs when active_cpu is smp_processor_id(). > So far I thought both are always updated together. Looking > further ... This is exactly what will happen should scenario 3.1 occur: 1. HVM vCPU#1 --> idle vCPU context_switch 2. softirq --> vmx_vmcs_enter() + vmx_vmcs_exit() for a remote vCPU [scenario with PML] This will switch current_vmcs to a remote one. has_hvm_container_vcpu(current) will be false and the VMCS will not be reloaded. 3.1. idle vCPU --> HVM vCPU#1 (same) context_switch vmx_do_resume v->arch.hvm_vmx.active_cpu == smp_processor_id() v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) 3.2. idle vCPU --> HVM vCPU#2 (different) __context_switch() vmwrite BUG() This is the original BUG() scenario which my patch addresses. -- Thanks, Sergey _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 13:40 ` Sergey Dyasli @ 2017-02-15 14:29 ` Jan Beulich 2017-02-15 14:44 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 14:29 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 14:40, <sergey.dyasli@citrix.com> wrote: > On Wed, 2017-02-15 at 06:03 -0700, Jan Beulich wrote: >> > > > On 15.02.17 at 12:55, <JBeulich@suse.com> wrote: >> > > > > On 15.02.17 at 12:48, <sergey.dyasli@citrix.com> wrote: >> > > >> > > On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: >> > > > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: >> > > > > >> > > > > (XEN) [ 1408.075638] Xen call trace: >> > > > > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 >> > > > > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 >> > > > > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] > schedule.c#schedule+0x46e/0x7d0 >> > > > > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 >> > > > > (XEN) [ 1408.107925] [<ffff82d080136400>] >> > >> > timer.c#timer_softirq_action+0x90/0x210 >> > > > > (XEN) [ 1408.116263] [<ffff82d08013311c>] > softirq.c#__do_softirq+0x5c/0x90 >> > > > > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 >> > > > >> > > > Taking your later reply into account - were you able to figure out >> > > > what other party held onto the VMCS being waited for here? >> > > >> > > Unfortunately, no. It was unclear from debug logs. But judging from >> > > the following vmx_do_resume() code: >> > > >> > > if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) >> > > { >> > > if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) >> > > vmx_load_vmcs(v); >> > > } >> > > >> > > If both of the above conditions are true then vmx_vmcs_reload() will >> > > probably hang. 
>> > >> > I don't follow (reload should run before this, not after), but I must >> > be missing something more general anyway, as I'm seeing the code >> > above being needed despite the reload additions. >> >> I think I've understood part of it over lunch: Surprisingly enough >> vmx_ctxt_switch_to() doesn't re-establish the VMCS, so it needs >> to be done here. Which I think means we don't need the new >> hook at all, as that way the state is no different between going >> through ->to() or bypassing it. >> >> What I continue to not understand is why vmcs_pa would ever >> not match current_vmcs when active_cpu is smp_processor_id(). >> So far I thought both are always updated together. Looking >> further ... > > This is exactly what will happen should the 3.1 occur: > > 1. HVM vCPU#1 --> idle vCPU context_switch > > 2. softirq --> vmx_vmcs_enter() + vmx_vmcs_exit() for a remote vCPU > [scenario with PML] > This will switch current_vmcs to a remote one. > has_hvm_container_vcpu(current) will be false and vmcs will not > be reloaded. Oh, right - this updates v's active_cpu, but doesn't update the field for the vCPU the VMCS is being taken away from. > 3.1. idle vCPU --> HVM vCPU#1 (same) context_switch > vmx_do_resume > v->arch.hvm_vmx.active_cpu == smp_processor_id() > v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) > > 3.2 idle vCPU --> HVM vCPU#2 (different) > __context_switch() > vmwrite > BUG() > This is the original BUG() scenario which my patch > addresses. Which I'm not convinced of, as indicated in an earlier reply (on the thread Anshul did start) to my own consideration of using curr_vcpu. The problem of vcpu_pause() not spinning when v->is_running is clear does not get addressed by your patch afaict, so to me it looks as if you only shrink the critical time window without fully eliminating it. 
Nor can I see how the above would have been an issue in the first place - that's precisely what vmx_do_resume() is taking care of (unless you've found a vmwrite that happens before this function runs, which would then get us back to the question why vmx_ctxt_switch_to() doesn't load the VMCS). Afaics the problem is on the switch-out and switch-same paths, but not on the switch-in one (or if there is one there, then it's yet another one, with a yet to be explained way of the VMCS being taken away underneath the vCPU's feet). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 14:29 ` Jan Beulich @ 2017-02-15 14:44 ` Jan Beulich 0 siblings, 0 replies; 27+ messages in thread From: Jan Beulich @ 2017-02-15 14:44 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 15:29, <JBeulich@suse.com> wrote: >>>> On 15.02.17 at 14:40, <sergey.dyasli@citrix.com> wrote: >> On Wed, 2017-02-15 at 06:03 -0700, Jan Beulich wrote: >>> > > > On 15.02.17 at 12:55, <JBeulich@suse.com> wrote: >>> > > > > On 15.02.17 at 12:48, <sergey.dyasli@citrix.com> wrote: >>> > > >>> > > On Wed, 2017-02-15 at 04:39 -0700, Jan Beulich wrote: >>> > > > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: >>> > > > > >>> > > > > (XEN) [ 1408.075638] Xen call trace: >>> > > > > (XEN) [ 1408.079322] [<ffff82d0801ea2a2>] vmx_vmcs_reload+0x32/0x50 >>> > > > > (XEN) [ 1408.086303] [<ffff82d08016c58d>] context_switch+0x85d/0xeb0 >>> > > > > (XEN) [ 1408.093380] [<ffff82d08012fb8e>] >> schedule.c#schedule+0x46e/0x7d0 >>> > > > > (XEN) [ 1408.100942] [<ffff82d080164305>] reprogram_timer+0x75/0xe0 >>> > > > > (XEN) [ 1408.107925] [<ffff82d080136400>] >>> > >>> > timer.c#timer_softirq_action+0x90/0x210 >>> > > > > (XEN) [ 1408.116263] [<ffff82d08013311c>] >> softirq.c#__do_softirq+0x5c/0x90 >>> > > > > (XEN) [ 1408.123921] [<ffff82d080167d35>] domain.c#idle_loop+0x25/0x60 >>> > > > >>> > > > Taking your later reply into account - were you able to figure out >>> > > > what other party held onto the VMCS being waited for here? >>> > > >>> > > Unfortunately, no. It was unclear from debug logs. 
But judging from >>> > > the following vmx_do_resume() code: >>> > > >>> > > if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() ) >>> > > { >>> > > if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) ) >>> > > vmx_load_vmcs(v); >>> > > } >>> > > >>> > > If both of the above conditions are true then vmx_vmcs_reload() will >>> > > probably hang. >>> > >>> > I don't follow (reload should run before this, not after), but I must >>> > be missing something more general anyway, as I'm seeing the code >>> > above being needed despite the reload additions. >>> >>> I think I've understood part of it over lunch: Surprisingly enough >>> vmx_ctxt_switch_to() doesn't re-establish the VMCS, so it needs >>> to be done here. Which I think means we don't need the new >>> hook at all, as that way the state is no different between going >>> through ->to() or bypassing it. >>> >>> What I continue to not understand is why vmcs_pa would ever >>> not match current_vmcs when active_cpu is smp_processor_id(). >>> So far I thought both are always updated together. Looking >>> further ... >> >> This is exactly what will happen should the 3.1 occur: >> >> 1. HVM vCPU#1 --> idle vCPU context_switch >> >> 2. softirq --> vmx_vmcs_enter() + vmx_vmcs_exit() for a remote vCPU >> [scenario with PML] >> This will switch current_vmcs to a remote one. >> has_hvm_container_vcpu(current) will be false and vmcs will not >> be reloaded. > > Oh, right - this updates v's active_cpu, but doesn't update the > field for the vCPU the VMCS is being taken away from. Which then also indicates a likely reason for the endless loop you saw with my patch - the spin loop (which I'm no longer convinced is needed at all) would need to be skipped if the current CPU is the active one: void vmx_vmcs_reload(struct vcpu *v) { /* * As we're running with interrupts disabled, we can't acquire * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled * the VMCS can't be taken away from us anymore if we still own it. 
*/ ASSERT(!local_irq_is_enabled()); if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) ) return; ASSERT(!this_cpu(current_vmcs)); if ( v->arch.hvm_vmx.active_cpu != smp_processor_id() ) { /* * Wait for the remote side to be done with the VMCS before * loading it here. */ while ( v->arch.hvm_vmx.active_cpu != -1 ) cpu_relax(); } vmx_load_vmcs(v); } As to the spinning not being needed in the first place - the remote CPU using the VMCS won't unpause us before being done with the VMCS anyway. Thoughts? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 10:27 ` Sergey Dyasli 2017-02-15 11:00 ` Jan Beulich 2017-02-15 11:39 ` Jan Beulich @ 2017-02-15 13:20 ` Jan Beulich 2017-02-15 14:55 ` Sergey Dyasli 2 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 13:20 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > This is what I'm getting during the original test case (32 VMs reboot): > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- Hmm, this was with a non-debug build, so the ASSERT() in vmx_vmcs_reload() was a no-op, yet it would have been useful to know whether active_cpu was -1 when getting stuck here. Btw - there was no nested virt in the picture in your try, was there? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 13:20 ` Jan Beulich @ 2017-02-15 14:55 ` Sergey Dyasli 2017-02-15 15:15 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Sergey Dyasli @ 2017-02-15 14:55 UTC (permalink / raw) To: JBeulich Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel On Wed, 2017-02-15 at 06:20 -0700, Jan Beulich wrote: > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > > > > This is what I'm getting during the original test case (32 VMs reboot): > > > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- > > Hmm, this was with a non-debug build, so the ASSERT() in > vmx_vmcs_reload() was a no-op, yet it would have been useful > to know whether active_cpu was -1 when getting stuck here. > Btw - there was no nested virt in the picture in your try, was > there? No nested virt is involved in the test case. Is it worth giving your patch another try with removing ctxt_switch_same() since we figured out that vmx_do_resume() will reload vmcs either way? And I will also update vmx_vmcs_reload() from your last email. -- Thanks, Sergey _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 14:55 ` Sergey Dyasli @ 2017-02-15 15:15 ` Jan Beulich 2017-02-16 8:29 ` Sergey Dyasli 0 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 15:15 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 15.02.17 at 15:55, <sergey.dyasli@citrix.com> wrote: > On Wed, 2017-02-15 at 06:20 -0700, Jan Beulich wrote: >> > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: >> > >> > This is what I'm getting during the original test case (32 VMs reboot): >> > >> > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! >> > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- >> >> Hmm, this was with a non-debug build, so the ASSERT() in >> vmx_vmcs_reload() was a no-op, yet it would have been useful >> to know whether active_cpu was -1 when getting stuck here. >> Btw - there was no nested virt in the picture in your try, was >> there? > > No nested virt is involved in the test case. > > Is it worth giving your patch another try with removing ctxt_switch_same() > since we figured out that vmx_do_resume() will reload vmcs either way? Yes, but that's the cosmetic part, whereas ... > And I will also update vmx_vmcs_reload() from your last email. > > ... this looks to be the actual bug fix. If you agree with my reasoning of removing the loop altogether, you may want to go with that version instead of adding the conditional. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-15 13:20 ` Jan Beulich @ 2017-02-15 14:55 ` Sergey Dyasli 2017-02-15 15:15 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Sergey Dyasli @ 2017-02-16 8:29 UTC (permalink / raw) To: JBeulich Cc: Sergey Dyasli, Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel On Wed, 2017-02-15 at 08:15 -0700, Jan Beulich wrote: > > > > On 15.02.17 at 15:55, <sergey.dyasli@citrix.com> wrote: > > > > On Wed, 2017-02-15 at 06:20 -0700, Jan Beulich wrote: > > > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote: > > > > > > > > This is what I'm getting during the original test case (32 VMs reboot): > > > > > > > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck! > > > > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local x86_64 debug=n Not tainted ]---- > > > > > > Hmm, this was with a non-debug build, so the ASSERT() in > > > vmx_vmcs_reload() was a no-op, yet it would have been useful > > > to know whether active_cpu was -1 when getting stuck here. > > > Btw - there was no nested virt in the picture in your try, was > > > there? > > > > No nested virt is involved in the test case. > > > > Is it worth giving your patch another try with removing ctxt_switch_same() > > since we figured out that vmx_do_resume() will reload vmcs either way? > > Yes, but that's the cosmetic part, whereras ... > > > And I will also update vmx_vmcs_reload() from your last email. > > ... this looks to be the actual bug fix. If you agree with my > reasoning of removing the loop altogether, you may want to go > with that version instead of adding the conditional. After extensive overnight testing, it seems safe to assume that the patch below fixes the PML issue. I agree about removing the spinning, since vmx_vmcs_enter/exit are synchronized with the scheduler by schedule_lock. But it costs nothing to check, so I added a debug message to the loop. Needless to say, it was never printed.
My patch for vmx_vmcs_exit() is obviously a half measure because it doesn't protect against VMCS clearing by an external IPI when current is idle. I'm not sure such situation is possible but there is nothing that prevents it. This clearly makes your approach superior and I think you need to submit v2 for proper review. diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 88db7ee..07e8527 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -551,6 +551,33 @@ static void vmx_load_vmcs(struct vcpu *v) local_irq_restore(flags); } +void vmx_vmcs_reload(struct vcpu *v) +{ + /* + * As we're running with interrupts disabled, we can't acquire + * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled + * the VMCS can't be taken away from us anymore if we still own it. + */ + ASSERT(!local_irq_is_enabled()); + if ( v->arch.hvm_vmx.vmcs == this_cpu(current_vmcs) ) + return; + ASSERT(!this_cpu(current_vmcs)); + + if ( v->arch.hvm_vmx.active_cpu != smp_processor_id() ) + { + /* + * Wait for the remote side to be done with the VMCS before loading + * it here. + */ + while ( v->arch.hvm_vmx.active_cpu != -1 ) { + printk("DS: v->arch.hvm_vmx.active_cpu == %d\n", + v->arch.hvm_vmx.active_cpu); + cpu_relax(); + } + } + vmx_load_vmcs(v); +} + int vmx_cpu_up_prepare(unsigned int cpu) { /* diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 8cafec2..ccf433f 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -734,6 +734,18 @@ static void vmx_ctxt_switch_from(struct vcpu *v) if ( unlikely(!this_cpu(vmxon)) ) return; + if ( !v->is_running ) + { + /* + * When this vCPU isn't marked as running anymore, a remote pCPU's + * attempt to pause us (from vmx_vmcs_enter()) won't have a reason + * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken + * away the VMCS from us. As we're running with interrupts disabled, + * we also can't call vmx_vmcs_enter(). 
+ */ + vmx_vmcs_reload(v); + } + vmx_fpu_leave(v); vmx_save_guest_msrs(v); vmx_restore_host_msrs(); diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index 5974cce..2bf8829 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -157,6 +157,7 @@ void vmx_destroy_vmcs(struct vcpu *v); void vmx_vmcs_enter(struct vcpu *v); bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v); void vmx_vmcs_exit(struct vcpu *v); +void vmx_vmcs_reload(struct vcpu *v); #define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004 #define CPU_BASED_USE_TSC_OFFSETING 0x00000008 -- Thanks, Sergey _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] VMX: fix VMCS race on context-switch paths 2017-02-16 8:29 ` Sergey Dyasli @ 2017-02-16 9:26 ` Jan Beulich 0 siblings, 0 replies; 27+ messages in thread From: Jan Beulich @ 2017-02-16 9:26 UTC (permalink / raw) To: Sergey Dyasli Cc: Kevin Tian, Kevin.Mayer, Andrew Cooper, Anshul Makkar, jun.nakajima, xen-devel >>> On 16.02.17 at 09:29, <sergey.dyasli@citrix.com> wrote: > On Wed, 2017-02-15 at 08:15 -0700, Jan Beulich wrote: >> > > > On 15.02.17 at 15:55, <sergey.dyasli@citrix.com> wrote: >> > Is it worth giving your patch another try with removing ctxt_switch_same() >> > since we figured out that vmx_do_resume() will reload vmcs either way? >> >> Yes, but that's the cosmetic part, whereras ... >> >> > And I will also update vmx_vmcs_reload() from your last email. >> >> ... this looks to be the actual bug fix. If you agree with my >> reasoning of removing the loop altogether, you may want to go >> with that version instead of adding the conditional. > > After extensive night testing, it can be safe to assume that below > patch fixes the PML issue. I agree about removing the spinning since > vmx_vmcs_enter/exit are synchronized with the scheduler by schedule_lock. > But it costs nothing to check so I added a debug message to the loop. > Needless to say, it was never printed. Thanks, that's good to know. I'll remove the loop in v2. > My patch for vmx_vmcs_exit() is obviously a half measure because it > doesn't protect against VMCS clearing by an external IPI when current > is idle. I'm not sure such situation is possible but there is nothing > that prevents it. > > This clearly makes your approach superior and I think you need to > submit v2 for proper review. Will do. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/2] x86: package up context switch hook pointers 2017-02-14 10:23 [PATCH 0/2] x86: context switch handling adjustments Jan Beulich 2017-02-14 10:28 ` [PATCH 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich @ 2017-02-14 10:29 ` Jan Beulich 2017-02-14 15:26 ` Andrew Cooper 2017-02-14 22:18 ` Boris Ostrovsky 1 sibling, 2 replies; 27+ messages in thread From: Jan Beulich @ 2017-02-14 10:29 UTC (permalink / raw) To: xen-devel Cc: Boris Ostrovsky, Andrew Cooper, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit [-- Attachment #1: Type: text/plain, Size: 6771 bytes --] They're all solely dependent on guest type, so we don't need to repeat all the same four pointers in every vCPU control structure. Instead use static const structures, and store pointers to them in the domain control structure. Since touching it anyway, take the opportunity and move schedule_tail() into the only C file needing it. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -426,16 +426,8 @@ int vcpu_initialise(struct vcpu *v) /* PV guests by default have a 100Hz ticker. */ v->periodic_period = MILLISECS(10); } - - v->arch.schedule_tail = continue_nonidle_domain; - v->arch.ctxt_switch_from = paravirt_ctxt_switch_from; - v->arch.ctxt_switch_to = paravirt_ctxt_switch_to; - - if ( is_idle_domain(d) ) - { - v->arch.schedule_tail = continue_idle_domain; - v->arch.cr3 = __pa(idle_pg_table); - } + else + v->arch.cr3 = __pa(idle_pg_table); v->arch.pv_vcpu.ctrlreg[4] = real_cr4_to_pv_guest_cr4(mmu_cr4_features); @@ -642,8 +634,23 @@ int arch_domain_create(struct domain *d, goto fail; } else + { + static const struct arch_csw pv_csw = { + .from = paravirt_ctxt_switch_from, + .to = paravirt_ctxt_switch_to, + .tail = continue_nonidle_domain, + }; + static const struct arch_csw idle_csw = { + .from = paravirt_ctxt_switch_from, + .to = paravirt_ctxt_switch_to, + .tail = continue_idle_domain, + }; + + d->arch.ctxt_switch = is_idle_domain(d) ? 
&idle_csw : &pv_csw; + /* 64-bit PV guest by default. */ d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0; + } /* initialize default tsc behavior in case tools don't */ tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0); @@ -1997,7 +2004,7 @@ static void __context_switch(void) { memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES); vcpu_save_fpu(p); - p->arch.ctxt_switch_from(p); + pd->arch.ctxt_switch->from(p); } /* @@ -2023,7 +2030,7 @@ static void __context_switch(void) set_msr_xss(n->arch.hvm_vcpu.msr_xss); } vcpu_restore_fpu_eager(n); - n->arch.ctxt_switch_to(n); + nd->arch.ctxt_switch->to(n); } psr_ctxt_switch_to(nd); @@ -2066,6 +2073,15 @@ static void __context_switch(void) per_cpu(curr_vcpu, cpu) = n; } +/* + * Schedule tail *should* be a terminal function pointer, but leave a bugframe + * around just incase it returns, to save going back into the context + * switching code and leaving a far more subtle crash to diagnose. + */ +#define schedule_tail(vcpu) do { \ + (((vcpu)->domain->arch.ctxt_switch->tail)(vcpu)); \ + BUG(); \ + } while (0) void context_switch(struct vcpu *prev, struct vcpu *next) { @@ -2100,8 +2116,8 @@ void context_switch(struct vcpu *prev, s if ( (per_cpu(curr_vcpu, cpu) == next) ) { - if ( next->arch.ctxt_switch_same ) - next->arch.ctxt_switch_same(next); + if ( nextd->arch.ctxt_switch->same ) + nextd->arch.ctxt_switch->same(next); local_irq_enable(); } else if ( is_idle_domain(nextd) && cpu_online(cpu) ) --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1144,6 +1144,14 @@ void svm_host_osvw_init() static int svm_domain_initialise(struct domain *d) { + static const struct arch_csw csw = { + .from = svm_ctxt_switch_from, + .to = svm_ctxt_switch_to, + .tail = svm_do_resume, + }; + + d->arch.ctxt_switch = &csw; + return 0; } @@ -1155,10 +1163,6 @@ static int svm_vcpu_initialise(struct vc { int rc; - v->arch.schedule_tail = svm_do_resume; - v->arch.ctxt_switch_from = svm_ctxt_switch_from; - v->arch.ctxt_switch_to = 
svm_ctxt_switch_to; - v->arch.hvm_svm.launch_core = -1; if ( (rc = svm_create_vmcb(v)) != 0 ) --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -268,8 +268,16 @@ void vmx_pi_hooks_deassign(struct domain static int vmx_domain_initialise(struct domain *d) { + static const struct arch_csw csw = { + .from = vmx_ctxt_switch_from, + .to = vmx_ctxt_switch_to, + .same = vmx_vmcs_reload, + .tail = vmx_do_resume, + }; int rc; + d->arch.ctxt_switch = &csw; + if ( !has_vlapic(d) ) return 0; @@ -295,11 +303,6 @@ static int vmx_vcpu_initialise(struct vc INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocking.list); - v->arch.schedule_tail = vmx_do_resume; - v->arch.ctxt_switch_from = vmx_ctxt_switch_from; - v->arch.ctxt_switch_to = vmx_ctxt_switch_to; - v->arch.ctxt_switch_same = vmx_vmcs_reload; - if ( (rc = vmx_create_vmcs(v)) != 0 ) { dprintk(XENLOG_WARNING, --- a/xen/include/asm-x86/current.h +++ b/xen/include/asm-x86/current.h @@ -103,16 +103,6 @@ unsigned long get_stack_dump_bottom (uns }) /* - * Schedule tail *should* be a terminal function pointer, but leave a bugframe - * around just incase it returns, to save going back into the context - * switching code and leaving a far more subtle crash to diagnose. - */ -#define schedule_tail(vcpu) do { \ - (((vcpu)->arch.schedule_tail)(vcpu)); \ - BUG(); \ - } while (0) - -/* * Which VCPU's state is currently running on each CPU? * This is not necesasrily the same as 'current' as a CPU may be * executing a lazy state switch. 
--- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -314,6 +314,13 @@ struct arch_domain } relmem; struct page_list_head relmem_list; + const struct arch_csw { + void (*from)(struct vcpu *); + void (*to)(struct vcpu *); + void (*same)(struct vcpu *); + void (*tail)(struct vcpu *); + } *ctxt_switch; + /* nestedhvm: translate l2 guest physical to host physical */ struct p2m_domain *nested_p2m[MAX_NESTEDP2M]; mm_lock_t nested_p2m_lock; @@ -510,12 +517,6 @@ struct arch_vcpu unsigned long flags; /* TF_ */ - void (*schedule_tail) (struct vcpu *); - - void (*ctxt_switch_from) (struct vcpu *); - void (*ctxt_switch_to) (struct vcpu *); - void (*ctxt_switch_same) (struct vcpu *); - struct vpmu_struct vpmu; /* Virtual Machine Extensions */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/2] x86: package up context switch hook pointers 2017-02-14 10:29 ` [PATCH 2/2] x86: package up context switch hook pointers Jan Beulich @ 2017-02-14 15:26 ` Andrew Cooper 2017-02-15 8:42 ` Jan Beulich 2017-02-14 22:18 ` Boris Ostrovsky 1 sibling, 1 reply; 27+ messages in thread From: Andrew Cooper @ 2017-02-14 15:26 UTC (permalink / raw) To: Jan Beulich, xen-devel Cc: Boris Ostrovsky, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit On 14/02/17 10:29, Jan Beulich wrote: > They're all solely dependent on guest type, so we don't need to repeat > all the same four pointers in every vCPU control structure. Instead use > static const structures, and store pointers to them in the domain > control structure. > > Since touching it anyway, take the opportunity and move schedule_tail() > into the only C file needing it. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> +1. This has been a niggle on my todo list for ages. > @@ -2066,6 +2073,15 @@ static void __context_switch(void) > per_cpu(curr_vcpu, cpu) = n; > } > > +/* > + * Schedule tail *should* be a terminal function pointer, but leave a bugframe > + * around just incase it returns, to save going back into the context > + * switching code and leaving a far more subtle crash to diagnose. > + */ > +#define schedule_tail(vcpu) do { \ > + (((vcpu)->domain->arch.ctxt_switch->tail)(vcpu)); \ > + BUG(); \ > + } while (0) schedule_tail() is used only twice. I'd suggest dropping it entirely and calling the ->tail() function pointer normally, rather than hiding it like this. Upon reviewing this patch, don't we also need a ->same() call in the continue_same() path in the previous patch? ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/2] x86: package up context switch hook pointers 2017-02-14 15:26 ` Andrew Cooper @ 2017-02-15 8:42 ` Jan Beulich 2017-02-15 11:34 ` Andrew Cooper 0 siblings, 1 reply; 27+ messages in thread From: Jan Beulich @ 2017-02-15 8:42 UTC (permalink / raw) To: Andrew Cooper Cc: Boris Ostrovsky, xen-devel, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit >>> On 14.02.17 at 16:26, <andrew.cooper3@citrix.com> wrote: > On 14/02/17 10:29, Jan Beulich wrote: >> @@ -2066,6 +2073,15 @@ static void __context_switch(void) >> per_cpu(curr_vcpu, cpu) = n; >> } >> >> +/* >> + * Schedule tail *should* be a terminal function pointer, but leave a bugframe >> + * around just incase it returns, to save going back into the context >> + * switching code and leaving a far more subtle crash to diagnose. >> + */ >> +#define schedule_tail(vcpu) do { \ >> + (((vcpu)->domain->arch.ctxt_switch->tail)(vcpu)); \ >> + BUG(); \ >> + } while (0) > > schedule_tail() is used only twice. I'd suggest dropping it entirely > and calling the ->tail() function pointer normally, rather than hiding > it this. I had considered this too, and now that you ask for it I'll happily do so. > Upon reviewing, this patch, don't we also need a ->same() call in the > continue_same() path in the previous patch? No, I did specifically check this already: The call to continue_same() sits (far) ahead of the clearing of ->is_running, and as long as that flag is set, vcpu_pause() (or vcpu_sleep_sync(), to be precise) will spin as necessary. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/2] x86: package up context switch hook pointers 2017-02-15 8:42 ` Jan Beulich @ 2017-02-15 11:34 ` Andrew Cooper 2017-02-15 11:40 ` Jan Beulich 0 siblings, 1 reply; 27+ messages in thread From: Andrew Cooper @ 2017-02-15 11:34 UTC (permalink / raw) To: Jan Beulich Cc: Boris Ostrovsky, xen-devel, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit On 15/02/17 08:42, Jan Beulich wrote: >>>> On 14.02.17 at 16:26, <andrew.cooper3@citrix.com> wrote: >> On 14/02/17 10:29, Jan Beulich wrote: >>> @@ -2066,6 +2073,15 @@ static void __context_switch(void) >>> per_cpu(curr_vcpu, cpu) = n; >>> } >>> >>> +/* >>> + * Schedule tail *should* be a terminal function pointer, but leave a bugframe >>> + * around just incase it returns, to save going back into the context >>> + * switching code and leaving a far more subtle crash to diagnose. >>> + */ >>> +#define schedule_tail(vcpu) do { \ >>> + (((vcpu)->domain->arch.ctxt_switch->tail)(vcpu)); \ >>> + BUG(); \ >>> + } while (0) >> schedule_tail() is used only twice. I'd suggest dropping it entirely >> and calling the ->tail() function pointer normally, rather than hiding >> it this. > I had considered this too, and now that you ask for it I'll happily > do so. Thinking more, it would be a good idea to annotate the respective functions noreturn, so the compiler will catch anyone who accidentally puts a return statement in. > >> Upon reviewing, this patch, don't we also need a ->same() call in the >> continue_same() path in the previous patch? > No, I did specifically check this already: The call to continue_same() > sits (far) ahead of the clearing of ->is_running, and as long as that > flag is set, vcpu_pause() (or vcpu_sleep_sync(), to be precise) will > spin as necessary. Ok. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/2] x86: package up context switch hook pointers 2017-02-15 11:34 ` Andrew Cooper @ 2017-02-15 11:40 ` Jan Beulich 0 siblings, 0 replies; 27+ messages in thread From: Jan Beulich @ 2017-02-15 11:40 UTC (permalink / raw) To: Andrew Cooper Cc: Boris Ostrovsky, xen-devel, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit >>> On 15.02.17 at 12:34, <andrew.cooper3@citrix.com> wrote: > On 15/02/17 08:42, Jan Beulich wrote: >>>>> On 14.02.17 at 16:26, <andrew.cooper3@citrix.com> wrote: >>> On 14/02/17 10:29, Jan Beulich wrote: >>>> @@ -2066,6 +2073,15 @@ static void __context_switch(void) >>>> per_cpu(curr_vcpu, cpu) = n; >>>> } >>>> >>>> +/* >>>> + * Schedule tail *should* be a terminal function pointer, but leave a > bugframe >>>> + * around just incase it returns, to save going back into the context >>>> + * switching code and leaving a far more subtle crash to diagnose. >>>> + */ >>>> +#define schedule_tail(vcpu) do { \ >>>> + (((vcpu)->domain->arch.ctxt_switch->tail)(vcpu)); \ >>>> + BUG(); \ >>>> + } while (0) >>> schedule_tail() is used only twice. I'd suggest dropping it entirely >>> and calling the ->tail() function pointer normally, rather than hiding >>> it this. >> I had considered this too, and now that you ask for it I'll happily >> do so. > > Thinking more, it would be a good idea to annotate the respective > functions noreturn, so the compiler will catch anyone who accidently > puts a return statement in. Right, but in another patch I would say. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/2] x86: package up context switch hook pointers 2017-02-14 10:29 ` [PATCH 2/2] x86: package up context switch hook pointers Jan Beulich 2017-02-14 15:26 ` Andrew Cooper @ 2017-02-14 22:18 ` Boris Ostrovsky 1 sibling, 0 replies; 27+ messages in thread From: Boris Ostrovsky @ 2017-02-14 22:18 UTC (permalink / raw) To: Jan Beulich, xen-devel Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit On 02/14/2017 05:29 AM, Jan Beulich wrote: > They're all solely dependent on guest type, so we don't need to repeat > all the same four pointers in every vCPU control structure. Instead use > static const structures, and store pointers to them in the domain > control structure. > > Since touching it anyway, take the opportunity and move schedule_tail() > into the only C file needing it. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 27+ messages in thread