* [PATCH v2 0/2] x86: context switch handling adjustments
@ 2017-02-16 11:10 Jan Beulich
  2017-02-16 11:15 ` [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
  2017-02-16 11:16 ` [PATCH v2 2/2] x86: package up context switch hook pointers Jan Beulich
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2017-02-16 11:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Kevin Tian, Jun Nakajima

1: VMX: fix VMCS race on context-switch paths
2: x86: package up context switch hook pointers

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Several changes to patch 1 (see there) requiring adjustment to
   patch 2 (again, see there).



* [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-16 11:10 [PATCH v2 0/2] x86: context switch handling adjustments Jan Beulich
@ 2017-02-16 11:15 ` Jan Beulich
  2017-02-16 12:27   ` Andrew Cooper
                     ` (2 more replies)
  2017-02-16 11:16 ` [PATCH v2 2/2] x86: package up context switch hook pointers Jan Beulich
  1 sibling, 3 replies; 30+ messages in thread
From: Jan Beulich @ 2017-02-16 11:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Anshul Makkar, Kevin Tian, Jun Nakajima, Sergey Dyasli


When __context_switch() is being bypassed during original context
switch handling, the vCPU "owning" the VMCS partially loses control of
it: It will appear non-running to remote CPUs, and hence their attempt
to pause the owning vCPU will have no effect on it (as it already
looks to be paused). At the same time the "owning" CPU will re-enable
interrupts eventually (at the latest when entering the idle loop) and
hence becomes subject to IPIs from other CPUs requesting access to the
VMCS. As a result, when __context_switch() finally gets run, the CPU
may no longer have the VMCS loaded, and hence any accesses to it would
fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().

Similarly, when __context_switch() is being bypassed also on the second
(switch-in) path, VMCS ownership may have been lost and hence needs
re-establishing. Since there's no existing hook to put this in, add a
new one.

Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Drop the spin loop from vmx_vmcs_reload(). Use the function in
    vmx_do_resume() instead of open coding it there (requiring the
    ASSERT()s to be adjusted/dropped). Drop the new
    ->ctxt_switch_same() hook.

--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
     local_irq_restore(flags);
 }
 
+void vmx_vmcs_reload(struct vcpu *v)
+{
+    /*
+     * As we may be running with interrupts disabled, we can't acquire
+     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+     * the VMCS can't be taken away from us anymore if we still own it.
+     */
+    ASSERT(v->is_running || !local_irq_is_enabled());
+    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+        return;
+
+    vmx_load_vmcs(v);
+}
+
 int vmx_cpu_up_prepare(unsigned int cpu)
 {
     /*
@@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
     bool_t debug_state;
 
     if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
-    {
-        if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
-            vmx_load_vmcs(v);
-    }
+        vmx_vmcs_reload(v);
     else
     {
         /*
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
     if ( unlikely(!this_cpu(vmxon)) )
         return;
 
+    if ( !v->is_running )
+    {
+        /*
+         * When this vCPU isn't marked as running anymore, a remote pCPU's
+         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+         * away the VMCS from us. As we're running with interrupts disabled,
+         * we also can't call vmx_vmcs_enter().
+         */
+        vmx_vmcs_reload(v);
+    }
+
     vmx_fpu_leave(v);
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
 void vmx_vmcs_enter(struct vcpu *v);
 bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
 void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
 
 #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
 #define CPU_BASED_USE_TSC_OFFSETING           0x00000008





* [PATCH v2 2/2] x86: package up context switch hook pointers
  2017-02-16 11:10 [PATCH v2 0/2] x86: context switch handling adjustments Jan Beulich
  2017-02-16 11:15 ` [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
@ 2017-02-16 11:16 ` Jan Beulich
  2017-02-16 11:23   ` Andrew Cooper
  2017-02-17  3:49   ` Tian, Kevin
  1 sibling, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2017-02-16 11:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Boris Ostrovsky, Andrew Cooper, Kevin Tian, Jun Nakajima,
	Suravee Suthikulpanit


They're all solely dependent on guest type, so we don't need to repeat
all the same three pointers in every vCPU control structure. Instead use
static const structures, and store pointers to them in the domain
control structure.

Since we're touching this anyway, take the opportunity to expand
schedule_tail() in the only two places invoking it, allowing the macro
to be dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
v2: Drop the schedule_tail() macro altogether. Re-base after dropped
    ->ctxt_switch_same() hook.

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -426,16 +426,8 @@ int vcpu_initialise(struct vcpu *v)
         /* PV guests by default have a 100Hz ticker. */
         v->periodic_period = MILLISECS(10);
     }
-
-    v->arch.schedule_tail = continue_nonidle_domain;
-    v->arch.ctxt_switch_from = paravirt_ctxt_switch_from;
-    v->arch.ctxt_switch_to   = paravirt_ctxt_switch_to;
-
-    if ( is_idle_domain(d) )
-    {
-        v->arch.schedule_tail = continue_idle_domain;
-        v->arch.cr3           = __pa(idle_pg_table);
-    }
+    else
+        v->arch.cr3 = __pa(idle_pg_table);
 
     v->arch.pv_vcpu.ctrlreg[4] = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
 
@@ -642,8 +634,23 @@ int arch_domain_create(struct domain *d,
             goto fail;
     }
     else
+    {
+        static const struct arch_csw pv_csw = {
+            .from = paravirt_ctxt_switch_from,
+            .to   = paravirt_ctxt_switch_to,
+            .tail = continue_nonidle_domain,
+        };
+        static const struct arch_csw idle_csw = {
+            .from = paravirt_ctxt_switch_from,
+            .to   = paravirt_ctxt_switch_to,
+            .tail = continue_idle_domain,
+        };
+
+        d->arch.ctxt_switch = is_idle_domain(d) ? &idle_csw : &pv_csw;
+
         /* 64-bit PV guest by default. */
         d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
+    }
 
     /* initialize default tsc behavior in case tools don't */
     tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
@@ -1997,7 +2004,7 @@ static void __context_switch(void)
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
         vcpu_save_fpu(p);
-        p->arch.ctxt_switch_from(p);
+        pd->arch.ctxt_switch->from(p);
     }
 
     /*
@@ -2023,7 +2030,7 @@ static void __context_switch(void)
                 set_msr_xss(n->arch.hvm_vcpu.msr_xss);
         }
         vcpu_restore_fpu_eager(n);
-        n->arch.ctxt_switch_to(n);
+        nd->arch.ctxt_switch->to(n);
     }
 
     psr_ctxt_switch_to(nd);
@@ -2142,12 +2149,20 @@ void context_switch(struct vcpu *prev, s
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-    schedule_tail(next);
+    /*
+     * Schedule tail *should* be a terminal function pointer, but leave a
+     * bug frame around just in case it returns, to save going back into the
+     * context switching code and leaving a far more subtle crash to diagnose.
+     */
+    nextd->arch.ctxt_switch->tail(next);
+    BUG();
 }
 
 void continue_running(struct vcpu *same)
 {
-    schedule_tail(same);
+    /* See the comment above. */
+    same->domain->arch.ctxt_switch->tail(same);
+    BUG();
 }
 
 int __sync_local_execstate(void)
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1144,6 +1144,14 @@ void svm_host_osvw_init()
 
 static int svm_domain_initialise(struct domain *d)
 {
+    static const struct arch_csw csw = {
+        .from = svm_ctxt_switch_from,
+        .to   = svm_ctxt_switch_to,
+        .tail = svm_do_resume,
+    };
+
+    d->arch.ctxt_switch = &csw;
+
     return 0;
 }
 
@@ -1155,10 +1163,6 @@ static int svm_vcpu_initialise(struct vc
 {
     int rc;
 
-    v->arch.schedule_tail    = svm_do_resume;
-    v->arch.ctxt_switch_from = svm_ctxt_switch_from;
-    v->arch.ctxt_switch_to   = svm_ctxt_switch_to;
-
     v->arch.hvm_svm.launch_core = -1;
 
     if ( (rc = svm_create_vmcb(v)) != 0 )
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -268,8 +268,15 @@ void vmx_pi_hooks_deassign(struct domain
 
 static int vmx_domain_initialise(struct domain *d)
 {
+    static const struct arch_csw csw = {
+        .from = vmx_ctxt_switch_from,
+        .to   = vmx_ctxt_switch_to,
+        .tail = vmx_do_resume,
+    };
     int rc;
 
+    d->arch.ctxt_switch = &csw;
+
     if ( !has_vlapic(d) )
         return 0;
 
@@ -295,10 +302,6 @@ static int vmx_vcpu_initialise(struct vc
 
     INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocking.list);
 
-    v->arch.schedule_tail    = vmx_do_resume;
-    v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
-    v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
-
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
         dprintk(XENLOG_WARNING,
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -103,16 +103,6 @@ unsigned long get_stack_dump_bottom (uns
     })
 
 /*
- * Schedule tail *should* be a terminal function pointer, but leave a bugframe
- * around just incase it returns, to save going back into the context
- * switching code and leaving a far more subtle crash to diagnose.
- */
-#define schedule_tail(vcpu) do {                \
-        (((vcpu)->arch.schedule_tail)(vcpu));   \
-        BUG();                                  \
-    } while (0)
-
-/*
  * Which VCPU's state is currently running on each CPU?
  * This is not necesasrily the same as 'current' as a CPU may be
  * executing a lazy state switch.
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -314,6 +314,12 @@ struct arch_domain
     } relmem;
     struct page_list_head relmem_list;
 
+    const struct arch_csw {
+        void (*from)(struct vcpu *);
+        void (*to)(struct vcpu *);
+        void (*tail)(struct vcpu *);
+    } *ctxt_switch;
+
     /* nestedhvm: translate l2 guest physical to host physical */
     struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
     mm_lock_t nested_p2m_lock;
@@ -510,11 +516,6 @@ struct arch_vcpu
 
     unsigned long      flags; /* TF_ */
 
-    void (*schedule_tail) (struct vcpu *);
-
-    void (*ctxt_switch_from) (struct vcpu *);
-    void (*ctxt_switch_to) (struct vcpu *);
-
     struct vpmu_struct vpmu;
 
     /* Virtual Machine Extensions */




* Re: [PATCH v2 2/2] x86: package up context switch hook pointers
  2017-02-16 11:16 ` [PATCH v2 2/2] x86: package up context switch hook pointers Jan Beulich
@ 2017-02-16 11:23   ` Andrew Cooper
  2017-02-17  3:49   ` Tian, Kevin
  1 sibling, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2017-02-16 11:23 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Boris Ostrovsky, Kevin Tian, Jun Nakajima, Suravee Suthikulpanit

On 16/02/17 11:16, Jan Beulich wrote:
> They're all solely dependent on guest type, so we don't need to repeat
> all the same three pointers in every vCPU control structure. Instead use
> static const structures, and store pointers to them in the domain
> control structure.
>
> Since touching it anyway, take the opportunity and expand
> schedule_tail() in the only two places invoking it, allowing the macro
> to be dropped.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-16 11:15 ` [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
@ 2017-02-16 12:27   ` Andrew Cooper
  2017-02-16 12:35     ` Jan Beulich
  2017-02-17  8:40   ` Sergey Dyasli
  2017-10-27 17:42   ` Igor Druzhinin
  2 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-02-16 12:27 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Sergey Dyasli, Anshul Makkar, Kevin Tian, Jun Nakajima, Kevin Mayer

On 16/02/17 11:15, Jan Beulich wrote:
> When __context_switch() is being bypassed during original context
> switch handling, the vCPU "owning" the VMCS partially loses control of
> it: It will appear non-running to remote CPUs, and hence their attempt
> to pause the owning vCPU will have no effect on it (as it already
> looks to be paused). At the same time the "owning" CPU will re-enable
> interrupts eventually (the lastest when entering the idle loop) and
> hence becomes subject to IPIs from other CPUs requesting access to the
> VMCS. As a result, when __context_switch() finally gets run, the CPU
> may no longer have the VMCS loaded, and hence any accesses to it would
> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>
> Similarly, when __context_switch() is being bypassed also on the second
> (switch-in) path, VMCS ownership may have been lost and hence needs
> re-establishing. Since there's no existing hook to put this in, add a
> new one.
>
> Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
> Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Although I would certainly prefer if we can get another round of testing
on this series for confidence.

~Andrew


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-16 12:27   ` Andrew Cooper
@ 2017-02-16 12:35     ` Jan Beulich
  2017-02-17  3:48       ` Tian, Kevin
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-02-16 12:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Sergey Dyasli, Kevin Tian, Kevin Mayer, Anshul Makkar,
	Jun Nakajima, xen-devel

>>> On 16.02.17 at 13:27, <andrew.cooper3@citrix.com> wrote:
> On 16/02/17 11:15, Jan Beulich wrote:
>> When __context_switch() is being bypassed during original context
>> switch handling, the vCPU "owning" the VMCS partially loses control of
>> it: It will appear non-running to remote CPUs, and hence their attempt
>> to pause the owning vCPU will have no effect on it (as it already
>> looks to be paused). At the same time the "owning" CPU will re-enable
>> interrupts eventually (the lastest when entering the idle loop) and
>> hence becomes subject to IPIs from other CPUs requesting access to the
>> VMCS. As a result, when __context_switch() finally gets run, the CPU
>> may no longer have the VMCS loaded, and hence any accesses to it would
>> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>>
>> Similarly, when __context_switch() is being bypassed also on the second
>> (switch-in) path, VMCS ownership may have been lost and hence needs
>> re-establishing. Since there's no existing hook to put this in, add a
>> new one.
>>
>> Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
>> Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Although I would certainly prefer if we can get another round of testing
> on this series for confidence.

Sure, I'd certainly like to stick a Tested-by on it. Plus VMX maintainer
feedback will need waiting for anyway.

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-16 12:35     ` Jan Beulich
@ 2017-02-17  3:48       ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2017-02-17  3:48 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Sergey Dyasli, Anshul Makkar, xen-devel, Nakajima, Jun, Kevin Mayer

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, February 16, 2017 8:36 PM
> 
> >>> On 16.02.17 at 13:27, <andrew.cooper3@citrix.com> wrote:
> > On 16/02/17 11:15, Jan Beulich wrote:
> >> When __context_switch() is being bypassed during original context
> >> switch handling, the vCPU "owning" the VMCS partially loses control of
> >> it: It will appear non-running to remote CPUs, and hence their attempt
> >> to pause the owning vCPU will have no effect on it (as it already
> >> looks to be paused). At the same time the "owning" CPU will re-enable
> >> interrupts eventually (the lastest when entering the idle loop) and
> >> hence becomes subject to IPIs from other CPUs requesting access to the
> >> VMCS. As a result, when __context_switch() finally gets run, the CPU
> >> may no longer have the VMCS loaded, and hence any accesses to it would
> >> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
> >>
> >> Similarly, when __context_switch() is being bypassed also on the second
> >> (switch-in) path, VMCS ownership may have been lost and hence needs
> >> re-establishing. Since there's no existing hook to put this in, add a
> >> new one.
> >>
> >> Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
> >> Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >
> > Although I would certainly prefer if we can get another round of testing
> > on this series for confidence.
> 
> Sure, I'd certainly like to stick a Tested-by on it. Plus VMX maintainer
> feedback will need waiting for anyway.
> 

logic looks clean to me:

Acked-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v2 2/2] x86: package up context switch hook pointers
  2017-02-16 11:16 ` [PATCH v2 2/2] x86: package up context switch hook pointers Jan Beulich
  2017-02-16 11:23   ` Andrew Cooper
@ 2017-02-17  3:49   ` Tian, Kevin
  1 sibling, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2017-02-17  3:49 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Boris Ostrovsky, Nakajima, Jun, Suravee Suthikulpanit

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, February 16, 2017 7:16 PM
> 
> They're all solely dependent on guest type, so we don't need to repeat
> all the same three pointers in every vCPU control structure. Instead use
> static const structures, and store pointers to them in the domain
> control structure.
> 
> Since touching it anyway, take the opportunity and expand
> schedule_tail() in the only two places invoking it, allowing the macro
> to be dropped.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-16 11:15 ` [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
  2017-02-16 12:27   ` Andrew Cooper
@ 2017-02-17  8:40   ` Sergey Dyasli
  2017-02-17  9:01     ` Jan Beulich
  2017-10-27 17:42   ` Igor Druzhinin
  2 siblings, 1 reply; 30+ messages in thread
From: Sergey Dyasli @ 2017-02-17  8:40 UTC (permalink / raw)
  To: JBeulich, xen-devel
  Cc: Sergey Dyasli, Kevin Tian, Anshul Makkar, jun.nakajima, Andrew Cooper

On Thu, 2017-02-16 at 04:15 -0700, Jan Beulich wrote:
> When __context_switch() is being bypassed during original context
> switch handling, the vCPU "owning" the VMCS partially loses control of
> it: It will appear non-running to remote CPUs, and hence their attempt
> to pause the owning vCPU will have no effect on it (as it already
> looks to be paused). At the same time the "owning" CPU will re-enable
> interrupts eventually (the lastest when entering the idle loop) and
> hence becomes subject to IPIs from other CPUs requesting access to the
> VMCS. As a result, when __context_switch() finally gets run, the CPU
> may no longer have the VMCS loaded, and hence any accesses to it would
> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
> 
> Similarly, when __context_switch() is being bypassed also on the second
> (switch-in) path, VMCS ownership may have been lost and hence needs
> re-establishing. Since there's no existing hook to put this in, add a
> new one.

This paragraph now has to be replaced with something about the
vmx_do_resume() change.

> 
> Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
> Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Drop the spin loop from vmx_vmc_reload(). Use the function in
>     vmx_do_resume() instead of open coding it there (requiring the
>     ASSERT()s to be adjusted/dropped). Drop the new
>     ->ctxt_switch_same() hook.


For the code itself:

Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>

And since overnight testing of the PML scenario (reboot of 32 VMs)
didn't find any issues:

Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>

-- 
Thanks,
Sergey

* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-17  8:40   ` Sergey Dyasli
@ 2017-02-17  9:01     ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-02-17  9:01 UTC (permalink / raw)
  To: Sergey Dyasli
  Cc: Andrew Cooper, Anshul Makkar, Kevin Tian, jun.nakajima, xen-devel

>>> On 17.02.17 at 09:40, <sergey.dyasli@citrix.com> wrote:
> On Thu, 2017-02-16 at 04:15 -0700, Jan Beulich wrote:
>> When __context_switch() is being bypassed during original context
>> switch handling, the vCPU "owning" the VMCS partially loses control of
>> it: It will appear non-running to remote CPUs, and hence their attempt
>> to pause the owning vCPU will have no effect on it (as it already
>> looks to be paused). At the same time the "owning" CPU will re-enable
>> interrupts eventually (the lastest when entering the idle loop) and
>> hence becomes subject to IPIs from other CPUs requesting access to the
>> VMCS. As a result, when __context_switch() finally gets run, the CPU
>> may no longer have the VMCS loaded, and hence any accesses to it would
>> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>> 
>> Similarly, when __context_switch() is being bypassed also on the second
>> (switch-in) path, VMCS ownership may have been lost and hence needs
>> re-establishing. Since there's no existing hook to put this in, add a
>> new one.
> 
> This paragraph now has to be replaced with something about
> vmx_do_resume() change.

Oh, I had tried to remember to update this, but then forgot (ending
up mentioning this only in the v2 info). Thanks for noticing.

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-02-16 11:15 ` [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
  2017-02-16 12:27   ` Andrew Cooper
  2017-02-17  8:40   ` Sergey Dyasli
@ 2017-10-27 17:42   ` Igor Druzhinin
  2017-11-02 19:46     ` Igor Druzhinin
  2 siblings, 1 reply; 30+ messages in thread
From: Igor Druzhinin @ 2017-10-27 17:42 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Anshul Makkar, Kevin Tian, Jun Nakajima, Sergey Dyasli

On 16/02/17 11:15, Jan Beulich wrote:
> When __context_switch() is being bypassed during original context
> switch handling, the vCPU "owning" the VMCS partially loses control of
> it: It will appear non-running to remote CPUs, and hence their attempt
> to pause the owning vCPU will have no effect on it (as it already
> looks to be paused). At the same time the "owning" CPU will re-enable
> interrupts eventually (the lastest when entering the idle loop) and
> hence becomes subject to IPIs from other CPUs requesting access to the
> VMCS. As a result, when __context_switch() finally gets run, the CPU
> may no longer have the VMCS loaded, and hence any accesses to it would
> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
> 
> Similarly, when __context_switch() is being bypassed also on the second
> (switch-in) path, VMCS ownership may have been lost and hence needs
> re-establishing. Since there's no existing hook to put this in, add a
> new one.
> 
> Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
> Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Drop the spin loop from vmx_vmc_reload(). Use the function in
>     vmx_do_resume() instead of open coding it there (requiring the
>     ASSERT()s to be adjusted/dropped). Drop the new
>     ->ctxt_switch_same() hook.
> 
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
>      local_irq_restore(flags);
>  }
>  
> +void vmx_vmcs_reload(struct vcpu *v)
> +{
> +    /*
> +     * As we may be running with interrupts disabled, we can't acquire
> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
> +     * the VMCS can't be taken away from us anymore if we still own it.
> +     */
> +    ASSERT(v->is_running || !local_irq_is_enabled());
> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
> +        return;
> +
> +    vmx_load_vmcs(v);
> +}
> +
>  int vmx_cpu_up_prepare(unsigned int cpu)
>  {
>      /*
> @@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
>      bool_t debug_state;
>  
>      if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
> -    {
> -        if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
> -            vmx_load_vmcs(v);
> -    }
> +        vmx_vmcs_reload(v);
>      else
>      {
>          /*
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>      if ( unlikely(!this_cpu(vmxon)) )
>          return;
>  
> +    if ( !v->is_running )
> +    {
> +        /*
> +         * When this vCPU isn't marked as running anymore, a remote pCPU's
> +         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
> +         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
> +         * away the VMCS from us. As we're running with interrupts disabled,
> +         * we also can't call vmx_vmcs_enter().
> +         */
> +        vmx_vmcs_reload(v);
> +    }
> +
>      vmx_fpu_leave(v);
>      vmx_save_guest_msrs(v);
>      vmx_restore_host_msrs();
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>  void vmx_vmcs_enter(struct vcpu *v);
>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>  void vmx_vmcs_exit(struct vcpu *v);
> +void vmx_vmcs_reload(struct vcpu *v);
>  
>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
> 

Hi Jan,

I'm not entirely sure whether it's related, but the end result looks
similar to the issue that this patch solved. We are now getting reports
of a similar race condition, with the stack trace below, on 4.7.1 with
this patch backported, but I'm pretty sure the same applies to master
as well:

(XEN) [480198.570165] Xen call trace:
(XEN) [480198.570168]    [<ffff82d0801eb53e>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0x136/0x1a8
(XEN) [480198.570171]    [<ffff82d080160095>] domain.c#__context_switch+0x10c/0x3a4
(XEN) [480198.570176]    [<ffff82d08016560c>] __sync_local_execstate+0x35/0x51
(XEN) [480198.570179]    [<ffff82d08018bd82>] invalidate_interrupt+0x40/0x73
(XEN) [480198.570183]    [<ffff82d08016ea1f>] do_IRQ+0x8c/0x5cb
(XEN) [480198.570186]    [<ffff82d08022d93f>] common_interrupt+0x5f/0x70
(XEN) [480198.570189]    [<ffff82d0801b32b0>] vpmu_destroy+0/0x100
(XEN) [480198.570192]    [<ffff82d0801e7dc9>] vmx.c#vmx_vcpu_destroy+0x21/0x30
(XEN) [480198.570195]    [<ffff82d0801c2bf6>] hvm_vcpu_destroy+0x70/0x77
(XEN) [480198.570197]    [<ffff82d08016101e>] vcpu_destroy+0x5d/0x72
(XEN) [480198.570201]    [<ffff82d080107510>] domain.c#complete_domain_destroy+0x49/0x182
(XEN) [480198.570204]    [<ffff82d0801266d2>] rcupdate.c#rcu_process_callbacks+0x141/0x1a3
(XEN) [480198.570207]    [<ffff82d080132f95>] softirq.c#__do_softirq+0x75/0x80
(XEN) [480198.570209]    [<ffff82d080132fae>] process_pending_softirqs+0xe/0x10
(XEN) [480198.570212]    [<ffff82d0801b256f>] mwait-idle.c#mwait_idle+0xf5/0x2c3
(XEN) [480198.570214]    [<ffff82d0801e0d00>] vmx_intr_assist+0x3bf/0x4f2
(XEN) [480198.570216]    [<ffff82d08015fd57>] domain.c#idle_loop+0x38/0x4d

So far all attempts to get a repro locally have failed - the race is quite
rare - it only happens when (probably) the stolen-VMCS issue appears AND a
TLB flush IPI arrives at the moment of domain destruction. For the issue
to appear, several conditions should be met:
1) The TLB flush IPI should execute __sync_local_execstate() and enter the
VMX context switch path
2) This should happen during the vCPU destroy loop in the RCU callback
3) The VMCS pointer should be invalid (possibly stolen or cleared somehow)

My idea was to provoke the crash by simulating the conditions described
above. Using Andrew's suggestion I can now satisfy conditions 1 and 2 with
some help from XTF, but I still have no idea how to provoke 3.

Any ideas about the root cause of the fault, and suggestions on how to
reproduce it, would be welcome. Does this crash really have something to do
with PML? I doubt it, because the original environment can hardly be called
PML-heavy.

Thanks,
Igor


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-10-27 17:42   ` Igor Druzhinin
@ 2017-11-02 19:46     ` Igor Druzhinin
  2017-11-07  8:07       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Igor Druzhinin @ 2017-11-02 19:46 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	raistlin, Jun Nakajima

On 27/10/17 18:42, Igor Druzhinin wrote:
> On 16/02/17 11:15, Jan Beulich wrote:
>> When __context_switch() is being bypassed during original context
>> switch handling, the vCPU "owning" the VMCS partially loses control of
>> it: It will appear non-running to remote CPUs, and hence their attempt
>> to pause the owning vCPU will have no effect on it (as it already
>> looks to be paused). At the same time the "owning" CPU will re-enable
>> interrupts eventually (the lastest when entering the idle loop) and
>> hence becomes subject to IPIs from other CPUs requesting access to the
>> VMCS. As a result, when __context_switch() finally gets run, the CPU
>> may no longer have the VMCS loaded, and hence any accesses to it would
>> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>>
>> Similarly, when __context_switch() is being bypassed also on the second
>> (switch-in) path, VMCS ownership may have been lost and hence needs
>> re-establishing. Since there's no existing hook to put this in, add a
>> new one.
>>
>> Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
>> Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Drop the spin loop from vmx_vmc_reload(). Use the function in
>>     vmx_do_resume() instead of open coding it there (requiring the
>>     ASSERT()s to be adjusted/dropped). Drop the new
>>     ->ctxt_switch_same() hook.
>>
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
>>      local_irq_restore(flags);
>>  }
>>  
>> +void vmx_vmcs_reload(struct vcpu *v)
>> +{
>> +    /*
>> +     * As we may be running with interrupts disabled, we can't acquire
>> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
>> +     * the VMCS can't be taken away from us anymore if we still own it.
>> +     */
>> +    ASSERT(v->is_running || !local_irq_is_enabled());
>> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
>> +        return;
>> +
>> +    vmx_load_vmcs(v);
>> +}
>> +
>>  int vmx_cpu_up_prepare(unsigned int cpu)
>>  {
>>      /*
>> @@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
>>      bool_t debug_state;
>>  
>>      if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
>> -    {
>> -        if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
>> -            vmx_load_vmcs(v);
>> -    }
>> +        vmx_vmcs_reload(v);
>>      else
>>      {
>>          /*
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>>      if ( unlikely(!this_cpu(vmxon)) )
>>          return;
>>  
>> +    if ( !v->is_running )
>> +    {
>> +        /*
>> +         * When this vCPU isn't marked as running anymore, a remote pCPU's
>> +         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
>> +         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
>> +         * away the VMCS from us. As we're running with interrupts disabled,
>> +         * we also can't call vmx_vmcs_enter().
>> +         */
>> +        vmx_vmcs_reload(v);
>> +    }
>> +
>>      vmx_fpu_leave(v);
>>      vmx_save_guest_msrs(v);
>>      vmx_restore_host_msrs();
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>>  void vmx_vmcs_enter(struct vcpu *v);
>>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>>  void vmx_vmcs_exit(struct vcpu *v);
>> +void vmx_vmcs_reload(struct vcpu *v);
>>  
>>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
>>
> 
> Hi Jan,
> 
> I'm not entirely sure if it's something related but the end result looks
> similar to the issue that this patch solved. We are now getting reports of
> a similar race condition with the following stack trace on 4.7.1 with this
> patch backported but I'm pretty sure this should be the case for master
> as well:
> 
> (XEN) [480198.570165] Xen call trace:
> (XEN) [480198.570168]    [<ffff82d0801eb53e>] vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0x136/0x1a8
> (XEN) [480198.570171]    [<ffff82d080160095>] domain.c#__context_switch+0x10c/0x3a4
> (XEN) [480198.570176]    [<ffff82d08016560c>] __sync_local_execstate+0x35/0x51
> (XEN) [480198.570179]    [<ffff82d08018bd82>] invalidate_interrupt+0x40/0x73
> (XEN) [480198.570183]    [<ffff82d08016ea1f>] do_IRQ+0x8c/0x5cb
> (XEN) [480198.570186]    [<ffff82d08022d93f>] common_interrupt+0x5f/0x70
> (XEN) [480198.570189]    [<ffff82d0801b32b0>] vpmu_destroy+0/0x100
> (XEN) [480198.570192]    [<ffff82d0801e7dc9>] vmx.c#vmx_vcpu_destroy+0x21/0x30
> (XEN) [480198.570195]    [<ffff82d0801c2bf6>] hvm_vcpu_destroy+0x70/0x77
> (XEN) [480198.570197]    [<ffff82d08016101e>] vcpu_destroy+0x5d/0x72
> (XEN) [480198.570201]    [<ffff82d080107510>] domain.c#complete_domain_destroy+0x49/0x182
> (XEN) [480198.570204]    [<ffff82d0801266d2>] rcupdate.c#rcu_process_callbacks+0x141/0x1a3
> (XEN) [480198.570207]    [<ffff82d080132f95>] softirq.c#__do_softirq+0x75/0x80
> (XEN) [480198.570209]    [<ffff82d080132fae>] process_pending_softirqs+0xe/0x10
> (XEN) [480198.570212]    [<ffff82d0801b256f>] mwait-idle.c#mwait_idle+0xf5/0x2c3
> (XEN) [480198.570214]    [<ffff82d0801e0d00>] vmx_intr_assist+0x3bf/0x4f2
> (XEN) [480198.570216]    [<ffff82d08015fd57>] domain.c#idle_loop+0x38/0x4d
> 
> So far all the attempts to get a repro locally failed - the race is quite rare -
> it only happens when (probably) the issue with stolen VMCS appears AND TLB flush
> IPI comes at the moment of domain destruction. For the issue to appear several
> conditions should be met:
> 1) TLB flush IPI should execute __sync_local_execstate and enter VMX context
> switch
> 2) This should come at the VCPU destroy loop in RCU callback
> 3) VMCS pointer should be invalid (possibly stolen or cleared somehow)
> 
> My idea was to provoke the crash somehow by simulating the conditions described
> above. Using Andrew's suggestion I now can satisfy conditions 1 and 2 with
> some help of XTF, but I still have no idea how to provoke 3.
> 
> Any ideas about the root cause of the fault and suggestions how to reproduce it
> would be welcome. Does this crash really has something to do with PML? I doubt
> because the original environment may hardly be called PML-heavy.
> 
> Thanks,
> Igor
> 

So we finally have a complete understanding of what's going on:

Some vCPU has just migrated to another pCPU and we switched to idle, but
per_cpu(curr_vcpu) on the current pCPU is still pointing to it - this is
how the current logic works. While we're in idle we issue
vcpu_destroy() for some other domain, which eventually calls
vmx_vcpu_disable_pml() and trashes the VMCS pointer on the current pCPU.
At this moment we get a TLB flush IPI from that same vCPU, which is now
context switching on another pCPU - it appears to clean the TLB after
itself. This vCPU is already marked is_running=1 by the scheduler. In
the IPI handler we enter __sync_local_execstate() and try to call
vmx_ctxt_switch_from() for the migrated vCPU, which is supposed to call
vmx_vmcs_reload() but doesn't do so because is_running==1. The next
VMWRITE crashes the hypervisor.

So the state transition diagram might look like:
pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->
vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!

We can basically just fix the condition around the vmx_vmcs_reload()
call (a rough illustration of what I mean is below), but I'm not
completely sure that's the right way to do it - I don't think leaving
per_cpu(curr_vcpu) pointing to a migrated vCPU is a good idea (maybe we
need to clear it). What are your thoughts?
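
To make this concrete, the naive variant of that idea would be to drop
the is_running check in vmx_ctxt_switch_from() and rely on
vmx_vmcs_reload()'s own "already current" test. This is only a
hypothetical sketch to illustrate the option, not a tested fix - whether
it is safe against the remote pCPU loading the VMCS at the same time is
exactly the open question:

    if ( unlikely(!this_cpu(vmxon)) )
        return;

    /*
     * Sketch only: attempt the reload even when v->is_running is set
     * (i.e. the vCPU already runs elsewhere while our lazy state still
     * references it); vmx_vmcs_reload() bails out early if the VMCS is
     * already current on this pCPU.
     */
    vmx_vmcs_reload(v);

    vmx_fpu_leave(v);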

CC-ing Dario

Thanks,
Igor




* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-02 19:46     ` Igor Druzhinin
@ 2017-11-07  8:07       ` Jan Beulich
  2017-11-07 14:24         ` Igor Druzhinin
  2017-11-07 15:16         ` Jan Beulich
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-07  8:07 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	raistlin, Jun Nakajima, xen-devel

>>> On 02.11.17 at 20:46, <igor.druzhinin@citrix.com> wrote:
>> Any ideas about the root cause of the fault and suggestions how to reproduce it
>> would be welcome. Does this crash really has something to do with PML? I doubt
>> because the original environment may hardly be called PML-heavy.

Well, PML-heaviness doesn't matter. It's the mere fact that PML
is enabled on the vCPU being destroyed.

> So we finally have complete understanding of what's going on:
> 
> Some vCPU has just migrated to another pCPU and we switched to idle but
> per_cpu(curr_vcpu) on the current pCPU is still pointing to it - this is
> how the current logic works. While we're in idle we're issuing
> vcpu_destroy() for some other domain which eventually calls
> vmx_vcpu_disable_pml() and trashes VMCS pointer on the current pCPU. At
> this moment we get a TLB flush IPI from that same vCPU which is now
> context switching on another pCPU - it appears to clean TLB after
> itself. This vCPU is already marked is_running=1 by the scheduler. In
> the IPI handler we enter __sync_local_execstate() and trying to call
> vmx_ctxt_switch_from() for the migrated vCPU which is supposed to call
> vmcs_reload() but doesn't do it because is_running==1. The next VMWRITE
> crashes the hypervisor.
> 
> So the state transition diagram might look like:
> pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->

I'm not really clear about who/what is "idle" here: pCPU1,
pCPU2, or yet something else? If vCPUx migrated to pCPU2,
wouldn't it be put back into runnable state right away, and
hence pCPU2 can't be idle at this point? Yet for pCPU1 I don't
think its idleness would matter much, i.e. the situation could
also arise without it becoming idle afaics. pCPU1 making it
anywhere softirqs are being processed would suffice.

> vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
> pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
> pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!
> 
> We can basically just fix the condition around vmcs_reload() call but
> I'm not completely sure that it's the right way to do - I don't think
> leaving per_cpu(curr_vcpu) pointing to a migrated vCPU is a good idea
> (maybe we need to clean it). What are your thoughts?

per_cpu(curr_vcpu) can only validly be written inside
__context_switch(), hence the only way to achieve this would
be to force __context_switch() to be called earlier than out of
the TLB flush IPI handler, perhaps like in the (untested!) patch
below. Two questions then remain:
- Should we perhaps rather do this in an arch-independent way
  (i.e. ahead of the call to vcpu_destroy() in common code)?
- This deals with only a special case of the more general "TLB
  flush behind the back of a vmx_vmcs_enter() /
  vmx_vmcs_exit() section" - does this need dealing with in a
  more general way? Here I'm thinking of introducing a
  FLUSH_STATE flag to be passed to flush_mask() instead of
  the current flush_tlb_mask() in context_switch() and
  sync_vcpu_execstate(); a rough sketch of what I mean follows
  below. This could at the same time be used for a small
  performance optimization: at least for HAP vCPU-s I don't
  think we really need the TLB part of the flushes here.
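
To illustrate that second idea only (FLUSH_STATE does not exist at this
point - the name, the value and the call sites below are made up, and
this is separate from the (untested!) vcpu_destroy() patch further down
in this mail):

    /* Hypothetical flag for flush_mask(); would need a free flag bit. */
    #define FLUSH_STATE 0x1000

    /* Flush IPI handler side (sketch): sync lazy vCPU state when asked. */
    if ( flags & FLUSH_STATE )
        __sync_local_execstate();

    /*
     * context_switch() / sync_vcpu_execstate() side (sketch): request the
     * state sync, and include the TLB part only where it's really needed
     * (d standing for the domain whose vCPU state is dirty; dirty_mask is
     * the mask those paths already compute today).
     */
    flush_mask(&dirty_mask,
               FLUSH_STATE | (paging_mode_hap(d) ? 0 : FLUSH_TLB));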

Jan

--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * Flush all state for this vCPU before fully tearing it down. This is
+     * particularly important for HVM ones on VMX, so that this flushing of
+     * state won't happen from the TLB flush IPI handler behind the back of
+     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
+     */
+    sync_vcpu_execstate(v);
+
     xfree(v->arch.vm_event);
     v->arch.vm_event = NULL;
 




* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07  8:07       ` Jan Beulich
@ 2017-11-07 14:24         ` Igor Druzhinin
  2017-11-07 14:55           ` Jan Beulich
  2017-11-09  9:54           ` Dario Faggioli
  2017-11-07 15:16         ` Jan Beulich
  1 sibling, 2 replies; 30+ messages in thread
From: Igor Druzhinin @ 2017-11-07 14:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	raistlin, Jun Nakajima, xen-devel

On 07/11/17 08:07, Jan Beulich wrote:
>>>> On 02.11.17 at 20:46, <igor.druzhinin@citrix.com> wrote:
>>> Any ideas about the root cause of the fault and suggestions how to reproduce it
>>> would be welcome. Does this crash really has something to do with PML? I doubt
>>> because the original environment may hardly be called PML-heavy.
> 
> Well, PML-heaviness doesn't matter. It's the mere fact that PML
> is enabled on the vCPU being destroyed.
> 
>> So we finally have complete understanding of what's going on:
>>
>> Some vCPU has just migrated to another pCPU and we switched to idle but
>> per_cpu(curr_vcpu) on the current pCPU is still pointing to it - this is
>> how the current logic works. While we're in idle we're issuing
>> vcpu_destroy() for some other domain which eventually calls
>> vmx_vcpu_disable_pml() and trashes VMCS pointer on the current pCPU. At
>> this moment we get a TLB flush IPI from that same vCPU which is now
>> context switching on another pCPU - it appears to clean TLB after
>> itself. This vCPU is already marked is_running=1 by the scheduler. In
>> the IPI handler we enter __sync_local_execstate() and trying to call
>> vmx_ctxt_switch_from() for the migrated vCPU which is supposed to call
>> vmcs_reload() but doesn't do it because is_running==1. The next VMWRITE
>> crashes the hypervisor.
>>
>> So the state transition diagram might look like:
>> pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->
> 
> I'm not really clear about who/what is "idle" here: pCPU1,
> pCPU2, or yet something else?

It's switching to the "current" idle context on pCPU1.

> If vCPUx migrated to pCPU2,
> wouldn't it be put back into runnable state right away, and
> hence pCPU2 can't be idle at this point? Yet for pCPU1 I don't
> think its idleness would matter much, i.e. the situation could
> also arise without it becoming idle afaics. pCPU1 making it
> anywhere softirqs are being processed would suffice.
> 

Idleness matters in that case because we are not switching
per_cpu(curr_vcpu), which I think is the main problem when vCPU migration
comes into play.

>> vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
>> pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
>> pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!
>>
>> We can basically just fix the condition around vmcs_reload() call but
>> I'm not completely sure that it's the right way to do - I don't think
>> leaving per_cpu(curr_vcpu) pointing to a migrated vCPU is a good idea
>> (maybe we need to clean it). What are your thoughts?
> 
> per_cpu(curr_vcpu) can only validly be written inside
> __context_switch(), hence the only way to achieve this would
> be to force __context_switch() to be called earlier than out of
> the TLB flush IPI handler, perhaps like in the (untested!) patch
> below. Two questions then remain:
> - Should we perhaps rather do this in an arch-independent way
>   (i.e. ahead of the call to vcpu_destroy() in common code)?
> - This deals with only a special case of the more general "TLB
>   flush behind the back of a vmx_vmcs_enter() /
>   vmx_vmcs_exit() section" - does this need dealing with in a
>   more general way? Here I'm thinking of introducing a
>   FLUSH_STATE flag to be passed to flush_mask() instead of
>   the current flush_tlb_mask() in context_switch() and
>   sync_vcpu_execstate(). This could at the same time be used
>   for a small performance optimization: At least for HAP vCPU-s
>   I don't think we really need the TLB part of the flushes here.
> 
> Jan
> 
> --- unstable.orig/xen/arch/x86/domain.c
> +++ unstable/xen/arch/x86/domain.c
> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>  
>  void vcpu_destroy(struct vcpu *v)
>  {
> +    /*
> +     * Flush all state for this vCPU before fully tearing it down. This is
> +     * particularly important for HVM ones on VMX, so that this flushing of
> +     * state won't happen from the TLB flush IPI handler behind the back of
> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
> +     */
> +    sync_vcpu_execstate(v);
> +
>      xfree(v->arch.vm_event);
>      v->arch.vm_event = NULL;
>  

I don't think this is going to fix the problem, since the vCPU we are
currently destroying has nothing to do with the vCPUx that actually
caused the problem by its migration. We are still going to call
vmx_vcpu_disable_pml(), which loads and then clears the VMCS on the
current pCPU.
Perhaps I should improve my diagram:

pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle context
-> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at this
point on pCPU1)

pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB flush
from context switch to clean TLB on pCPU1

(pCPU1 is still somewhere in the vcpu_destroy() loop, with the VMCS
cleared by vmx_vcpu_disable_pml())

pCPU1: IPI handler for TLB flush -> context switch out of vCPUx (this is
here because we haven't switched per_cpu(curr_vcpu) before) -> (no
vmcs_reload() here because pCPU2 already set is_running to 1) -> VMWRITE
-> CRASH!
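
For reference, the check I'm referring to has roughly the following shape
(a simplified sketch pieced together from this thread, not a verbatim
copy of the code):

static void vmx_ctxt_switch_from(struct vcpu *v)
{
    /* ... */
    if ( !v->is_running )
    {
        /*
         * Re-establish VMCS ownership for a vCPU that is no longer
         * marked running.  In the trace above pCPU2 has already set
         * is_running back to 1, so this reload is skipped on pCPU1 and
         * the next VMWRITE hits a cleared VMCS pointer.
         */
        vmx_vmcs_reload(v);
    }
    /* ... remaining switch-out work (MSR/debug register saving, ...) ... */
}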

Igor


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07 14:24         ` Igor Druzhinin
@ 2017-11-07 14:55           ` Jan Beulich
  2017-11-07 15:52             ` Igor Druzhinin
  2017-11-09  9:54           ` Dario Faggioli
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-11-07 14:55 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	raistlin, Jun Nakajima, xen-devel

>>> On 07.11.17 at 15:24, <igor.druzhinin@citrix.com> wrote:
> On 07/11/17 08:07, Jan Beulich wrote:
>> --- unstable.orig/xen/arch/x86/domain.c
>> +++ unstable/xen/arch/x86/domain.c
>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>  
>>  void vcpu_destroy(struct vcpu *v)
>>  {
>> +    /*
>> +     * Flush all state for this vCPU before fully tearing it down. This is
>> +     * particularly important for HVM ones on VMX, so that this flushing of
>> +     * state won't happen from the TLB flush IPI handler behind the back of
>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>> +     */
>> +    sync_vcpu_execstate(v);
>> +
>>      xfree(v->arch.vm_event);
>>      v->arch.vm_event = NULL;
> 
> I don't think this is going to fix the problem since vCPU we are
> currently destroying has nothing to do with the vCPUx that actually
> caused the problem by its migration. We still are going to call
> vmx_vcpu_disable_pml() which loads and cleans VMCS on the current pCPU.

Oh, right, wrong vCPU. This should be better:

--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
 
 void vcpu_destroy(struct vcpu *v)
 {
+    /*
+     * Flush all state for the vCPU previously having run on the current CPU.
+     * This is in particular relevant for HVM ones on VMX, so that this
+     * flushing of state won't happen from the TLB flush IPI handler behind
+     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
+     */
+    sync_local_execstate();
+
     xfree(v->arch.vm_event);
     v->arch.vm_event = NULL;
 
In that case the question is whether (rather than generalizing it,
as mentioned for the earlier version) this wouldn't better go into
vmx_vcpu_destroy(), assuming anything called earlier from
hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
play with VMCSes).
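
I.e. something along these lines (again just a sketch, untested, with
the rest of the function left alone):

static void vmx_vcpu_destroy(struct vcpu *v)
{
    /*
     * Complete a possibly pending lazy context switch first, so that
     * vmx_vcpu_disable_pml() below can't clear a VMCS still owned by a
     * lazily switched-out vCPU behind the back of a vmx_vmcs_enter() /
     * vmx_vmcs_exit() section.
     */
    sync_local_execstate();

    vmx_vcpu_disable_pml(v);
    /* ... */
}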

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07  8:07       ` Jan Beulich
  2017-11-07 14:24         ` Igor Druzhinin
@ 2017-11-07 15:16         ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-07 15:16 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	raistlin, Jun Nakajima, xen-devel

>>> On 07.11.17 at 09:07, <JBeulich@suse.com> wrote:
>>>> On 02.11.17 at 20:46, <igor.druzhinin@citrix.com> wrote:
>>> Any ideas about the root cause of the fault and suggestions on how to reproduce it
>>> would be welcome. Does this crash really have something to do with PML? I doubt it,
>>> because the original environment can hardly be called PML-heavy.
> 
> Well, PML-heaviness doesn't matter. It's the mere fact that PML
> is enabled on the vCPU being destroyed.
> 
>> So we finally have complete understanding of what's going on:
>> 
>> Some vCPU has just migrated to another pCPU and we switched to idle but
>> per_cpu(curr_vcpu) on the current pCPU is still pointing to it - this is
>> how the current logic works. While we're in idle we're issuing
>> vcpu_destroy() for some other domain which eventually calls
>> vmx_vcpu_disable_pml() and trashes VMCS pointer on the current pCPU. At
>> this moment we get a TLB flush IPI from that same vCPU which is now
>> context switching on another pCPU - it appears to clean TLB after
>> itself. This vCPU is already marked is_running=1 by the scheduler. In
>> the IPI handler we enter __sync_local_execstate() and trying to call
>> vmx_ctxt_switch_from() for the migrated vCPU which is supposed to call
>> vmcs_reload() but doesn't do it because is_running==1. The next VMWRITE
>> crashes the hypervisor.
>> 
>> So the state transition diagram might look like:
>> pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->
> 
> I'm not really clear about who/what is "idle" here: pCPU1,
> pCPU2, or yet something else? If vCPUx migrated to pCPU2,
> wouldn't it be put back into runnable state right away, and
> hence pCPU2 can't be idle at this point? Yet for pCPU1 I don't
> think its idleness would matter much, i.e. the situation could
> also arise without it becoming idle afaics. pCPU1 making it
> anywhere softirqs are being processed would suffice.
> 
>> vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
>> pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
>> pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!
>> 
>> We can basically just fix the condition around vmcs_reload() call but
>> I'm not completely sure that it's the right way to do - I don't think
>> leaving per_cpu(curr_vcpu) pointing to a migrated vCPU is a good idea
>> (maybe we need to clean it). What are your thoughts?
> 
> per_cpu(curr_vcpu) can only validly be written inside
> __context_switch(), hence the only way to achieve this would
> be to force __context_switch() to be called earlier than out of
> the TLB flush IPI handler, perhaps like in the (untested!) patch
> below. Two questions then remain:
> - Should we perhaps rather do this in an arch-independent way
>   (i.e. ahead of the call to vcpu_destroy() in common code)?
> - This deals with only a special case of the more general "TLB
>   flush behind the back of a vmx_vmcs_enter() /
>   vmx_vmcs_exit() section" - does this need dealing with in a
>   more general way? Here I'm thinking of introducing a
>   FLUSH_STATE flag to be passed to flush_mask() instead of
>   the current flush_tlb_mask() in context_switch() and
>   sync_vcpu_execstate(). This could at the same time be used
>   for a small performance optimization: At least for HAP vCPU-s
>   I don't think we really need the TLB part of the flushes here.

Btw., for this second aspect below is what I have in mind.

Jan

x86: make CPU state flush requests explicit

Having this be an implied side effect of a TLB flush is not very nice:
It could (at least in theory) lead to unintended state flushes (see e.g.
https://lists.xenproject.org/archives/html/xen-devel/2017-11/msg00187.html 
for context). Introduce a flag to be used in the two places actually
wanting the state flushed, and conditionalize the
__sync_local_execstate() invocation in the IPI handler accordingly.

At the same time also conditionalize the flush_area_local() invocations,
to short-circuit the function ending up as a no-op anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I first thought we could also suppress the TLB flush part in the context
switch cases for HAP vCPU-s, but the per-domain mappings require that to
happen.

--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -1699,7 +1699,7 @@ void context_switch(struct vcpu *prev, s
                   !cpumask_empty(&dirty_mask)) )
     {
         /* Other cpus call __sync_local_execstate from flush ipi handler. */
-        flush_tlb_mask(&dirty_mask);
+        flush_mask(&dirty_mask, FLUSH_TLB | FLUSH_STATE);
     }
 
     if ( prev != next )
@@ -1808,7 +1808,7 @@ void sync_vcpu_execstate(struct vcpu *v)
         sync_local_execstate();
 
     /* Other cpus call __sync_local_execstate from flush ipi handler. */
-    flush_tlb_mask(v->vcpu_dirty_cpumask);
+    flush_mask(v->vcpu_dirty_cpumask, FLUSH_TLB | FLUSH_STATE);
 }
 
 static int relinquish_memory(
--- unstable.orig/xen/arch/x86/smp.c
+++ unstable/xen/arch/x86/smp.c
@@ -207,9 +207,10 @@ void invalidate_interrupt(struct cpu_use
     unsigned int flags = flush_flags;
     ack_APIC_irq();
     perfc_incr(ipis);
-    if ( __sync_local_execstate() )
+    if ( (flags & FLUSH_STATE) && __sync_local_execstate() )
         flags &= ~(FLUSH_TLB | FLUSH_TLB_GLOBAL);
-    flush_area_local(flush_va, flags);
+    if ( flags & ~(FLUSH_STATE | FLUSH_ORDER_MASK) )
+        flush_area_local(flush_va, flags);
     cpumask_clear_cpu(smp_processor_id(), &flush_cpumask);
 }
 
@@ -219,7 +220,8 @@ void flush_area_mask(const cpumask_t *ma
 
     ASSERT(local_irq_is_enabled());
 
-    if ( cpumask_test_cpu(cpu, mask) )
+    if ( (flags & ~(FLUSH_STATE | FLUSH_ORDER_MASK)) &&
+         cpumask_test_cpu(cpu, mask) )
         flags = flush_area_local(va, flags);
 
     if ( (flags & ~FLUSH_ORDER_MASK) &&
--- unstable.orig/xen/include/asm-x86/flushtlb.h
+++ unstable/xen/include/asm-x86/flushtlb.h
@@ -101,6 +101,8 @@ void write_cr3(unsigned long cr3);
 #define FLUSH_CACHE      0x400
  /* VA for the flush has a valid mapping */
 #define FLUSH_VA_VALID   0x800
+ /* Flush CPU state */
+#define FLUSH_STATE      0x1000
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);




* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07 14:55           ` Jan Beulich
@ 2017-11-07 15:52             ` Igor Druzhinin
  2017-11-07 16:31               ` Jan Beulich
  2017-11-09 10:05               ` Jan Beulich
  0 siblings, 2 replies; 30+ messages in thread
From: Igor Druzhinin @ 2017-11-07 15:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	raistlin, Jun Nakajima, xen-devel

On 07/11/17 14:55, Jan Beulich wrote:
>>>> On 07.11.17 at 15:24, <igor.druzhinin@citrix.com> wrote:
>> On 07/11/17 08:07, Jan Beulich wrote:
>>> --- unstable.orig/xen/arch/x86/domain.c
>>> +++ unstable/xen/arch/x86/domain.c
>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>  
>>>  void vcpu_destroy(struct vcpu *v)
>>>  {
>>> +    /*
>>> +     * Flush all state for this vCPU before fully tearing it down. This is
>>> +     * particularly important for HVM ones on VMX, so that this flushing of
>>> +     * state won't happen from the TLB flush IPI handler behind the back of
>>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>> +     */
>>> +    sync_vcpu_execstate(v);
>>> +
>>>      xfree(v->arch.vm_event);
>>>      v->arch.vm_event = NULL;
>>
>> I don't think this is going to fix the problem since vCPU we are
>> currently destroying has nothing to do with the vCPUx that actually
>> caused the problem by its migration. We still are going to call
>> vmx_vcpu_disable_pml() which loads and cleans VMCS on the current pCPU.
> 
> Oh, right, wrong vCPU. This should be better:
> 
> --- unstable.orig/xen/arch/x86/domain.c
> +++ unstable/xen/arch/x86/domain.c
> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>  
>  void vcpu_destroy(struct vcpu *v)
>  {
> +    /*
> +     * Flush all state for the vCPU previously having run on the current CPU.
> +     * This is in particular relevant for HVM ones on VMX, so that this
> +     * flushing of state won't happen from the TLB flush IPI handler behind
> +     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
> +     */
> +    sync_local_execstate();
> +
>      xfree(v->arch.vm_event);
>      v->arch.vm_event = NULL;
>  
> In that case the question then is whether (rather than generalizing
> is, as mentioned for the earlier version) this wouldn't better go into
> vmx_vcpu_destroy(), assuming anything called earlier from
> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
> play with VMCSes).

Ah, ok. Does this also apply to the previous issue? May I revert that
change to test it?

There is one thing that I'm worried about with this approach:

Here we just sync the idle context because we know that we are going to
deal with the VMCS later. But what about other potential cases (perhaps
some softirqs) in which we access a vCPU data structure that is
currently shared between different pCPUs? Maybe we'd better sync the
context as soon as possible after switching to idle from a migrated
vCPU.

Igor

> 
> Jan
> 


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07 15:52             ` Igor Druzhinin
@ 2017-11-07 16:31               ` Jan Beulich
  2017-11-09 10:05               ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-07 16:31 UTC (permalink / raw)
  To: Igor Druzhinin, George Dunlap, raistlin
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	JunNakajima, xen-devel

>>> On 07.11.17 at 16:52, <igor.druzhinin@citrix.com> wrote:
> On 07/11/17 14:55, Jan Beulich wrote:
>>>>> On 07.11.17 at 15:24, <igor.druzhinin@citrix.com> wrote:
>>> On 07/11/17 08:07, Jan Beulich wrote:
>>>> --- unstable.orig/xen/arch/x86/domain.c
>>>> +++ unstable/xen/arch/x86/domain.c
>>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>>  
>>>>  void vcpu_destroy(struct vcpu *v)
>>>>  {
>>>> +    /*
>>>> +     * Flush all state for this vCPU before fully tearing it down. This is
>>>> +     * particularly important for HVM ones on VMX, so that this flushing of
>>>> +     * state won't happen from the TLB flush IPI handler behind the back of
>>>> +     * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>>> +     */
>>>> +    sync_vcpu_execstate(v);
>>>> +
>>>>      xfree(v->arch.vm_event);
>>>>      v->arch.vm_event = NULL;
>>>
>>> I don't think this is going to fix the problem since vCPU we are
>>> currently destroying has nothing to do with the vCPUx that actually
>>> caused the problem by its migration. We still are going to call
>>> vmx_vcpu_disable_pml() which loads and cleans VMCS on the current pCPU.
>> 
>> Oh, right, wrong vCPU. This should be better:
>> 
>> --- unstable.orig/xen/arch/x86/domain.c
>> +++ unstable/xen/arch/x86/domain.c
>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>  
>>  void vcpu_destroy(struct vcpu *v)
>>  {
>> +    /*
>> +     * Flush all state for the vCPU previously having run on the current CPU.
>> +     * This is in particular relevant for HVM ones on VMX, so that this
>> +     * flushing of state won't happen from the TLB flush IPI handler behind
>> +     * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>> +     */
>> +    sync_local_execstate();
>> +
>>      xfree(v->arch.vm_event);
>>      v->arch.vm_event = NULL;
>>  
>> In that case the question then is whether (rather than generalizing
>> is, as mentioned for the earlier version) this wouldn't better go into
>> vmx_vcpu_destroy(), assuming anything called earlier from
>> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
>> play with VMCSes).
> 
> Ah, ok. Does this also apply to the previous issue? May I revert that
> change to test it?

Feel free to try it, but I had checked that previous patch earlier
today, and right now I don't think the two issues are related.

> There is one things that I'm worrying about with this approach:
> 
> At this place we just sync the idle context because we know that we are
> going to deal with VMCS later. But what about other potential cases
> (perhaps some softirqs) in which we are accessing a vCPU data structure
> that is currently shared between different pCPUs. Maybe we'd better sync
> the context as soon as possible after we switched to idle from a
> migrated vCPU.

Well, yes, I had pointed out in the earlier reply that this is just to
deal with the specific case here. Whether to sync earlier after a
migration I'm not really sure about - the way it's written right now
is meant to deal with migration across CPUs. If so, this would
perhaps belong in scheduler code (and hence cover ARM as well), and
so far I haven't been able to find a good place to put it.

George, Dario, do you have any thoughts on both the general idea and
on where to put the necessary code?

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07 14:24         ` Igor Druzhinin
  2017-11-07 14:55           ` Jan Beulich
@ 2017-11-09  9:54           ` Dario Faggioli
  2017-11-09 10:17             ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Dario Faggioli @ 2017-11-09  9:54 UTC (permalink / raw)
  To: Igor Druzhinin, Jan Beulich
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, Anshul Makkar,
	Jun Nakajima, xen-devel



On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote:
> Perhaps I should improve my diagram:
> 
> pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
> context
> -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at this
> point on pCPU1)
> 
> pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB flush
> from context switch to clean TLB on pCPU1
> 
Sorry, there must be something I'm missing (or misunderstanding).

What is this code that checks is_running and triggers the TLB flush?

But, more importantly, how come you are context switching to something
that has is_running == 1? That should not be possible.

In fact, from a scheduling code perspective, since you're mentioning
vCPU migration between pCPUs:

 pCPU1
 .
 .
 //vCPUx->is_running is 1
 vCPUx->pause_flags |= _VPF_migrating
 schedule()
  idle->is_running = 1
  //vCPUx->pause_flags != 0 ==> it's blocked and can't be scheduled!
  context_switch( prev=vCPUx, next=idle )
   set_current( idle )
   //let's be lazy! don't call __context_switch()
   context_saved( vCPUx )
    vCPUx->is_running = 0
    SCHED_OP( context_saved ) //NULL for Credit1
    vcpu_migrate( vCPUx )
     if ( vCPUx->is_running || !test_and_clear(_VPF_migrating) )
      return;
     vcpu_wake( vCPUx )
 .
 .
 .

So, basically, the scheduler on pCPU2 can decide to pick vCPUx from the
runqueue and switch to it _only_ if it has gone through vcpu_wake(),
which must actually have woken up it, which happens if _VPF_migrating
has been cleared, which means is_running was 0 already.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-07 15:52             ` Igor Druzhinin
  2017-11-07 16:31               ` Jan Beulich
@ 2017-11-09 10:05               ` Jan Beulich
  2017-11-09 10:36                 ` Dario Faggioli
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-11-09 10:05 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Sergey Dyasli, Kevin Tian, George Dunlap, Andrew Cooper,
	Anshul Makkar, raistlin, Jun Nakajima, xen-devel

>>> On 07.11.17 at 16:52, <igor.druzhinin@citrix.com> wrote:
> There is one things that I'm worrying about with this approach:
> 
> At this place we just sync the idle context because we know that we are
> going to deal with VMCS later. But what about other potential cases
> (perhaps some softirqs) in which we are accessing a vCPU data structure
> that is currently shared between different pCPUs. Maybe we'd better sync
> the context as soon as possible after we switched to idle from a
> migrated vCPU.

Short of feedback from the scheduler maintainers I've looked
into this some more. Calling sync_vcpu_execstate() out of
vcpu_move_locked() would seem feasible, but there are a number
of other places where ->processor of a vCPU is being updated,
and where the vCPU was not (obviously) put to sleep already:

- credit1's csched_runq_steal()
- credit2's migrate()
- csched2_schedule()
- null's vcpu_assign() when called out of null_schedule()
- rt_schedule()

I don't think it is reasonable to call sync_vcpu_execstate() in all of
those places, and hence VMX's special VMCS management may indeed
better be taken care of by VMX-specific code (i.e. as previously
indicated, by moving the sync_local_execstate() invocation from
vcpu_destroy() - where my most recent patch draft had put it - to
vmx_vcpu_destroy()). And we'd have to live with other VMX code paths
having similar asynchronous behavior needing to take care of the
requirement in the same way.

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09  9:54           ` Dario Faggioli
@ 2017-11-09 10:17             ` Jan Beulich
  2017-11-09 10:36               ` Sergey Dyasli
  2017-11-09 10:39               ` Dario Faggioli
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-09 10:17 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, Andrew Cooper,
	Anshul Makkar, JunNakajima, xen-devel

>>> On 09.11.17 at 10:54, <raistlin@linux.it> wrote:
> On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote:
>> Perhaps I should improve my diagram:
>> 
>> pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
>> context
>> -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
>> vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at this
>> point on pCPU1)
>> 
>> pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB flush
>> from context switch to clean TLB on pCPU1
>> 
> Sorry, there must be something I'm missing (or misunderstanding).
> 
> What is this code that checks is_running and triggers the TLB flush?

I don't see where Igor said is_running is being checked around a
TLB flush. The TLB flush itself is what happens first thing in
context_switch() (and it's really using the TLB flush interface to
mainly effect the state flush, with the TLB flush being an implied
side effect; I've already got a series of further patches to make
this less implicit).

> But, more important, how come you are context switching to something
> that has is_running == 1 ? That should not be possible.

That's not what Igor's diagram says - it's indicating the fact that
is_running is being set to 1 in the process of context switching
into vCPUx.

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 10:17             ` Jan Beulich
@ 2017-11-09 10:36               ` Sergey Dyasli
  2017-11-09 11:01                 ` Dario Faggioli
  2017-11-09 10:39               ` Dario Faggioli
  1 sibling, 1 reply; 30+ messages in thread
From: Sergey Dyasli @ 2017-11-09 10:36 UTC (permalink / raw)
  To: JBeulich, raistlin
  Cc: Igor Druzhinin, Kevin Tian, Sergey Dyasli, Andrew Cooper,
	anshul.makkar, jun.nakajima, xen-devel

On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote:
> > > > On 09.11.17 at 10:54, <raistlin@linux.it> wrote:
> > 
> > On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote:
> > > Perhaps I should improve my diagram:
> > > 
> > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
> > > context
> > > -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> > > vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at this
> > > point on pCPU1)
> > > 
> > > pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB flush
> > > from context switch to clean TLB on pCPU1
> > > 
> > 
> > Sorry, there must be something I'm missing (or misunderstanding).
> > 
> > What is this code that checks is_running and triggers the TLB flush?
> 
> I don't see where Igor said is_running is being checked around a
> TLB flush. The TLB flush itself is what happens first thing in
> context_switch() (and it's really using the TLB flush interface to
> mainly effect the state flush, with the TLB flush being an implied
> side effect; I've already got a series of further patches to make
> this less implicit).
> 
> > But, more important, how come you are context switching to something
> > that has is_running == 1 ? That should not be possible.
> 
> That's not what Igor's diagram says - it's indicating the fact that
> is_running is being set to 1 in the process of context switching
> into vCPUx.

Jan, Dario,

Igor was referring to the following situation:


pCPU1                                   pCPU2
=====                                   =====
current == vCPU1
context_switch(next == idle)
!! __context_switch() is skipped
vcpu_migrate(vCPU1)
RCU callbacks
vmx_vcpu_destroy()
vmx_vcpu_disable_pml()
current_vmcs = 0

                                        schedule(next == vCPU1)
                                        vCPU1->is_running = 1;
                                        context_switch(next == vCPU1)
                                        flush_tlb_mask(&dirty_mask);
                                    
                                <--- IPI

__sync_local_execstate()
__context_switch(prev == vCPU1)
vmx_ctxt_switch_from(vCPU1)
vCPU1->is_running == 1
!! vmx_vmcs_reload() is skipped


I hope that this better illustrates the root cause.

-- 
Thanks,
Sergey

* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 10:05               ` Jan Beulich
@ 2017-11-09 10:36                 ` Dario Faggioli
  2017-11-09 12:58                   ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Dario Faggioli @ 2017-11-09 10:36 UTC (permalink / raw)
  To: Jan Beulich, Igor Druzhinin
  Cc: Sergey Dyasli, Kevin Tian, George Dunlap, Andrew Cooper,
	Anshul Makkar, JunNakajima, xen-devel



On Thu, 2017-11-09 at 03:05 -0700, Jan Beulich wrote:
> > > > On 07.11.17 at 16:52, <igor.druzhinin@citrix.com> wrote:
> > 
> > There is one things that I'm worrying about with this approach:
> > 
> > At this place we just sync the idle context because we know that we
> > are
> > going to deal with VMCS later. But what about other potential cases
> > (perhaps some softirqs) in which we are accessing a vCPU data
> > structure
> > that is currently shared between different pCPUs. Maybe we'd better
> > sync
> > the context as soon as possible after we switched to idle from a
> > migrated vCPU.
> 
> Short of feedback from the scheduler maintainers I've looked
> into this some more.
>
Sorry, as you may have seen from the other email, I was looking into
this today.

> Calling sync_vcpu_execstate() out of
> vcpu_move_locked() would seem feasible, but there are a number
> of other places where ->processor of a vCPU is being updated,
> and where the vCPU was not (obviously) put to sleep already:
> 
> - credit1's csched_runq_steal()
> - credit2's migrate()
> - csched2_schedule()
> - null's vcpu_assign() when called out of null_schedule()
> - rt_schedule()
> 
> I don't think it is reasonable to call sync_vcpu_execstate() in all
> of
> those places, 
>
Yes, I agree.

> and hence VMX'es special VMCS management may
> indeed better be taken care of by VMX-specific code (i.e. as
> previously indicated the sync_local_execstate() invocation moved
> from vcpu_destroy() - where my most recent patch draft had put
> it - to vmx_vcpu_destroy()). 
>
I was still trying to form an opinion about the issue, and was leaning
toward suggesting exactly the same.

In fact, the point of lazy context switch is exactly that: trying to
avoid syncing the state. Of course, that requires that we identify all
the places and occasions where having the state out of sync may be a
problem, and sync it there!

What seems to me to be happening here is as follows:

 a. a pCPU becomes idle
 b. we do the lazy switch, i.e., the context of the previously running
    vCPU stays on the pCPU
 c. *something* happens which wants to either play with or alter the
    context currently loaded on the pCPU (in this case it's the VMX
    bits of the context, but it could be other parts of it too)

Well, I'm afraid I only see two solutions:
1) we get rid of lazy context switch;
2) whatever it is that is happening at point c above, it needs to be
   aware that we use lazy context switch, and make sure to sync the
   context before playing with or altering it.

> And we'd have to live with other
> VMX code paths having similar asynchronous behavior needing to
> similarly take care of the requirement.
> 
Exactly.

And in fact, in this thread, migration of vCPUs between pCPUs was
mentioned, and treating that in a special way was being considered.

But it looks to me that something very similar may, at least in theory,
happen any time a lazy context switch occurs, no matter whether the
pCPU has become idle because the previously running vCPU wants to move,
or because it blocked for whatever other reason.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 10:17             ` Jan Beulich
  2017-11-09 10:36               ` Sergey Dyasli
@ 2017-11-09 10:39               ` Dario Faggioli
  1 sibling, 0 replies; 30+ messages in thread
From: Dario Faggioli @ 2017-11-09 10:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, Andrew Cooper,
	Anshul Makkar, JunNakajima, xen-devel



On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote:
> > > > On 09.11.17 at 10:54, <raistlin@linux.it> wrote:
> > On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote:
> > > Perhaps I should improve my diagram:
> > > 
> > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
> > > context
> > > -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> > > vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at
> > > this
> > > point on pCPU1)
> > > 
> > > pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB
> > > flush
> > > from context switch to clean TLB on pCPU1
> > 
> > But, more important, how come you are context switching to
> > something
> > that has is_running == 1 ? That should not be possible.
> 
> That's not what Igor's diagram says - it's indicating the fact that
> is_running is being set to 1 in the process of context switching
> into vCPUx.
> 
Ah, ok. So I was right: I indeed was misunderstanding something, i.e.,
the diagram itself. :-)

Now I get it.

Sorry for the noise,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 10:36               ` Sergey Dyasli
@ 2017-11-09 11:01                 ` Dario Faggioli
  2017-11-09 13:08                   ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Dario Faggioli @ 2017-11-09 11:01 UTC (permalink / raw)
  To: Sergey Dyasli, JBeulich
  Cc: Igor Druzhinin, Kevin Tian, Andrew Cooper, anshul makkar,
	jun.nakajima, xen-devel



On Thu, 2017-11-09 at 10:36 +0000, Sergey Dyasli wrote:
> On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote:
> > > > > On 09.11.17 at 10:54, <raistlin@linux.it> wrote:
> > > 
> > > On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote:
> > > > Perhaps I should improve my diagram:
> > > > 
> > > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
> > > > context
> > > > -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> > > > vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at
> > > > this
> > > > point on pCPU1)
> > > > 
> > > > pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB
> > > > flush
> > > > from context switch to clean TLB on pCPU1
> > > > 
> > > 
> > > Sorry, there must be something I'm missing (or misunderstanding).
> > > 
> > > What is this code that checks is_running and triggers the TLB
> > > flush?
> > 
> > I don't see where Igor said is_running is being checked around a
> > TLB flush. The TLB flush itself is what happens first thing in
> > context_switch() (and it's really using the TLB flush interface to
> > mainly effect the state flush, with the TLB flush being an implied
> > side effect; I've already got a series of further patches to make
> > this less implicit).
> > 
> > > But, more important, how come you are context switching to
> > > something
> > > that has is_running == 1 ? That should not be possible.
> > 
> > That's not what Igor's diagram says - it's indicating the fact that
> > is_running is being set to 1 in the process of context switching
> > into vCPUx.
> 
> Jan, Dario,
> 
Hi,

> Igor was referring to the following situation:
> 
> 
> pCPU1                                   pCPU2
> =====                                   =====
> current == vCPU1
> context_switch(next == idle)
> !! __context_switch() is skipped
> vcpu_migrate(vCPU1)
> RCU callbacks
> vmx_vcpu_destroy()
> vmx_vcpu_disable_pml()
> current_vmcs = 0
> 
>                                         schedule(next == vCPU1)
>                                         vCPU1->is_running = 1;
>                                         context_switch(next == vCPU1)
>                                         flush_tlb_mask(&dirty_mask);
>                                     
>                                 <--- IPI
> 
> __sync_local_execstate()
> __context_switch(prev == vCPU1)
> vmx_ctxt_switch_from(vCPU1)
> vCPU1->is_running == 1
> !! vmx_vmcs_reload() is skipped
> 
> I hope that this better illustrates the root cause.
> 
Yes, I think this is clearer, and easier to understand. But that's
probably a matter of habit and personal taste, so sorry again for
misunderstanding it in its other form.

Anyway, as I was trying to explain replying to Jan, although in this
situation the issue manifests as a consequence of vCPU migration, I
think it is indeed more general, as in, without even the need to
consider a second pCPU:

pCPU1
=====
current == vCPU1
context_switch(next == idle)
!! __context_switch() is skipped
vcpu_migrate(vCPU1)
anything_that_uses_or_touches_context()

So, it must be anything_that_uses_or_touches_context() --knowing it
will be reading or touching the pCPU context-- that syncs the state
before using or touching it (which is, e.g., what Jan's patch does).

The only other solution I see is to get rid of lazy context switch.
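
In pseudo-C, the pattern I mean would be
(anything_that_uses_or_touches_context() being, of course, just the
placeholder from the trace above):

static void anything_that_uses_or_touches_context(void)
{
    /*
     * Force a pending lazy switch to complete, so that the state loaded
     * on this pCPU belongs to the idle vCPU again before we touch it.
     */
    sync_local_execstate();

    /* ... now it is safe to load/clear a VMCS, or alter other context ... */
}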

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 10:36                 ` Dario Faggioli
@ 2017-11-09 12:58                   ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-09 12:58 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, George Dunlap,
	Andrew Cooper, Anshul Makkar, JunNakajima, xen-devel

>>> On 09.11.17 at 11:36, <raistlin@linux.it> wrote:
> Well, I'm afraid I only see two solutions:
> 1) we get rid of lazy context switch;
> 2) whatever it is that is happening at point c above, it needs to be 
>    aware that we use lazy context switch, and make sure to sync the 
>    context before playing with or altering it;

3) Better centralize the updating of v->processor, so that it becomes
reasonable to sync state there. Igor's idea of flushing state once it
is known (or at least pretty certain) that the vCPU won't run on the
prior pCPU next time it gets scheduled is certainly a reasonable one.
It just doesn't fit well with how the individual schedulers currently
behave.

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 11:01                 ` Dario Faggioli
@ 2017-11-09 13:08                   ` Jan Beulich
  2017-11-09 14:16                     ` Dario Faggioli
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-11-09 13:08 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, Andrew Cooper,
	anshul makkar, jun.nakajima, xen-devel

>>> On 09.11.17 at 12:01, <raistlin@linux.it> wrote:
> Anyway, as I was trying to explain replaying to Jan, although in this
> situation the issue manifests as a consequence of vCPU migration, I
> think it is indeed more general, as in, without even the need to
> consider a second pCPU:
> 
> pCPU1
> =====
> current == vCPU1
> context_switch(next == idle)
> !! __context_switch() is skipped
> vcpu_migrate(vCPU1)
> anything_that_uses_or_touches_context()
> 
> So, it must be anything_that_uses_or_touches_context() --knowing it
> will be reading or touching the pCPU context-- that syncs the state,
> before using or touching it (which is, e.g., what Jan's patch does).

Well, generally after the vcpu_migrate(vCPU1) above we expect
nothing to happen at all on the pCPU, until another (non-idle)
vCPU gets scheduled onto it. The problem is with tasklet / softirq
(and hence also RCU) work. Tasklets already take care of this by
calling sync_local_execstate() before calling the handler. But
for softirqs this isn't really an option; I'm surprised to see that
tasklet code does this independently of what kind of tasklet it
is. Softirq tasklets aren't used very often, so I wonder if we have
a latent bug here. Otoh, if this was actually fine, adding a similar
call to rcu_do_batch() (or its caller) would be an option, I think.
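
Roughly like so (a sketch only, not a tested patch, with the existing
body of the function elided):

static void rcu_do_batch(struct rcu_data *rdp)
{
    /*
     * Mirror what tasklet handling already does: complete a pending lazy
     * context switch before invoking any callback, so callbacks never
     * run on top of another vCPU's still-loaded state.
     */
    sync_local_execstate();

    /* ... existing callback processing unchanged ... */
}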

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 13:08                   ` Jan Beulich
@ 2017-11-09 14:16                     ` Dario Faggioli
  2017-11-09 14:39                       ` Jan Beulich
  2017-11-09 16:38                       ` Jan Beulich
  0 siblings, 2 replies; 30+ messages in thread
From: Dario Faggioli @ 2017-11-09 14:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, Andrew Cooper,
	anshul makkar, jun.nakajima, xen-devel



On Thu, 2017-11-09 at 06:08 -0700, Jan Beulich wrote:
> > > > On 09.11.17 at 12:01, <raistlin@linux.it> wrote:
> > 
> > pCPU1
> > =====
> > current == vCPU1
> > context_switch(next == idle)
> > !! __context_switch() is skipped
> > vcpu_migrate(vCPU1)
> > anything_that_uses_or_touches_context()
> > 
> > So, it must be anything_that_uses_or_touches_context() --knowing it
> > will be reading or touching the pCPU context-- that syncs the
> > state,
> > before using or touching it (which is, e.g., what Jan's patch
> > does).
> 
> Well, generally after the vcpu_migrate(vCPU1) above we expect
> nothing to happen at all on the pCPU, until another (non-idle)
> vCPU gets scheduled onto it. 
>
Ah, yes, my bad! What if I take vcpu_migrate() out of the above exec-
trace (which is what I wanted to do in my email already)?

pCPU1
=====
current == vCPU1
context_switch(next == idle)
!! __context_switch() is skipped
anything_that_uses_or_touches_context()

Point being: is the underlying, general "issue" here really bound
to migrations, or is it something intrinsic to lazy context switch? I'm
saying it's the latter.

That being said, sure, it makes sense to assume that, if we migrated the
vCPU from pCPU1 to pCPU2, it's highly unlikely that it will resume
execution on pCPU1, and hence there is no point in leaving its context
there, and we could very well sync. It's a reasonable best-effort
measure, but can we rely on it for correctness? I don't think we can.

And generalizing the idea enough that we could then rely on it for
correctness tends to be close enough to not doing lazy context switch
at all, I think.

> The problem is with tasklet / softirq
> (and hence also RCU) work. 
>
Yes.

> Tasklets already take care of this by
> calling sync_local_execstate() before calling the handler. But
> for softirqs this isn't really an option; I'm surprised to see that
> tasklet code does this independently of what kind of tasklet that
> is. 
>
Good point. Weird indeed.

> Softirq tasklets aren't used very often, so I wonder if we have
> a latent bug here. Otoh, if this was actually fine, adding a similar
> call to rcu_do_batch() (or its caller) would be an option, I think.
> 
We can have a look.

What about the effect on performance, though?

I mean, assuming that lazy context switch does a good job in that
respect, by avoiding syncing in enough cases where it is not actually
necessary, how do things change if we start to sync on any softirq,
even when the handler would not have required it?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli


* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 14:16                     ` Dario Faggioli
@ 2017-11-09 14:39                       ` Jan Beulich
  2017-11-09 16:38                       ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-09 14:39 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, Andrew Cooper,
	anshul makkar, jun.nakajima, xen-devel

>>> On 09.11.17 at 15:16, <raistlin@linux.it> wrote:
> Ah, yes, my bad! What if I take vcpu_migrate() out of the above exec-
> trace (which is what I wanted to do in my email already)?
> 
> pCPU1
> =====
> current == vCPU1
> context_switch(next == idle)
> !! __context_switch() is skipped
> anything_that_uses_or_touches_context()
> 
> Point being, is the underlying and general "issue" here, really bound
> to migrations, or is it something intrinsic of lazy context switch? I'm
> saying it's the latter.

The general issue doesn't require vcpu_migrate(), I agree. The
specific VMX issue here does, though.

Thing is - I'm not convinced there's a general issue here in the first
place: Asynchronous code isn't supposed to modify state behind
the back of synchronous code. It just so happens that VMX code is
structured to violate that assumption when PML is in use.

> That being said, sure it makes sense to assume that, if we migrated the
> vCPU from pCPU1 to pCPU2, it's highly unlikely that it will resume the
> execution on pCPU1, and hence there is no point in leaving its context
> there, and we could very well sync. It's a reasonable best-effort
> measure, but can we rely on it for correctness? I don't think we can.

We can't right now, but code (from an abstract pov at least)
could be structured so we could rely on it.

> And generalizing the idea enough that we could then rely on it for
> correctness, tends to be close enough to not doing lazy context switch
> at all, I think.

I don't think so, no - we could still leave state in hardware in
anticipation that no other non-idle vCPU would be scheduled
on the local CPU. That's something that ought to help pinned
vCPU-s in particular.

>> The problem is with tasklet / softirq
>> (and hence also RCU) work. 
>>
> Yes.
> 
>> Tasklets already take care of this by
>> calling sync_local_execstate() before calling the handler. But
>> for softirqs this isn't really an option; I'm surprised to see that
>> tasklet code does this independently of what kind of tasklet that
>> is. 
>>
> Good point. Weird indeed.

I've added an item to my todo list to see whether I can figure out
whether this is an okay thing to do.

>> Softirq tasklets aren't used very often, so I wonder if we have
>> a latent bug here. Otoh, if this was actually fine, adding a similar
>> call to rcu_do_batch() (or its caller) would be an option, I think.
>> 
> We can have a look.
> 
> What about the effect on performance, though?
> 
> I mean, assuming that lazy context switch does a good job, with respect
> to that, by avoiding synching in enough case where it is actually not
> necessary, how do things change if we start to sync at any softirq,
> even when the handler would have not required that?

I wasn't suggesting this for softirqs, but only (if at all) for RCU. But
yes, there would be a performance hit from this; not sure how small
or large. But as you can see, for tasklets the hit is taken
unconditionally.

Jan



* Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
  2017-11-09 14:16                     ` Dario Faggioli
  2017-11-09 14:39                       ` Jan Beulich
@ 2017-11-09 16:38                       ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-11-09 16:38 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Sergey Dyasli, Kevin Tian, Igor Druzhinin, Andrew Cooper,
	anshul makkar, jun.nakajima, xen-devel

>>> On 09.11.17 at 15:16, <raistlin@linux.it> wrote:
> On Thu, 2017-11-09 at 06:08 -0700, Jan Beulich wrote:
>> Tasklets already take care of this by
>> calling sync_local_execstate() before calling the handler. But
>> for softirqs this isn't really an option; I'm surprised to see that
>> tasklet code does this independently of what kind of tasklet that
>> is. 
>>
> Good point. Weird indeed.
> 
>> Softirq tasklets aren't used very often, so I wonder if we have
>> a latent bug here. Otoh, if this was actually fine, adding a similar
>> call to rcu_do_batch() (or its caller) would be an option, I think.
>> 
> We can have a look.

I'm sorry for the noise here - I've once again forgotten that
sync_local_execstate() does nothing if a lazy switch hasn't
happened already. Hence, leaving the potentially bad
performance effect aside, doing the same for RCU (or, more
generally, softirqs) would be an option.
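
For reference, the behaviour boils down to something of this shape
(simplified, not the exact code):

void sync_local_execstate(void)
{
    /* A no-op unless a lazily switched-out vCPU's state is still loaded. */
    if ( this_cpu(curr_vcpu) != current )
        __context_switch();
}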

Jan



end of thread, other threads:[~2017-11-09 16:38 UTC | newest]

Thread overview: 30+ messages
2017-02-16 11:10 [PATCH v2 0/2] x86: context switch handling adjustments Jan Beulich
2017-02-16 11:15 ` [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Jan Beulich
2017-02-16 12:27   ` Andrew Cooper
2017-02-16 12:35     ` Jan Beulich
2017-02-17  3:48       ` Tian, Kevin
2017-02-17  8:40   ` Sergey Dyasli
2017-02-17  9:01     ` Jan Beulich
2017-10-27 17:42   ` Igor Druzhinin
2017-11-02 19:46     ` Igor Druzhinin
2017-11-07  8:07       ` Jan Beulich
2017-11-07 14:24         ` Igor Druzhinin
2017-11-07 14:55           ` Jan Beulich
2017-11-07 15:52             ` Igor Druzhinin
2017-11-07 16:31               ` Jan Beulich
2017-11-09 10:05               ` Jan Beulich
2017-11-09 10:36                 ` Dario Faggioli
2017-11-09 12:58                   ` Jan Beulich
2017-11-09  9:54           ` Dario Faggioli
2017-11-09 10:17             ` Jan Beulich
2017-11-09 10:36               ` Sergey Dyasli
2017-11-09 11:01                 ` Dario Faggioli
2017-11-09 13:08                   ` Jan Beulich
2017-11-09 14:16                     ` Dario Faggioli
2017-11-09 14:39                       ` Jan Beulich
2017-11-09 16:38                       ` Jan Beulich
2017-11-09 10:39               ` Dario Faggioli
2017-11-07 15:16         ` Jan Beulich
2017-02-16 11:16 ` [PATCH v2 2/2] x86: package up context switch hook pointers Jan Beulich
2017-02-16 11:23   ` Andrew Cooper
2017-02-17  3:49   ` Tian, Kevin
